Discussion: Make COPY format extendable: Extract COPY TO format implementations
Hi,
I want to work on making COPY format extendable. I attach
the first patch for it. I'll send more patches after this is
merged.
Background:
Currently, COPY TO/FROM supports only "text", "csv" and
"binary" formats. There are some requests to support more
COPY formats. For example:
* 2023-11: JSON and JSON lines [1]
* 2022-04: Apache Arrow [2]
* 2018-02: Apache Avro, Apache Parquet and Apache ORC [3]
(FYI: I want to add support for Apache Arrow.)
There have been discussions about how to add support for more
formats. [3][4] In these discussions, we reached a consensus on
making the COPY format extendable.
But it seems that nobody is working on this yet, so I want to
work on it. (If anyone wants to work on this together, I'd be
happy to.)
Summary:
The attached patch introduces a CopyToFormatOps struct. It is
similar to TupleTableSlotOps for TupleTableSlot, but
CopyToFormatOps is for COPY TO formats. CopyToFormatOps holds
the routines that implement a COPY TO format.
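The callback-table idea can be sketched outside PostgreSQL as follows. This is only an illustration, not code from the patch: DemoState, DemoFormatOps, demo_csv_ops and demo_copy_to() are hypothetical stand-ins for CopyToState, CopyToFormatOps and the COPY TO driver, and the output goes to a fixed in-memory buffer instead of CopySend*().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for CopyToState: just an output buffer. */
typedef struct DemoState
{
    char        buf[256];
    size_t      len;
} DemoState;

/* Same shape as CopyToFormatOps: start / one_row / end callbacks. */
typedef struct DemoFormatOps
{
    void        (*start) (DemoState *st);
    void        (*one_row) (DemoState *st, const int *values, int nvalues);
    void        (*end) (DemoState *st);
} DemoFormatOps;

/* Append a NUL-terminated string to the buffer. */
static void
demo_append(DemoState *st, const char *s)
{
    size_t      n = strlen(s);

    assert(st->len + n < sizeof(st->buf));
    memcpy(st->buf + st->len, s, n + 1);
    st->len += n;
}

/* A toy CSV-like format: header line, then comma-separated integers. */
static void
demo_csv_start(DemoState *st)
{
    demo_append(st, "a,b\n");
}

static void
demo_csv_one_row(DemoState *st, const int *values, int nvalues)
{
    char        tmp[32];

    for (int i = 0; i < nvalues; i++)
    {
        snprintf(tmp, sizeof(tmp), "%s%d", i > 0 ? "," : "", values[i]);
        demo_append(st, tmp);
    }
    demo_append(st, "\n");
}

static void
demo_csv_end(DemoState *st)
{
    (void) st;                  /* this toy format has no trailer */
}

static const DemoFormatOps demo_csv_ops = {
    .start = demo_csv_start,
    .one_row = demo_csv_one_row,
    .end = demo_csv_end,
};

/* The driver never branches on the format; it only calls through ops. */
static void
demo_copy_to(DemoState *st, const DemoFormatOps *ops,
             const int rows[][2], int nrows)
{
    st->len = 0;
    st->buf[0] = '\0';
    ops->start(st);
    for (int i = 0; i < nrows; i++)
        ops->one_row(st, rows[i], 2);
    ops->end(st);
}
```

Calling demo_copy_to() with demo_csv_ops and two rows fills the buffer with the header followed by one line per row. Supporting another format would mean supplying another DemoFormatOps value, without touching the driver.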
The attached patch doesn't change:
* the current behavior (all existing tests still pass
  without modification)
* the existing "text", "csv" and "binary" format output
  implementations including local variable names (the
  attached patch just moves them and adjusts indentation)
* performance (no significant loss of performance)
In other words, this is just a refactoring for further
changes to make COPY format extendable. If I use "complete
the task and then request reviews for it" approach, it will
be difficult to review because changes for it will be
large. So I want to work on this step by step. Is it
acceptable?
TODOs that should be done in subsequent patches:
* Add some CopyToState readers such as CopyToStateGetDest(),
  CopyToStateGetAttnums() and CopyToStateGetOpts()
  (We will need to consider which APIs should be exported.)
  (This is for implementing a COPY TO format as an extension.)
* Export CopySend*() in src/backend/commands/copyto.c
  (This is for implementing a COPY TO format as an extension.)
* Add API to register a new COPY TO format implementation
* Add "CREATE XXX" to register a new COPY TO format (or COPY
  TO/FROM format) implementation
  ("CREATE COPY HANDLER" was suggested in [5].)
* Same for COPY FROM
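One possible shape for the "register a new COPY TO format implementation" item is a name-to-ops lookup table. The sketch below is purely hypothetical: none of these names (DemoCopyToOps, demo_register_format(), demo_lookup_format()) exist in PostgreSQL or in the attached patch, and a real API would need to decide on memory ownership, catalog integration and error reporting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_FORMATS 8

/* Hypothetical placeholder for a table of format callbacks. */
typedef struct DemoCopyToOps
{
    void        (*one_row) (void *state);
} DemoCopyToOps;

/* One registered format: its name and its callbacks. */
typedef struct DemoFormatEntry
{
    const char *name;
    const DemoCopyToOps *ops;
} DemoFormatEntry;

static DemoFormatEntry demo_formats[DEMO_MAX_FORMATS];
static int  demo_nformats = 0;

/* Return the ops registered under "name", or NULL if there are none. */
static const DemoCopyToOps *
demo_lookup_format(const char *name)
{
    for (int i = 0; i < demo_nformats; i++)
    {
        if (strcmp(demo_formats[i].name, name) == 0)
            return demo_formats[i].ops;
    }
    return NULL;
}

/* Register a format; fails if the table is full or the name is taken. */
static bool
demo_register_format(const char *name, const DemoCopyToOps *ops)
{
    assert(name != NULL && ops != NULL);
    if (demo_nformats >= DEMO_MAX_FORMATS ||
        demo_lookup_format(name) != NULL)
        return false;
    demo_formats[demo_nformats].name = name;
    demo_formats[demo_nformats].ops = ops;
    demo_nformats++;
    return true;
}
```

With something like this, option parsing could look up the format name instead of comparing it against a fixed list of strings. Whether registration should happen through a C function, a "CREATE XXX" command, or both is left open in the TODO list above.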
Performance:
We reached a consensus on making the COPY format extendable, but
we should take care not to hurt performance. [6]
> I think that step 1 ought to be to convert the existing
> formats into plug-ins, and demonstrate that there's no
> significant loss of performance.
So I measured COPY TO time with/without this change. You can
see there is no significant loss of performance.
Data: Random 32 bit integers:
    CREATE TABLE data (int32 integer);
    INSERT INTO data
      SELECT random() * 10000
        FROM generate_series(1, ${n_records});
The number of records: 100K, 1M and 10M
100K without this change:
    format,elapsed time (ms)
    text,22.527
    csv,23.822
    binary,24.806
100K with this change:
    format,elapsed time (ms)
    text,22.919
    csv,24.643
    binary,24.705
1M without this change:
    format,elapsed time (ms)
    text,223.457
    csv,233.583
    binary,242.687
1M with this change:
    format,elapsed time (ms)
    text,224.591
    csv,233.964
    binary,247.164
10M without this change:
    format,elapsed time (ms)
    text,2330.383
    csv,2411.394
    binary,2590.817
10M with this change:
    format,elapsed time (ms)
    text,2231.307
    csv,2408.067
    binary,2473.617
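To make the comparison easier to read, the relative change for each pair of timings can be computed as (after - before) / before. The sketch below does only that arithmetic on the numbers listed above; BenchRow, bench_rows, percent_change() and print_percent_changes() are names made up for this illustration. By this arithmetic the differences fall roughly between -4.5% and +3.4%.

```c
#include <assert.h>
#include <stdio.h>

/* One benchmark row; timings are copied from the tables above (ms). */
typedef struct BenchRow
{
    const char *label;
    double      before_ms;      /* without this change */
    double      after_ms;       /* with this change */
} BenchRow;

static const BenchRow bench_rows[] = {
    {"100K text", 22.527, 22.919},
    {"100K csv", 23.822, 24.643},
    {"100K binary", 24.806, 24.705},
    {"1M text", 223.457, 224.591},
    {"1M csv", 233.583, 233.964},
    {"1M binary", 242.687, 247.164},
    {"10M text", 2330.383, 2231.307},
    {"10M csv", 2411.394, 2408.067},
    {"10M binary", 2590.817, 2473.617},
};

#define N_BENCH_ROWS (sizeof(bench_rows) / sizeof(bench_rows[0]))

/* Relative change in percent; positive means the patched run was slower. */
static double
percent_change(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);
    return (after_ms - before_ms) / before_ms * 100.0;
}

/* Print one "label  +x.xx%" line per benchmark row. */
static void
print_percent_changes(FILE *out)
{
    for (size_t i = 0; i < N_BENCH_ROWS; i++)
        fprintf(out, "%-12s %+6.2f%%\n",
                bench_rows[i].label,
                percent_change(bench_rows[i].before_ms,
                               bench_rows[i].after_ms));
}
```

Calling print_percent_changes(stdout) prints one line per row. Since each timing appears to be a single run, differences of a few percent in either direction may simply be noise.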
[1]: https://www.postgresql.org/message-id/flat/24e3ee88-ec1e-421b-89ae-8a47ee0d2df1%40joeconway.com#a5e6b8829f9a74dfc835f6f29f2e44c5
[2]: https://www.postgresql.org/message-id/flat/CAGrfaBVyfm0wPzXVqm0%3Dh5uArYh9N_ij%2BsVpUtDHqkB%3DVyB3jw%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
[4]: https://www.postgresql.org/message-id/flat/3741749.1655952719%40sss.pgh.pa.us#2bb7af4a3d2c7669f9a49808d777a20d
[5]: https://www.postgresql.org/message-id/20180211211235.5x3jywe5z3lkgcsr%40alap3.anarazel.de
[6]: https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us
Thanks,
-- 
kou
From 7f00b2b0fb878ae1c687c151dd751512d02ed83e Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 4 Dec 2023 12:32:54 +0900
Subject: [PATCH v1] Extract COPY TO format implementations
This is a part of making COPY format extendable. See also these past
discussions:
* New Copy Formats - avro/orc/parquet:
  https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
* Make COPY extendable in order to support Parquet and other formats:
  https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com
This doesn't change the current behavior. This just introduces
CopyToFormatOps, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.
Note that CopyToFormatOps can't be used from extensions yet because
CopySend*() aren't exported yet. Extensions can't send formatted data
to a destination without CopySend*(). They will be exported by
subsequent patches.
Here is a benchmark result with/without this change because there was
a discussion that we should care about performance regression:
https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us
> I think that step 1 ought to be to convert the existing formats into
> plug-ins, and demonstrate that there's no significant loss of
> performance.
You can see that there is no significant loss of performance:
Data: Random 32 bit integers:
    CREATE TABLE data (int32 integer);
    INSERT INTO data
      SELECT random() * 10000
        FROM generate_series(1, ${n_records});
The number of records: 100K, 1M and 10M
100K without this change:
    format,elapsed time (ms)
    text,22.527
    csv,23.822
    binary,24.806
100K with this change:
    format,elapsed time (ms)
    text,22.919
    csv,24.643
    binary,24.705
1M without this change:
    format,elapsed time (ms)
    text,223.457
    csv,233.583
    binary,242.687
1M with this change:
    format,elapsed time (ms)
    text,224.591
    csv,233.964
    binary,247.164
10M without this change:
    format,elapsed time (ms)
    text,2330.383
    csv,2411.394
    binary,2590.817
10M with this change:
    format,elapsed time (ms)
    text,2231.307
    csv,2408.067
    binary,2473.617
---
 src/backend/commands/copy.c   |   8 +
 src/backend/commands/copyto.c | 387 +++++++++++++++++++++-------------
 src/include/commands/copy.h   |  27 ++-
 3 files changed, 266 insertions(+), 156 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b562..27a1add456 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -427,6 +427,8 @@ ProcessCopyOptions(ParseState *pstate,
 
     opts_out->file_encoding = -1;
 
+    /* Text is the default format. */
+    opts_out->to_ops = CopyToFormatOpsText;
     /* Extract options from the statement node tree */
     foreach(option, options)
     {
@@ -442,9 +444,15 @@ ProcessCopyOptions(ParseState *pstate,
             if (strcmp(fmt, "text") == 0)
                  /* default format */ ;
             else if (strcmp(fmt, "csv") == 0)
+            {
                 opts_out->csv_mode = true;
+                opts_out->to_ops = CopyToFormatOpsCSV;
+            }
             else if (strcmp(fmt, "binary") == 0)
+            {
                 opts_out->binary = true;
+                opts_out->to_ops = CopyToFormatOpsBinary;
+            }
             else
                 ereport(ERROR,
                         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047c4a..295e96dbc5 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -131,6 +131,238 @@ static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
+/*
+ * CopyToFormatOps implementations.
+ */
+
+/*
+ * CopyToFormatOps implementation for "text" and "csv". CopyToFormatText*()
+ * refer cstate->opts.csv_mode and change their behavior. We can split this
+ * implementation and stop referring cstate->opts.csv_mode later.
+ */
+
+static void
+CopyToFormatTextSendEndOfRow(CopyToState cstate)
+{
+    switch (cstate->copy_dest)
+    {
+    case COPY_FILE:
+        /* Default line termination depends on platform */
+#ifndef WIN32
+        CopySendChar(cstate, '\n');
+#else
+        CopySendString(cstate, "\r\n");
+#endif
+        break;
+    case COPY_FRONTEND:
+        /* The FE/BE protocol uses \n as newline for all platforms */
+        CopySendChar(cstate, '\n');
+        break;
+    default:
+        break;
+    }
+    CopySendEndOfRow(cstate);
+}
+
+static void
+CopyToFormatTextStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    int            num_phys_attrs;
+    ListCell   *cur;
+
+    num_phys_attrs = tupDesc->natts;
+    /* Get info about the columns we need to process. */
+    cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Oid            out_func_oid;
+        bool        isvarlena;
+        Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+        getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+    }
+
+    /*
+     * For non-binary copy, we need to convert null_print to file
+     * encoding, because it will be sent directly with CopySendString.
+     */
+    if (cstate->need_transcoding)
+        cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+                                                          cstate->opts.null_print_len,
+                                                          cstate->file_encoding);
+
+    /* if a header has been requested send the line */
+    if (cstate->opts.header_line)
+    {
+        bool        hdr_delim = false;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            char       *colname;
+
+            if (hdr_delim)
+                CopySendChar(cstate, cstate->opts.delim[0]);
+            hdr_delim = true;
+
+            colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, colname, false,
+                                    list_length(cstate->attnumlist) == 1);
+            else
+                CopyAttributeOutText(cstate, colname);
+        }
+
+        CopyToFormatTextSendEndOfRow(cstate);
+    }
+}
+
+static void
+CopyToFormatTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    bool        need_delim = false;
+    FmgrInfo   *out_functions = cstate->out_functions;
+    ListCell   *cur;
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (need_delim)
+            CopySendChar(cstate, cstate->opts.delim[0]);
+        need_delim = true;
+
+        if (isnull)
+        {
+            CopySendString(cstate, cstate->opts.null_print_client);
+        }
+        else
+        {
+            char       *string;
+
+            string = OutputFunctionCall(&out_functions[attnum - 1], value);
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, string,
+                                    cstate->opts.force_quote_flags[attnum - 1],
+                                    list_length(cstate->attnumlist) == 1);
+            else
+                CopyAttributeOutText(cstate, string);
+        }
+    }
+
+    CopyToFormatTextSendEndOfRow(cstate);
+}
+
+static void
+CopyToFormatTextEnd(CopyToState cstate)
+{
+}
+
+/*
+ * CopyToFormatOps implementation for "binary".
+ */
+
+static void
+CopyToFormatBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    int            num_phys_attrs;
+    ListCell   *cur;
+
+    num_phys_attrs = tupDesc->natts;
+    /* Get info about the columns we need to process. */
+    cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Oid            out_func_oid;
+        bool        isvarlena;
+        Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+        getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+    }
+
+    {
+        /* Generate header for a binary copy */
+        int32        tmp;
+
+        /* Signature */
+        CopySendData(cstate, BinarySignature, 11);
+        /* Flags field */
+        tmp = 0;
+        CopySendInt32(cstate, tmp);
+        /* No header extension */
+        tmp = 0;
+        CopySendInt32(cstate, tmp);
+    }
+}
+
+static void
+CopyToFormatBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    FmgrInfo   *out_functions = cstate->out_functions;
+    ListCell   *cur;
+
+    /* Binary per-tuple header */
+    CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (isnull)
+        {
+            CopySendInt32(cstate, -1);
+        }
+        else
+        {
+            bytea       *outputbytes;
+
+            outputbytes = SendFunctionCall(&out_functions[attnum - 1], value);
+            CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+            CopySendData(cstate, VARDATA(outputbytes),
+                         VARSIZE(outputbytes) - VARHDRSZ);
+        }
+    }
+
+    CopySendEndOfRow(cstate);
+}
+
+static void
+CopyToFormatBinaryEnd(CopyToState cstate)
+{
+    /* Generate trailer for a binary copy */
+    CopySendInt16(cstate, -1);
+    /* Need to flush out the trailer */
+    CopySendEndOfRow(cstate);
+}
+
+const CopyToFormatOps CopyToFormatOpsText = {
+    .start = CopyToFormatTextStart,
+    .one_row = CopyToFormatTextOneRow,
+    .end = CopyToFormatTextEnd,
+};
+
+/*
+ * We can use the same CopyToFormatOps for both of "text" and "csv" because
+ * CopyToFormatText*() refer cstate->opts.csv_mode and change their
+ * behavior. We can split the implementations and stop referring
+ * cstate->opts.csv_mode later.
+ */
+const CopyToFormatOps CopyToFormatOpsCSV = CopyToFormatOpsText;
+
+const CopyToFormatOps CopyToFormatOpsBinary = {
+    .start = CopyToFormatBinaryStart,
+    .one_row = CopyToFormatBinaryOneRow,
+    .end = CopyToFormatBinaryEnd,
+};
 
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
@@ -198,16 +430,6 @@ CopySendEndOfRow(CopyToState cstate)
     switch (cstate->copy_dest)
     {
         case COPY_FILE:
-            if (!cstate->opts.binary)
-            {
-                /* Default line termination depends on platform */
-#ifndef WIN32
-                CopySendChar(cstate, '\n');
-#else
-                CopySendString(cstate, "\r\n");
-#endif
-            }
-
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -242,10 +464,6 @@ CopySendEndOfRow(CopyToState cstate)
             }
             break;
         case COPY_FRONTEND:
-            /* The FE/BE protocol uses \n as newline for all platforms */
-            if (!cstate->opts.binary)
-                CopySendChar(cstate, '\n');
-
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
@@ -748,8 +966,6 @@ DoCopyTo(CopyToState cstate)
     bool        pipe = (cstate->filename == NULL && cstate->data_dest_cb == NULL);
     bool        fe_copy = (pipe && whereToSendOutput == DestRemote);
     TupleDesc    tupDesc;
-    int            num_phys_attrs;
-    ListCell   *cur;
     uint64        processed;
 
     if (fe_copy)
@@ -759,32 +975,11 @@ DoCopyTo(CopyToState cstate)
         tupDesc = RelationGetDescr(cstate->rel);
     else
         tupDesc = cstate->queryDesc->tupDesc;
-    num_phys_attrs = tupDesc->natts;
     cstate->opts.null_print_client = cstate->opts.null_print;    /* default */
 
     /* We use fe_msgbuf as a per-row buffer regardless of copy_dest */
     cstate->fe_msgbuf = makeStringInfo();
 
-    /* Get info about the columns we need to process. */
-    cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
-    foreach(cur, cstate->attnumlist)
-    {
-        int            attnum = lfirst_int(cur);
-        Oid            out_func_oid;
-        bool        isvarlena;
-        Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
-
-        if (cstate->opts.binary)
-            getTypeBinaryOutputInfo(attr->atttypid,
-                                    &out_func_oid,
-                                    &isvarlena);
-        else
-            getTypeOutputInfo(attr->atttypid,
-                              &out_func_oid,
-                              &isvarlena);
-        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
-    }
-
     /*
      * Create a temporary memory context that we can reset once per row to
      * recover palloc'd memory.  This avoids any problems with leaks inside
@@ -795,57 +990,7 @@ DoCopyTo(CopyToState cstate)
                                                "COPY TO",
                                                ALLOCSET_DEFAULT_SIZES);
 
-    if (cstate->opts.binary)
-    {
-        /* Generate header for a binary copy */
-        int32        tmp;
-
-        /* Signature */
-        CopySendData(cstate, BinarySignature, 11);
-        /* Flags field */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-        /* No header extension */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-    }
-    else
-    {
-        /*
-         * For non-binary copy, we need to convert null_print to file
-         * encoding, because it will be sent directly with CopySendString.
-         */
-        if (cstate->need_transcoding)
-            cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
-                                                              cstate->opts.null_print_len,
-                                                              cstate->file_encoding);
-
-        /* if a header has been requested send the line */
-        if (cstate->opts.header_line)
-        {
-            bool        hdr_delim = false;
-
-            foreach(cur, cstate->attnumlist)
-            {
-                int            attnum = lfirst_int(cur);
-                char       *colname;
-
-                if (hdr_delim)
-                    CopySendChar(cstate, cstate->opts.delim[0]);
-                hdr_delim = true;
-
-                colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, colname, false,
-                                        list_length(cstate->attnumlist) == 1);
-                else
-                    CopyAttributeOutText(cstate, colname);
-            }
-
-            CopySendEndOfRow(cstate);
-        }
-    }
+    cstate->opts.to_ops.start(cstate, tupDesc);
 
     if (cstate->rel)
     {
@@ -884,13 +1029,7 @@ DoCopyTo(CopyToState cstate)
         processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
     }
 
-    if (cstate->opts.binary)
-    {
-        /* Generate trailer for a binary copy */
-        CopySendInt16(cstate, -1);
-        /* Need to flush out the trailer */
-        CopySendEndOfRow(cstate);
-    }
+    cstate->opts.to_ops.end(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -906,71 +1045,15 @@ DoCopyTo(CopyToState cstate)
 static void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-    bool        need_delim = false;
-    FmgrInfo   *out_functions = cstate->out_functions;
     MemoryContext oldcontext;
-    ListCell   *cur;
-    char       *string;
 
     MemoryContextReset(cstate->rowcontext);
     oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-    if (cstate->opts.binary)
-    {
-        /* Binary per-tuple header */
-        CopySendInt16(cstate, list_length(cstate->attnumlist));
-    }
-
     /* Make sure the tuple is fully deconstructed */
     slot_getallattrs(slot);
 
-    foreach(cur, cstate->attnumlist)
-    {
-        int            attnum = lfirst_int(cur);
-        Datum        value = slot->tts_values[attnum - 1];
-        bool        isnull = slot->tts_isnull[attnum - 1];
-
-        if (!cstate->opts.binary)
-        {
-            if (need_delim)
-                CopySendChar(cstate, cstate->opts.delim[0]);
-            need_delim = true;
-        }
-
-        if (isnull)
-        {
-            if (!cstate->opts.binary)
-                CopySendString(cstate, cstate->opts.null_print_client);
-            else
-                CopySendInt32(cstate, -1);
-        }
-        else
-        {
-            if (!cstate->opts.binary)
-            {
-                string = OutputFunctionCall(&out_functions[attnum - 1],
-                                            value);
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, string,
-                                        cstate->opts.force_quote_flags[attnum - 1],
-                                        list_length(cstate->attnumlist) == 1);
-                else
-                    CopyAttributeOutText(cstate, string);
-            }
-            else
-            {
-                bytea       *outputbytes;
-
-                outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-                                               value);
-                CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-                CopySendData(cstate, VARDATA(outputbytes),
-                             VARSIZE(outputbytes) - VARHDRSZ);
-            }
-        }
-    }
-
-    CopySendEndOfRow(cstate);
+    cstate->opts.to_ops.one_row(cstate, slot);
 
     MemoryContextSwitchTo(oldcontext);
 }
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b90b..6b5231b2f3 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -30,6 +30,28 @@ typedef enum CopyHeaderChoice
     COPY_HEADER_MATCH,
 } CopyHeaderChoice;
 
+/* These are private in commands/copy[from|to].c */
+typedef struct CopyFromStateData *CopyFromState;
+typedef struct CopyToStateData *CopyToState;
+
+/* Routines for a COPY TO format implementation. */
+typedef struct CopyToFormatOps
+{
+    /* Called when COPY TO is started. This will send a header. */
+    void        (*start) (CopyToState cstate, TupleDesc tupDesc);
+
+    /* Copy one row for COPY TO. */
+    void        (*one_row) (CopyToState cstate, TupleTableSlot *slot);
+
+    /* Called when COPY TO is ended. This will send a trailer. */
+    void        (*end) (CopyToState cstate);
+} CopyToFormatOps;
+
+/* Predefined CopyToFormatOps for "text", "csv" and "binary". */
+extern PGDLLIMPORT const CopyToFormatOps CopyToFormatOpsText;
+extern PGDLLIMPORT const CopyToFormatOps CopyToFormatOpsCSV;
+extern PGDLLIMPORT const CopyToFormatOps CopyToFormatOpsBinary;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -63,12 +85,9 @@ typedef struct CopyFormatOptions
     bool       *force_null_flags;    /* per-column CSV FN flags */
     bool        convert_selectively;    /* do selective binary conversion? */
     List       *convert_select; /* list of column names (can be NIL) */
+    CopyToFormatOps to_ops;        /* how to format to */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
-typedef struct CopyFromStateData *CopyFromState;
-typedef struct CopyToStateData *CopyToState;
-
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
 typedef void (*copy_data_dest_cb) (void *data, int len);
 
-- 
2.40.1
			
On Mon, Dec 04, 2023 at 03:35:48PM +0900, Sutou Kouhei wrote:
> I want to work on making COPY format extendable. I attach
> the first patch for it. I'll send more patches after this is
> merged.
Given the current discussion about adding JSON, I think this could be a
nice bit of refactoring that could ultimately open the door to providing
other COPY formats via shared libraries.
> In other words, this is just a refactoring for further
> changes to make COPY format extendable. If I use "complete
> the task and then request reviews for it" approach, it will
> be difficult to review because changes for it will be
> large. So I want to work on this step by step. Is it
> acceptable?
I think it makes sense to do this part independently, but we should be
careful to design this with the follow-up tasks in mind.
> So I measured COPY TO time with/without this change. You can
> see there is no significant loss of performance.
> 
> Data: Random 32 bit integers:
> 
>     CREATE TABLE data (int32 integer);
>     INSERT INTO data
>       SELECT random() * 10000
>         FROM generate_series(1, ${n_records});
Seems encouraging.  I assume the performance concerns stem from the use of
function pointers.  Or was there something else?
-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
			
Hi,
Thanks for replying to this proposal!
In <20231205182458.GC2757816@nathanxps13>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 5 Dec 2023 12:24:58 -0600,
  Nathan Bossart <nathandbossart@gmail.com> wrote:
> I think it makes sense to do this part independently, but we should be
> careful to design this with the follow-up tasks in mind.
OK. I'll keep updating the "TODOs" section in the original
e-mail. It also covers the design of the follow-up tasks. We
can discuss the design separately from patch submission.
(The currently submitted patch just focuses on refactoring,
but we can discuss the final design.)
> I assume the performance concerns stem from the use of
> function pointers.  Or was there something else?
I think so too.
The original e-mail that mentioned the performance concern
[1] didn't state the reason, but the use of function
pointers may be the concern.
If the currently supported formats ("text", "csv" and
"binary") were implemented as extensions, there might be
further concerns, but we will keep them as built-in formats
for compatibility. So I think that there are no further
concerns for these formats.
[1]: https://www.postgresql.org/message-id/flat/3741749.1655952719%40sss.pgh.pa.us#2bb7af4a3d2c7669f9a49808d777a20d
Thanks,
-- 
kou
			
On Wed, Dec 6, 2023 at 10:45 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> Thanks for replying to this proposal!
>
> In <20231205182458.GC2757816@nathanxps13>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 5 Dec 2023 12:24:58 -0600,
>   Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> > I think it makes sense to do this part independently, but we should be
> > careful to design this with the follow-up tasks in mind.
>
> OK. I'll keep updating the "TODOs" section in the original
> e-mail. It also includes design in the follow-up tasks. We
> can discuss the design separately from the patches
> submitting. (The current submitted patch just focuses on
> refactoring but we can discuss the final design.)
>
> > I assume the performance concerns stem from the use of
> > function pointers.  Or was there something else?
>
> I think so too.
>
> The original e-mail that mentioned the performance concern
> [1] didn't say about the reason but the use of function
> pointers might be concerned.
>
> If the currently supported formats ("text", "csv" and
> "binary") are implemented as an extension, it may have more
> concerns but we will keep them as built-in formats for
> compatibility. So I think that no more concerns exist for
> these formats.
>
For the modern formats (parquet, orc, avro, etc.), will they be
implemented as extensions or in core?
The patch looks good except for a pair of extra curly braces.
>
> [1]: https://www.postgresql.org/message-id/flat/3741749.1655952719%40sss.pgh.pa.us#2bb7af4a3d2c7669f9a49808d777a20d
--
Regards
Junwang Zhao
			
Hi,
In <CAEG8a3Jf7kPV3ez5OHu-pFGscKfVyd9KkubMF199etkfz=EPRg@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 6 Dec 2023 11:18:35 +0800,
  Junwang Zhao <zhjwpku@gmail.com> wrote:
> For the modern formats (parquet, orc, avro, etc.), will they be
> implemented as extensions or in core?
I think that they should be implemented as extensions
because they will depend on external libraries and may not
use C. For example, C++ will be used for Apache Parquet
because the official Apache Parquet C++ implementation
exists but the C implementation doesn't.
(I can implement an extension for Apache Parquet after we
complete this feature. I'll implement an extension for
Apache Arrow with the official Apache Arrow C++
implementation. And it will be easy to convert Apache Arrow
data to Apache Parquet with the official Apache Parquet
implementation.)
> The patch looks good except for a pair of extra curly braces.
Thanks for the review! I attach the v2 patch that removes
extra curly braces for "if (isnull)".
Thanks,
-- 
kou
From 2cd0d344d68667db71b621a8c94f376ddf1707c3 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 4 Dec 2023 12:32:54 +0900
Subject: [PATCH v2] Extract COPY TO format implementations
This is a part of making COPY format extendable. See also these past
discussions:
* New Copy Formats - avro/orc/parquet:
  https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
* Make COPY extendable in order to support Parquet and other formats:
  https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com
This doesn't change the current behavior. This just introduces
CopyToFormatOps, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.
Note that CopyToFormatOps can't be used from extensions yet because
CopySend*() aren't exported yet. Extensions can't send formatted data
to a destination without CopySend*(). They will be exported by
subsequent patches.
Here is a benchmark result with/without this change because there was
a discussion that we should care about performance regression:
https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us
> I think that step 1 ought to be to convert the existing formats into
> plug-ins, and demonstrate that there's no significant loss of
> performance.
You can see that there is no significant loss of performance:
Data: Random 32 bit integers:
    CREATE TABLE data (int32 integer);
    INSERT INTO data
      SELECT random() * 10000
        FROM generate_series(1, ${n_records});
The number of records: 100K, 1M and 10M
100K without this change:
    format,elapsed time (ms)
    text,22.527
    csv,23.822
    binary,24.806
100K with this change:
    format,elapsed time (ms)
    text,22.919
    csv,24.643
    binary,24.705
1M without this change:
    format,elapsed time (ms)
    text,223.457
    csv,233.583
    binary,242.687
1M with this change:
    format,elapsed time (ms)
    text,224.591
    csv,233.964
    binary,247.164
10M without this change:
    format,elapsed time (ms)
    text,2330.383
    csv,2411.394
    binary,2590.817
10M with this change:
    format,elapsed time (ms)
    text,2231.307
    csv,2408.067
    binary,2473.617
---
 src/backend/commands/copy.c   |   8 +
 src/backend/commands/copyto.c | 383 ++++++++++++++++++++--------------
 src/include/commands/copy.h   |  27 ++-
 3 files changed, 262 insertions(+), 156 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b562..27a1add456 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -427,6 +427,8 @@ ProcessCopyOptions(ParseState *pstate,
 
     opts_out->file_encoding = -1;
 
+    /* Text is the default format. */
+    opts_out->to_ops = CopyToFormatOpsText;
     /* Extract options from the statement node tree */
     foreach(option, options)
     {
@@ -442,9 +444,15 @@ ProcessCopyOptions(ParseState *pstate,
             if (strcmp(fmt, "text") == 0)
                  /* default format */ ;
             else if (strcmp(fmt, "csv") == 0)
+            {
                 opts_out->csv_mode = true;
+                opts_out->to_ops = CopyToFormatOpsCSV;
+            }
             else if (strcmp(fmt, "binary") == 0)
+            {
                 opts_out->binary = true;
+                opts_out->to_ops = CopyToFormatOpsBinary;
+            }
             else
                 ereport(ERROR,
                         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047c4a..79806b9a1b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -131,6 +131,234 @@ static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
+/*
+ * CopyToFormatOps implementations.
+ */
+
+/*
+ * CopyToFormatOps implementation for "text" and "csv". CopyToFormatText*()
+ * refer cstate->opts.csv_mode and change their behavior. We can split this
+ * implementation and stop referring cstate->opts.csv_mode later.
+ */
+
+static void
+CopyToFormatTextSendEndOfRow(CopyToState cstate)
+{
+    switch (cstate->copy_dest)
+    {
+    case COPY_FILE:
+        /* Default line termination depends on platform */
+#ifndef WIN32
+        CopySendChar(cstate, '\n');
+#else
+        CopySendString(cstate, "\r\n");
+#endif
+        break;
+    case COPY_FRONTEND:
+        /* The FE/BE protocol uses \n as newline for all platforms */
+        CopySendChar(cstate, '\n');
+        break;
+    default:
+        break;
+    }
+    CopySendEndOfRow(cstate);
+}
+
+static void
+CopyToFormatTextStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    int            num_phys_attrs;
+    ListCell   *cur;
+
+    num_phys_attrs = tupDesc->natts;
+    /* Get info about the columns we need to process. */
+    cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Oid            out_func_oid;
+        bool        isvarlena;
+        Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+        getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+    }
+
+    /*
+     * For non-binary copy, we need to convert null_print to file
+     * encoding, because it will be sent directly with CopySendString.
+     */
+    if (cstate->need_transcoding)
+        cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+                                                          cstate->opts.null_print_len,
+                                                          cstate->file_encoding);
+
+    /* if a header has been requested send the line */
+    if (cstate->opts.header_line)
+    {
+        bool        hdr_delim = false;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            char       *colname;
+
+            if (hdr_delim)
+                CopySendChar(cstate, cstate->opts.delim[0]);
+            hdr_delim = true;
+
+            colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, colname, false,
+                                    list_length(cstate->attnumlist) == 1);
+            else
+                CopyAttributeOutText(cstate, colname);
+        }
+
+        CopyToFormatTextSendEndOfRow(cstate);
+    }
+}
+
+static void
+CopyToFormatTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    bool        need_delim = false;
+    FmgrInfo   *out_functions = cstate->out_functions;
+    ListCell   *cur;
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (need_delim)
+            CopySendChar(cstate, cstate->opts.delim[0]);
+        need_delim = true;
+
+        if (isnull)
+            CopySendString(cstate, cstate->opts.null_print_client);
+        else
+        {
+            char       *string;
+
+            string = OutputFunctionCall(&out_functions[attnum - 1], value);
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, string,
+                                    cstate->opts.force_quote_flags[attnum - 1],
+
list_length(cstate->attnumlist) == 1); + else + CopyAttributeOutText(cstate, string); + } + } + + CopyToFormatTextSendEndOfRow(cstate); +} + +static void +CopyToFormatTextEnd(CopyToState cstate) +{ +} + +/* + * CopyToFormatOps implementation for "binary". + */ + +static void +CopyToFormatBinaryStart(CopyToState cstate, TupleDesc tupDesc) +{ + int num_phys_attrs; + ListCell *cur; + + num_phys_attrs = tupDesc->natts; + /* Get info about the columns we need to process. */ + cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo)); + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + Oid out_func_oid; + bool isvarlena; + Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1); + + getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena); + fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); + } + + { + /* Generate header for a binary copy */ + int32 tmp; + + /* Signature */ + CopySendData(cstate, BinarySignature, 11); + /* Flags field */ + tmp = 0; + CopySendInt32(cstate, tmp); + /* No header extension */ + tmp = 0; + CopySendInt32(cstate, tmp); + } +} + +static void +CopyToFormatBinaryOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + FmgrInfo *out_functions = cstate->out_functions; + ListCell *cur; + + /* Binary per-tuple header */ + CopySendInt16(cstate, list_length(cstate->attnumlist)); + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + Datum value = slot->tts_values[attnum - 1]; + bool isnull = slot->tts_isnull[attnum - 1]; + + if (isnull) + CopySendInt32(cstate, -1); + else + { + bytea *outputbytes; + + outputbytes = SendFunctionCall(&out_functions[attnum - 1], value); + CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); + CopySendData(cstate, VARDATA(outputbytes), + VARSIZE(outputbytes) - VARHDRSZ); + } + } + + CopySendEndOfRow(cstate); +} + +static void +CopyToFormatBinaryEnd(CopyToState cstate) +{ + /* Generate trailer for a binary copy */ + CopySendInt16(cstate, 
-1); + /* Need to flush out the trailer */ + CopySendEndOfRow(cstate); +} + +const CopyToFormatOps CopyToFormatOpsText = { + .start = CopyToFormatTextStart, + .one_row = CopyToFormatTextOneRow, + .end = CopyToFormatTextEnd, +}; + +/* + * We can use the same CopyToFormatOps for both of "text" and "csv" because + * CopyToFormatText*() refer cstate->opts.csv_mode and change their + * behavior. We can split the implementations and stop referring + * cstate->opts.csv_mode later. + */ +const CopyToFormatOps CopyToFormatOpsCSV = CopyToFormatOpsText; + +const CopyToFormatOps CopyToFormatOpsBinary = { + .start = CopyToFormatBinaryStart, + .one_row = CopyToFormatBinaryOneRow, + .end = CopyToFormatBinaryEnd, +}; /* * Send copy start/stop messages for frontend copies. These have changed @@ -198,16 +426,6 @@ CopySendEndOfRow(CopyToState cstate) switch (cstate->copy_dest) { case COPY_FILE: - if (!cstate->opts.binary) - { - /* Default line termination depends on platform */ -#ifndef WIN32 - CopySendChar(cstate, '\n'); -#else - CopySendString(cstate, "\r\n"); -#endif - } - if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, cstate->copy_file) != 1 || ferror(cstate->copy_file)) @@ -242,10 +460,6 @@ CopySendEndOfRow(CopyToState cstate) } break; case COPY_FRONTEND: - /* The FE/BE protocol uses \n as newline for all platforms */ - if (!cstate->opts.binary) - CopySendChar(cstate, '\n'); - /* Dump the accumulated row as one CopyData message */ (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len); break; @@ -748,8 +962,6 @@ DoCopyTo(CopyToState cstate) bool pipe = (cstate->filename == NULL && cstate->data_dest_cb == NULL); bool fe_copy = (pipe && whereToSendOutput == DestRemote); TupleDesc tupDesc; - int num_phys_attrs; - ListCell *cur; uint64 processed; if (fe_copy) @@ -759,32 +971,11 @@ DoCopyTo(CopyToState cstate) tupDesc = RelationGetDescr(cstate->rel); else tupDesc = cstate->queryDesc->tupDesc; - num_phys_attrs = tupDesc->natts; cstate->opts.null_print_client = 
cstate->opts.null_print; /* default */ /* We use fe_msgbuf as a per-row buffer regardless of copy_dest */ cstate->fe_msgbuf = makeStringInfo(); - /* Get info about the columns we need to process. */ - cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo)); - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - Oid out_func_oid; - bool isvarlena; - Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1); - - if (cstate->opts.binary) - getTypeBinaryOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - else - getTypeOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); - } - /* * Create a temporary memory context that we can reset once per row to * recover palloc'd memory. This avoids any problems with leaks inside @@ -795,57 +986,7 @@ DoCopyTo(CopyToState cstate) "COPY TO", ALLOCSET_DEFAULT_SIZES); - if (cstate->opts.binary) - { - /* Generate header for a binary copy */ - int32 tmp; - - /* Signature */ - CopySendData(cstate, BinarySignature, 11); - /* Flags field */ - tmp = 0; - CopySendInt32(cstate, tmp); - /* No header extension */ - tmp = 0; - CopySendInt32(cstate, tmp); - } - else - { - /* - * For non-binary copy, we need to convert null_print to file - * encoding, because it will be sent directly with CopySendString. 
- */ - if (cstate->need_transcoding) - cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, - cstate->opts.null_print_len, - cstate->file_encoding); - - /* if a header has been requested send the line */ - if (cstate->opts.header_line) - { - bool hdr_delim = false; - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - char *colname; - - if (hdr_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - hdr_delim = true; - - colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); - - if (cstate->opts.csv_mode) - CopyAttributeOutCSV(cstate, colname, false, - list_length(cstate->attnumlist) == 1); - else - CopyAttributeOutText(cstate, colname); - } - - CopySendEndOfRow(cstate); - } - } + cstate->opts.to_ops.start(cstate, tupDesc); if (cstate->rel) { @@ -884,13 +1025,7 @@ DoCopyTo(CopyToState cstate) processed = ((DR_copy *) cstate->queryDesc->dest)->processed; } - if (cstate->opts.binary) - { - /* Generate trailer for a binary copy */ - CopySendInt16(cstate, -1); - /* Need to flush out the trailer */ - CopySendEndOfRow(cstate); - } + cstate->opts.to_ops.end(cstate); MemoryContextDelete(cstate->rowcontext); @@ -906,71 +1041,15 @@ DoCopyTo(CopyToState cstate) static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) { - bool need_delim = false; - FmgrInfo *out_functions = cstate->out_functions; MemoryContext oldcontext; - ListCell *cur; - char *string; MemoryContextReset(cstate->rowcontext); oldcontext = MemoryContextSwitchTo(cstate->rowcontext); - if (cstate->opts.binary) - { - /* Binary per-tuple header */ - CopySendInt16(cstate, list_length(cstate->attnumlist)); - } - /* Make sure the tuple is fully deconstructed */ slot_getallattrs(slot); - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - Datum value = slot->tts_values[attnum - 1]; - bool isnull = slot->tts_isnull[attnum - 1]; - - if (!cstate->opts.binary) - { - if (need_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - need_delim = 
true; - } - - if (isnull) - { - if (!cstate->opts.binary) - CopySendString(cstate, cstate->opts.null_print_client); - else - CopySendInt32(cstate, -1); - } - else - { - if (!cstate->opts.binary) - { - string = OutputFunctionCall(&out_functions[attnum - 1], - value); - if (cstate->opts.csv_mode) - CopyAttributeOutCSV(cstate, string, - cstate->opts.force_quote_flags[attnum - 1], - list_length(cstate->attnumlist) == 1); - else - CopyAttributeOutText(cstate, string); - } - else - { - bytea *outputbytes; - - outputbytes = SendFunctionCall(&out_functions[attnum - 1], - value); - CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); - CopySendData(cstate, VARDATA(outputbytes), - VARSIZE(outputbytes) - VARHDRSZ); - } - } - } - - CopySendEndOfRow(cstate); + cstate->opts.to_ops.one_row(cstate, slot); MemoryContextSwitchTo(oldcontext); } diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index f2cca0b90b..6b5231b2f3 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -30,6 +30,28 @@ typedef enum CopyHeaderChoice COPY_HEADER_MATCH, } CopyHeaderChoice; +/* These are private in commands/copy[from|to].c */ +typedef struct CopyFromStateData *CopyFromState; +typedef struct CopyToStateData *CopyToState; + +/* Routines for a COPY TO format implementation. */ +typedef struct CopyToFormatOps +{ + /* Called when COPY TO is started. This will send a header. */ + void (*start) (CopyToState cstate, TupleDesc tupDesc); + + /* Copy one row for COPY TO. */ + void (*one_row) (CopyToState cstate, TupleTableSlot *slot); + + /* Called when COPY TO is ended. This will send a trailer. */ + void (*end) (CopyToState cstate); +} CopyToFormatOps; + +/* Predefined CopyToFormatOps for "text", "csv" and "binary". */ +extern PGDLLIMPORT const CopyToFormatOps CopyToFormatOpsText; +extern PGDLLIMPORT const CopyToFormatOps CopyToFormatOpsCSV; +extern PGDLLIMPORT const CopyToFormatOps CopyToFormatOpsBinary; + /* * A struct to hold COPY options, in a parsed form. 
All of these are related * to formatting, except for 'freeze', which doesn't really belong here, but @@ -63,12 +85,9 @@ typedef struct CopyFormatOptions bool *force_null_flags; /* per-column CSV FN flags */ bool convert_selectively; /* do selective binary conversion? */ List *convert_select; /* list of column names (can be NIL) */ + CopyToFormatOps to_ops; /* how to format to */ } CopyFormatOptions; -/* These are private in commands/copy[from|to].c */ -typedef struct CopyFromStateData *CopyFromState; -typedef struct CopyToStateData *CopyToState; - typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread); typedef void (*copy_data_dest_cb) (void *data, int len); -- 2.40.1
On Wed, Dec 6, 2023 at 2:19 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAEG8a3Jf7kPV3ez5OHu-pFGscKfVyd9KkubMF199etkfz=EPRg@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 6 Dec 2023 11:18:35 +0800,
>   Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> > For the modern formats(parquet, orc, avro, etc.), will they be
> > implemented as extensions or in core?
>
> I think that they should be implemented as extensions
> because they will depend of external libraries and may not
> use C. For example, C++ will be used for Apache Parquet
> because the official Apache Parquet C++ implementation
> exists but the C implementation doesn't.
>
> (I can implement an extension for Apache Parquet after we
> complete this feature. I'll implement an extension for
> Apache Arrow with the official Apache Arrow C++
> implementation. And it's easy that we convert Apache Arrow
> data to Apache Parquet with the official Apache Parquet
> implementation.)
>
> > The patch looks good except for a pair of extra curly braces.
>
> Thanks for the review! I attach the v2 patch that removes
> extra curly braces for "if (isnull)".
>
For the extra curly braces, I mean the following code block in
CopyToFormatBinaryStart:
+ {        <-- I thought this is useless?
+ /* Generate header for a binary copy */
+ int32 tmp;
+
+ /* Signature */
+ CopySendData(cstate, BinarySignature, 11);
+ /* Flags field */
+ tmp = 0;
+ CopySendInt32(cstate, tmp);
+ /* No header extension */
+ tmp = 0;
+ CopySendInt32(cstate, tmp);
+ }
>
> Thanks,
> --
> kou
--
Regards
Junwang Zhao
			
Hi,
In <CAEG8a3K9dE2gt3+K+h=DwTqMenR84aeYuYS+cty3SR3LAeDBAQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 6 Dec 2023 15:11:34 +0800,
  Junwang Zhao <zhjwpku@gmail.com> wrote:
> For the extra curly braces, I mean the following code block in
> CopyToFormatBinaryStart:
> 
> + {        <-- I thought this is useless?
> + /* Generate header for a binary copy */
> + int32 tmp;
> +
> + /* Signature */
> + CopySendData(cstate, BinarySignature, 11);
> + /* Flags field */
> + tmp = 0;
> + CopySendInt32(cstate, tmp);
> + /* No header extension */
> + tmp = 0;
> + CopySendInt32(cstate, tmp);
> + }
Oh, I see. I've removed them and attached the v3 patch. In
general, I don't change variable names and so on in this
patch; I just move code. But I also removed the "tmp"
variable in this case because I think that the name isn't
suitable for a larger scope. (I think that "tmp" is
acceptable in a small scope like the code above.)
New code:
/* Generate header for a binary copy */
/* Signature */
CopySendData(cstate, BinarySignature, 11);
/* Flags field */
CopySendInt32(cstate, 0);
/* No header extension */
CopySendInt32(cstate, 0);
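For readers who want to see what these three calls put on the wire, here is a minimal standalone sketch. It is not part of the patch: append_int32() and build_binary_copy_header() are hypothetical helpers that write into a plain buffer instead of going through CopySend*(). It assumes the 11-byte signature "PGCOPY\n\377\r\n\0" and that 32-bit fields are sent in network byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 11-byte signature that starts a binary COPY stream. */
static const unsigned char binary_signature[11] = {
    'P', 'G', 'C', 'O', 'P', 'Y', '\n', 0xFF, '\r', '\n', '\0'
};

/*
 * Hypothetical stand-in for CopySendInt32(): append val to buf at pos in
 * network byte order (most significant byte first). Returns the new pos.
 */
static size_t
append_int32(unsigned char *buf, size_t pos, int32_t val)
{
    uint32_t    u = (uint32_t) val;

    buf[pos++] = (unsigned char) (u >> 24);
    buf[pos++] = (unsigned char) (u >> 16);
    buf[pos++] = (unsigned char) (u >> 8);
    buf[pos++] = (unsigned char) u;
    return pos;
}

/*
 * Build the header into buf, which must have room for at least 19 bytes:
 * signature (11) + flags field (4) + header extension length (4).
 * Returns the number of bytes written.
 */
static size_t
build_binary_copy_header(unsigned char *buf)
{
    size_t      pos = 0;

    memcpy(buf, binary_signature, sizeof(binary_signature));
    pos += sizeof(binary_signature);
    pos = append_int32(buf, pos, 0);    /* Flags field */
    pos = append_int32(buf, pos, 0);    /* No header extension */
    return pos;
}
```

Calling build_binary_copy_header() on a 32-byte buffer fills the first 19 bytes: the signature followed by eight zero bytes. Passing the constant 0 directly, as the v3 code does, yields the same bytes as going through a temporary variable.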
Thanks,
-- 
kou
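As a reading aid for the attached patch, the dispatch idea behind CopyToFormatOps can be sketched outside PostgreSQL roughly as follows. Everything here is hypothetical and simplified: DemoState, DemoFormatOps and the two toy formats are not in the patch, rows are plain int arrays, and output goes to an in-memory buffer rather than through CopySend*(). Only the shape (a table of start / one_row / end callbacks stored by value in the state) mirrors the patch.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct DemoState DemoState;

/* Hypothetical analogue of CopyToFormatOps: one callback per phase. */
typedef struct DemoFormatOps
{
    void        (*start) (DemoState *st);
    void        (*one_row) (DemoState *st, const int *values, int nvalues);
    void        (*end) (DemoState *st);
} DemoFormatOps;

struct DemoState
{
    DemoFormatOps ops;          /* stored by value, like opts.to_ops */
    char        out[256];       /* stand-in for the COPY destination */
    size_t      len;
};

/* Stand-in for CopySendString(); output that does not fit is dropped. */
static void
demo_send(DemoState *st, const char *s)
{
    size_t      n = strlen(s);

    if (st->len + n < sizeof(st->out))
    {
        memcpy(st->out + st->len, s, n + 1);
        st->len += n;
    }
}

static void
demo_noop(DemoState *st)
{
    (void) st;
}

/* "text"-like toy format: tab-separated values, one line per row. */
static void
text_one_row(DemoState *st, const int *values, int nvalues)
{
    char        field[32];

    for (int i = 0; i < nvalues; i++)
    {
        snprintf(field, sizeof(field), "%s%d", i > 0 ? "\t" : "", values[i]);
        demo_send(st, field);
    }
    demo_send(st, "\n");
}

/* Framed toy format: header, per-row field count, trailer. */
static void framed_start(DemoState *st) { demo_send(st, "BEGIN\n"); }
static void framed_end(DemoState *st) { demo_send(st, "END\n"); }

static void
framed_one_row(DemoState *st, const int *values, int nvalues)
{
    char        field[32];

    snprintf(field, sizeof(field), "%d:", nvalues);
    demo_send(st, field);
    for (int i = 0; i < nvalues; i++)
    {
        snprintf(field, sizeof(field), "%s%d", i > 0 ? "," : "", values[i]);
        demo_send(st, field);
    }
    demo_send(st, "\n");
}

static const DemoFormatOps demo_text_ops = {
    .start = demo_noop, .one_row = text_one_row, .end = demo_noop,
};
static const DemoFormatOps demo_framed_ops = {
    .start = framed_start, .one_row = framed_one_row, .end = framed_end,
};

/* The driver never tests the format; it only calls through the table. */
static void
demo_copy_to(DemoState *st, const DemoFormatOps *ops,
             const int (*rows)[2], int nrows)
{
    st->ops = *ops;
    st->len = 0;
    st->out[0] = '\0';
    st->ops.start(st);
    for (int i = 0; i < nrows; i++)
        st->ops.one_row(st, rows[i], 2);
    st->ops.end(st);
}
```

With rows {1, 2} and {3, 4}, the text table produces two tab-separated lines and the framed table wraps the rows between BEGIN and END. The driver is the same in both cases, which is the property the refactoring is after: DoCopyTo() and CopyOneRowTo() lose their "if (cstate->opts.binary)" branches and call through the table instead.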
From 9fe0087d9a6a79a7d1a7d0af63eb16abadbf0d4a Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 4 Dec 2023 12:32:54 +0900
Subject: [PATCH v3] Extract COPY TO format implementations
This is a part of making COPY format extendable. See also these past
discussions:
* New Copy Formats - avro/orc/parquet:
  https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
* Make COPY extendable in order to support Parquet and other formats:
  https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com
This doesn't change the current behavior. This just introduces
CopyToFormatOps, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.
Note that CopyToFormatOps can't be used from extensions yet because
CopySend*() aren't exported yet. Extensions can't send formatted data
to a destination without CopySend*(). They will be exported by
subsequent patches.
Here is a benchmark result with/without this change because there was
a discussion that we should care about performance regression:
https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us
> I think that step 1 ought to be to convert the existing formats into
> plug-ins, and demonstrate that there's no significant loss of
> performance.
You can see that there is no significant loss of performance:
Data: Random 32 bit integers:
    CREATE TABLE data (int32 integer);
    INSERT INTO data
      SELECT random() * 10000
        FROM generate_series(1, ${n_records});
The number of records: 100K, 1M and 10M
100K without this change:
    format,elapsed time (ms)
    text,22.527
    csv,23.822
    binary,24.806
100K with this change:
    format,elapsed time (ms)
    text,22.919
    csv,24.643
    binary,24.705
1M without this change:
    format,elapsed time (ms)
    text,223.457
    csv,233.583
    binary,242.687
1M with this change:
    format,elapsed time (ms)
    text,224.591
    csv,233.964
    binary,247.164
10M without this change:
    format,elapsed time (ms)
    text,2330.383
    csv,2411.394
    binary,2590.817
10M with this change:
    format,elapsed time (ms)
    text,2231.307
    csv,2408.067
    binary,2473.617
---
 src/backend/commands/copy.c   |   8 +
 src/backend/commands/copyto.c | 377 ++++++++++++++++++++--------------
 src/include/commands/copy.h   |  27 ++-
 3 files changed, 256 insertions(+), 156 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b562..27a1add456 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -427,6 +427,8 @@ ProcessCopyOptions(ParseState *pstate,
 
     opts_out->file_encoding = -1;
 
+    /* Text is the default format. */
+    opts_out->to_ops = CopyToFormatOpsText;
     /* Extract options from the statement node tree */
     foreach(option, options)
     {
@@ -442,9 +444,15 @@ ProcessCopyOptions(ParseState *pstate,
             if (strcmp(fmt, "text") == 0)
                  /* default format */ ;
             else if (strcmp(fmt, "csv") == 0)
+            {
                 opts_out->csv_mode = true;
+                opts_out->to_ops = CopyToFormatOpsCSV;
+            }
             else if (strcmp(fmt, "binary") == 0)
+            {
                 opts_out->binary = true;
+                opts_out->to_ops = CopyToFormatOpsBinary;
+            }
             else
                 ereport(ERROR,
                         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047c4a..8f51090a03 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -131,6 +131,228 @@ static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
+/*
+ * CopyToFormatOps implementations.
+ */
+
+/*
+ * CopyToFormatOps implementation for "text" and "csv". CopyToFormatText*()
+ * refer cstate->opts.csv_mode and change their behavior. We can split this
+ * implementation and stop referring cstate->opts.csv_mode later.
+ */
+
+static void
+CopyToFormatTextSendEndOfRow(CopyToState cstate)
+{
+    switch (cstate->copy_dest)
+    {
+    case COPY_FILE:
+        /* Default line termination depends on platform */
+#ifndef WIN32
+        CopySendChar(cstate, '\n');
+#else
+        CopySendString(cstate, "\r\n");
+#endif
+        break;
+    case COPY_FRONTEND:
+        /* The FE/BE protocol uses \n as newline for all platforms */
+        CopySendChar(cstate, '\n');
+        break;
+    default:
+        break;
+    }
+    CopySendEndOfRow(cstate);
+}
+
+static void
+CopyToFormatTextStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    int            num_phys_attrs;
+    ListCell   *cur;
+
+    num_phys_attrs = tupDesc->natts;
+    /* Get info about the columns we need to process. */
+    cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Oid            out_func_oid;
+        bool        isvarlena;
+        Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+        getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+    }
+
+    /*
+     * For non-binary copy, we need to convert null_print to file
+     * encoding, because it will be sent directly with CopySendString.
+     */
+    if (cstate->need_transcoding)
+        cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+                                                          cstate->opts.null_print_len,
+                                                          cstate->file_encoding);
+
+    /* if a header has been requested send the line */
+    if (cstate->opts.header_line)
+    {
+        bool        hdr_delim = false;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            char       *colname;
+
+            if (hdr_delim)
+                CopySendChar(cstate, cstate->opts.delim[0]);
+            hdr_delim = true;
+
+            colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, colname, false,
+                                    list_length(cstate->attnumlist) == 1);
+            else
+                CopyAttributeOutText(cstate, colname);
+        }
+
+        CopyToFormatTextSendEndOfRow(cstate);
+    }
+}
+
+static void
+CopyToFormatTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    bool        need_delim = false;
+    FmgrInfo   *out_functions = cstate->out_functions;
+    ListCell   *cur;
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (need_delim)
+            CopySendChar(cstate, cstate->opts.delim[0]);
+        need_delim = true;
+
+        if (isnull)
+            CopySendString(cstate, cstate->opts.null_print_client);
+        else
+        {
+            char       *string;
+
+            string = OutputFunctionCall(&out_functions[attnum - 1], value);
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, string,
+                                    cstate->opts.force_quote_flags[attnum - 1],
+                                    list_length(cstate->attnumlist) == 1);
+            else
+                CopyAttributeOutText(cstate, string);
+        }
+    }
+
+    CopyToFormatTextSendEndOfRow(cstate);
+}
+
+static void
+CopyToFormatTextEnd(CopyToState cstate)
+{
+}
+
+/*
+ * CopyToFormatOps implementation for "binary".
+ */
+
+static void
+CopyToFormatBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    int            num_phys_attrs;
+    ListCell   *cur;
+
+    num_phys_attrs = tupDesc->natts;
+    /* Get info about the columns we need to process. */
+    cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Oid            out_func_oid;
+        bool        isvarlena;
+        Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+        getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+    }
+
+    /* Generate header for a binary copy */
+    /* Signature */
+    CopySendData(cstate, BinarySignature, 11);
+    /* Flags field */
+    CopySendInt32(cstate, 0);
+    /* No header extension */
+    CopySendInt32(cstate, 0);
+}
+
+static void
+CopyToFormatBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    FmgrInfo   *out_functions = cstate->out_functions;
+    ListCell   *cur;
+
+    /* Binary per-tuple header */
+    CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (isnull)
+            CopySendInt32(cstate, -1);
+        else
+        {
+            bytea       *outputbytes;
+
+            outputbytes = SendFunctionCall(&out_functions[attnum - 1], value);
+            CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+            CopySendData(cstate, VARDATA(outputbytes),
+                         VARSIZE(outputbytes) - VARHDRSZ);
+        }
+    }
+
+    CopySendEndOfRow(cstate);
+}
+
+static void
+CopyToFormatBinaryEnd(CopyToState cstate)
+{
+    /* Generate trailer for a binary copy */
+    CopySendInt16(cstate, -1);
+    /* Need to flush out the trailer */
+    CopySendEndOfRow(cstate);
+}
+
+const CopyToFormatOps CopyToFormatOpsText = {
+    .start = CopyToFormatTextStart,
+    .one_row = CopyToFormatTextOneRow,
+    .end = CopyToFormatTextEnd,
+};
+
+/*
+ * We can use the same CopyToFormatOps for both of "text" and "csv" because
+ * CopyToFormatText*() refer cstate->opts.csv_mode and change their
+ * behavior. We can split the implementations and stop referring
+ * cstate->opts.csv_mode later.
+ */
+const CopyToFormatOps CopyToFormatOpsCSV = CopyToFormatOpsText;
+
+const CopyToFormatOps CopyToFormatOpsBinary = {
+    .start = CopyToFormatBinaryStart,
+    .one_row = CopyToFormatBinaryOneRow,
+    .end = CopyToFormatBinaryEnd,
+};
 
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
@@ -198,16 +420,6 @@ CopySendEndOfRow(CopyToState cstate)
     switch (cstate->copy_dest)
     {
         case COPY_FILE:
-            if (!cstate->opts.binary)
-            {
-                /* Default line termination depends on platform */
-#ifndef WIN32
-                CopySendChar(cstate, '\n');
-#else
-                CopySendString(cstate, "\r\n");
-#endif
-            }
-
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -242,10 +454,6 @@ CopySendEndOfRow(CopyToState cstate)
             }
             break;
         case COPY_FRONTEND:
-            /* The FE/BE protocol uses \n as newline for all platforms */
-            if (!cstate->opts.binary)
-                CopySendChar(cstate, '\n');
-
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
@@ -748,8 +956,6 @@ DoCopyTo(CopyToState cstate)
     bool        pipe = (cstate->filename == NULL && cstate->data_dest_cb == NULL);
     bool        fe_copy = (pipe && whereToSendOutput == DestRemote);
     TupleDesc    tupDesc;
-    int            num_phys_attrs;
-    ListCell   *cur;
     uint64        processed;
 
     if (fe_copy)
@@ -759,32 +965,11 @@ DoCopyTo(CopyToState cstate)
         tupDesc = RelationGetDescr(cstate->rel);
     else
         tupDesc = cstate->queryDesc->tupDesc;
-    num_phys_attrs = tupDesc->natts;
     cstate->opts.null_print_client = cstate->opts.null_print;    /* default */
 
     /* We use fe_msgbuf as a per-row buffer regardless of copy_dest */
     cstate->fe_msgbuf = makeStringInfo();
 
-    /* Get info about the columns we need to process. */
-    cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
-    foreach(cur, cstate->attnumlist)
-    {
-        int            attnum = lfirst_int(cur);
-        Oid            out_func_oid;
-        bool        isvarlena;
-        Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
-
-        if (cstate->opts.binary)
-            getTypeBinaryOutputInfo(attr->atttypid,
-                                    &out_func_oid,
-                                    &isvarlena);
-        else
-            getTypeOutputInfo(attr->atttypid,
-                              &out_func_oid,
-                              &isvarlena);
-        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
-    }
-
     /*
      * Create a temporary memory context that we can reset once per row to
      * recover palloc'd memory.  This avoids any problems with leaks inside
@@ -795,57 +980,7 @@ DoCopyTo(CopyToState cstate)
                                                "COPY TO",
                                                ALLOCSET_DEFAULT_SIZES);
 
-    if (cstate->opts.binary)
-    {
-        /* Generate header for a binary copy */
-        int32        tmp;
-
-        /* Signature */
-        CopySendData(cstate, BinarySignature, 11);
-        /* Flags field */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-        /* No header extension */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-    }
-    else
-    {
-        /*
-         * For non-binary copy, we need to convert null_print to file
-         * encoding, because it will be sent directly with CopySendString.
-         */
-        if (cstate->need_transcoding)
-            cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
-                                                              cstate->opts.null_print_len,
-                                                              cstate->file_encoding);
-
-        /* if a header has been requested send the line */
-        if (cstate->opts.header_line)
-        {
-            bool        hdr_delim = false;
-
-            foreach(cur, cstate->attnumlist)
-            {
-                int            attnum = lfirst_int(cur);
-                char       *colname;
-
-                if (hdr_delim)
-                    CopySendChar(cstate, cstate->opts.delim[0]);
-                hdr_delim = true;
-
-                colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, colname, false,
-                                        list_length(cstate->attnumlist) == 1);
-                else
-                    CopyAttributeOutText(cstate, colname);
-            }
-
-            CopySendEndOfRow(cstate);
-        }
-    }
+    cstate->opts.to_ops.start(cstate, tupDesc);
 
     if (cstate->rel)
     {
@@ -884,13 +1019,7 @@ DoCopyTo(CopyToState cstate)
         processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
     }
 
-    if (cstate->opts.binary)
-    {
-        /* Generate trailer for a binary copy */
-        CopySendInt16(cstate, -1);
-        /* Need to flush out the trailer */
-        CopySendEndOfRow(cstate);
-    }
+    cstate->opts.to_ops.end(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -906,71 +1035,15 @@ DoCopyTo(CopyToState cstate)
 static void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-    bool        need_delim = false;
-    FmgrInfo   *out_functions = cstate->out_functions;
     MemoryContext oldcontext;
-    ListCell   *cur;
-    char       *string;
 
     MemoryContextReset(cstate->rowcontext);
     oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-    if (cstate->opts.binary)
-    {
-        /* Binary per-tuple header */
-        CopySendInt16(cstate, list_length(cstate->attnumlist));
-    }
-
     /* Make sure the tuple is fully deconstructed */
     slot_getallattrs(slot);
 
-    foreach(cur, cstate->attnumlist)
-    {
-        int            attnum = lfirst_int(cur);
-        Datum        value = slot->tts_values[attnum - 1];
-        bool        isnull = slot->tts_isnull[attnum - 1];
-
-        if (!cstate->opts.binary)
-        {
-            if (need_delim)
-                CopySendChar(cstate, cstate->opts.delim[0]);
-            need_delim = true;
-        }
-
-        if (isnull)
-        {
-            if (!cstate->opts.binary)
-                CopySendString(cstate, cstate->opts.null_print_client);
-            else
-                CopySendInt32(cstate, -1);
-        }
-        else
-        {
-            if (!cstate->opts.binary)
-            {
-                string = OutputFunctionCall(&out_functions[attnum - 1],
-                                            value);
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, string,
-                                        cstate->opts.force_quote_flags[attnum - 1],
-                                        list_length(cstate->attnumlist) == 1);
-                else
-                    CopyAttributeOutText(cstate, string);
-            }
-            else
-            {
-                bytea       *outputbytes;
-
-                outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-                                               value);
-                CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-                CopySendData(cstate, VARDATA(outputbytes),
-                             VARSIZE(outputbytes) - VARHDRSZ);
-            }
-        }
-    }
-
-    CopySendEndOfRow(cstate);
+    cstate->opts.to_ops.one_row(cstate, slot);
 
     MemoryContextSwitchTo(oldcontext);
 }
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b90b..6b5231b2f3 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -30,6 +30,28 @@ typedef enum CopyHeaderChoice
     COPY_HEADER_MATCH,
 } CopyHeaderChoice;
 
+/* These are private in commands/copy[from|to].c */
+typedef struct CopyFromStateData *CopyFromState;
+typedef struct CopyToStateData *CopyToState;
+
+/* Routines for a COPY TO format implementation. */
+typedef struct CopyToFormatOps
+{
+    /* Called when COPY TO is started. This will send a header. */
+    void        (*start) (CopyToState cstate, TupleDesc tupDesc);
+
+    /* Copy one row for COPY TO. */
+    void        (*one_row) (CopyToState cstate, TupleTableSlot *slot);
+
+    /* Called when COPY TO is ended. This will send a trailer. */
+    void        (*end) (CopyToState cstate);
+} CopyToFormatOps;
+
+/* Predefined CopyToFormatOps for "text", "csv" and "binary". */
+extern PGDLLIMPORT const CopyToFormatOps CopyToFormatOpsText;
+extern PGDLLIMPORT const CopyToFormatOps CopyToFormatOpsCSV;
+extern PGDLLIMPORT const CopyToFormatOps CopyToFormatOpsBinary;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -63,12 +85,9 @@ typedef struct CopyFormatOptions
     bool       *force_null_flags;    /* per-column CSV FN flags */
     bool        convert_selectively;    /* do selective binary conversion? */
     List       *convert_select; /* list of column names (can be NIL) */
+    CopyToFormatOps to_ops;        /* how to format to */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
-typedef struct CopyFromStateData *CopyFromState;
-typedef struct CopyToStateData *CopyToState;
-
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
 typedef void (*copy_data_dest_cb) (void *data, int len);
 
-- 
2.40.1
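
To make the shape of the refactoring in the patch above easier to see, here is a minimal standalone sketch of the same dispatch pattern. It is not PostgreSQL code: `ToyState`, `ToyFormatOps`, `toy_copy_to()` and the toy CSV callbacks are hypothetical stand-ins for `CopyToState`, `CopyToFormatOps` and `DoCopyTo()`, and the "CSV" output does no quoting or escaping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CopyToState: just a fixed output buffer. */
typedef struct ToyState
{
    char        buf[256];
    size_t      len;
} ToyState;

/* Mirrors the three callbacks of CopyToFormatOps: start, one_row, end. */
typedef struct ToyFormatOps
{
    void        (*start) (ToyState *st);
    void        (*one_row) (ToyState *st, const char *const *fields, int nfields);
    void        (*end) (ToyState *st);
} ToyFormatOps;

static void
toy_append(ToyState *st, const char *s)
{
    size_t      n = strlen(s);

    assert(st->len + n < sizeof(st->buf));  /* sketch only: no buffer growth */
    memcpy(st->buf + st->len, s, n + 1);
    st->len += n;
}

/* This toy format sends no header. */
static void
toy_csv_start(ToyState *st)
{
    (void) st;
}

/* Emit one comma-separated line (no quoting, unlike real CSV). */
static void
toy_csv_one_row(ToyState *st, const char *const *fields, int nfields)
{
    for (int i = 0; i < nfields; i++)
    {
        if (i > 0)
            toy_append(st, ",");
        toy_append(st, fields[i]);
    }
    toy_append(st, "\n");
}

/* This toy format sends no trailer. */
static void
toy_csv_end(ToyState *st)
{
    (void) st;
}

static const ToyFormatOps ToyFormatOpsCSV = {
    .start = toy_csv_start,
    .one_row = toy_csv_one_row,
    .end = toy_csv_end,
};

/* The driver no longer branches on the format; it only calls the ops. */
static void
toy_copy_to(ToyState *st, const ToyFormatOps *ops,
            const char *const rows[][2], int nrows)
{
    st->len = 0;
    st->buf[0] = '\0';
    ops->start(st);
    for (int i = 0; i < nrows; i++)
        ops->one_row(st, rows[i], 2);
    ops->end(st);
}
```

Calling `toy_copy_to()` with `&ToyFormatOpsCSV` fills the buffer with one line per row. Supporting another format in this sketch means supplying another `ToyFormatOps` value, without touching the driver.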
			
Sutou Kouhei wrote:
> * 2022-04: Apache Arrow [2]
> * 2018-02: Apache Avro, Apache Parquet and Apache ORC [3]
>
> (FYI: I want to add support for Apache Arrow.)
>
> There were discussions how to add support for more formats. [3][4]
> In these discussions, we got a consensus about making COPY
> format extendable.
These formats seem all column-oriented whereas COPY is row-oriented
at the protocol level [1].
With regard to the protocol, how would it work to support these formats?
[1] https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-COPY
Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
On Wed, Dec 6, 2023 at 3:28 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAEG8a3K9dE2gt3+K+h=DwTqMenR84aeYuYS+cty3SR3LAeDBAQ@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 6 Dec 2023 15:11:34 +0800,
>   Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> > For the extra curly braces, I mean the following code block in
> > CopyToFormatBinaryStart:
> >
> > + {        <-- I thought this is useless?
> > + /* Generate header for a binary copy */
> > + int32 tmp;
> > +
> > + /* Signature */
> > + CopySendData(cstate, BinarySignature, 11);
> > + /* Flags field */
> > + tmp = 0;
> > + CopySendInt32(cstate, tmp);
> > + /* No header extension */
> > + tmp = 0;
> > + CopySendInt32(cstate, tmp);
> > + }
>
> Oh, I see. I've removed and attach the v3 patch. In general,
> I don't change variable name and so on in this patch. I just
> move codes in this patch. But I also removed the "tmp"
> variable for this case because I think that the name isn't
> suitable for larger scope. (I think that "tmp" is acceptable
> in a small scope like the above code.)
>
> New code:
>
> /* Generate header for a binary copy */
> /* Signature */
> CopySendData(cstate, BinarySignature, 11);
> /* Flags field */
> CopySendInt32(cstate, 0);
> /* No header extension */
> CopySendInt32(cstate, 0);
>
>
> Thanks,
> --
> kou
Hi Kou,
I read the thread[1] you posted and I think Andres's suggestion sounds great.
Should we extract both *copy to* and *copy from* in the first step? In that
case we can add the pg_copy_handler catalog smoothly later.
Attached V4 adds 'extract copy from' and it passed the cirrus ci,
please take a look.
I added a hook *copy_from_end* but this might be removed later if not used.
[1]: https://www.postgresql.org/message-id/20180211211235.5x3jywe5z3lkgcsr%40alap3.anarazel.de
--
Regards
Junwang Zhao
			
On Wed, Dec 6, 2023 at 8:32 PM Daniel Verite <daniel@manitou-mail.org> wrote:
>
> Sutou Kouhei wrote:
>
> > * 2022-04: Apache Arrow [2]
> > * 2018-02: Apache Avro, Apache Parquet and Apache ORC [3]
> >
> > (FYI: I want to add support for Apache Arrow.)
> >
> > There were discussions how to add support for more formats. [3][4]
> > In these discussions, we got a consensus about making COPY
> > format extendable.
>
>
> These formats seem all column-oriented whereas COPY is row-oriented
> at the protocol level [1].
> With regard to the protocol, how would it work to support these formats?
>
They have a kind of *RowGroup* concept: a bunch of rows goes into a
RowBatch, and the data of the same column is stored together.
I think they should fit the COPY semantics, and there are some FDWs out
there for these modern formats, like [1]. If we support COPY to deal with
the format, it will be easier to interact with them (without creating
server/usermapping/foreign table).
[1]: https://github.com/adjust/parquet_fdw
>
> [1] https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-COPY
>
>
> Best regards,
> --
> Daniel Vérité
> https://postgresql.verite.pro/
> Twitter: @DanielVerite
>
>
--
Regards
Junwang Zhao
On Wed, Dec 06, 2023 at 10:07:51PM +0800, Junwang Zhao wrote:
> I read the thread[1] you posted and I think Andres's suggestion sounds great.
>
> Should we extract both *copy to* and *copy from* for the first step, in that
> case we can add the pg_copy_handler catalog smoothly later.
>
> Attached V4 adds 'extract copy from' and it passed the cirrus ci,
> please take a look.
>
> I added a hook *copy_from_end* but this might be removed later if not used.
>
> [1]: https://www.postgresql.org/message-id/20180211211235.5x3jywe5z3lkgcsr%40alap3.anarazel.de
I was looking at the differences between v3 posted by Sutou-san and
v4 from you, seeing that:
+/* Routines for a COPY HANDLER implementation. */
+typedef struct CopyHandlerOps
 {
     /* Called when COPY TO is started. This will send a header. */
-    void        (*start) (CopyToState cstate, TupleDesc tupDesc);
+    void        (*copy_to_start) (CopyToState cstate, TupleDesc tupDesc);

     /* Copy one row for COPY TO. */
-    void        (*one_row) (CopyToState cstate, TupleTableSlot *slot);
+    void        (*copy_to_one_row) (CopyToState cstate, TupleTableSlot *slot);

     /* Called when COPY TO is ended. This will send a trailer. */
-    void        (*end) (CopyToState cstate);
-} CopyToFormatOps;
+    void        (*copy_to_end) (CopyToState cstate);
+
+    void        (*copy_from_start) (CopyFromState cstate, TupleDesc tupDesc);
+    bool        (*copy_from_next) (CopyFromState cstate, ExprContext *econtext,
+                                    Datum *values, bool *nulls);
+    void        (*copy_from_error_callback) (CopyFromState cstate);
+    void        (*copy_from_end) (CopyFromState cstate);
+} CopyHandlerOps;
And we've spent a good deal of time refactoring the copy code so as
the logic behind TO and FROM is split.  Having a set of routines that
groups both does not look like a step in the right direction to me,
and v4 is an attempt at solving two problems, while v3 aims to improve
one case.  It seems to me that each callback portion should be focused
on staying in its own area of the code, aka copyfrom*.c or copyto*.c.
--
Michael
Hi,
In <CAEG8a3LSRhK601Bn50u71BgfNWm4q3kv-o-KEq=hrbyLbY_EsA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 6 Dec 2023 22:07:51 +0800,
  Junwang Zhao <zhjwpku@gmail.com> wrote:
> Should we extract both *copy to* and *copy from* for the first step, in that
> case we can add the pg_copy_handler catalog smoothly later.
I don't object to it (mixing TO/FROM changes in one patch), but
it may make review difficult. Is it acceptable?
FYI: I planned to implement the TO part, then the FROM part,
and then unify the TO/FROM parts if needed. [1]
> Attached V4 adds 'extract copy from' and it passed the cirrus ci,
> please take a look.
Thanks. Here are my comments:
> +        /*
> +            * Error is relevant to a particular line.
> +            *
> +            * If line_buf still contains the correct line, print it.
> +            */
> +        if (cstate->line_buf_valid)
We need to fix the indentation.
> +CopyFromFormatBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
> +{
> +    FmgrInfo   *in_functions;
> +    Oid           *typioparams;
> +    Oid            in_func_oid;
> +    AttrNumber    num_phys_attrs;
> +
> +    /*
> +     * Pick up the required catalog information for each attribute in the
> +     * relation, including the input function, the element type (to pass to
> +     * the input function), and info about defaults and constraints. (Which
> +     * input function we use depends on text/binary format choice.)
> +     */
> +    num_phys_attrs = tupDesc->natts;
> +    in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
> +    typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
We need to update the comment because defaults and
constraints aren't picked up here.
> +CopyFromFormatTextStart(CopyFromState cstate, TupleDesc tupDesc)
...
> +    /*
> +     * Pick up the required catalog information for each attribute in the
> +     * relation, including the input function, the element type (to pass to
> +     * the input function), and info about defaults and constraints. (Which
> +     * input function we use depends on text/binary format choice.)
> +     */
> +    in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
> +    typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
ditto.
> @@ -1716,15 +1776,6 @@ BeginCopyFrom(ParseState *pstate,
>          ReceiveCopyBinaryHeader(cstate);
>      }
I think that this block should be moved to
CopyFromFormatBinaryStart() too. But we need to run it after
we setup inputs such as data_source_cb, pipe and filename...
+/* Routines for a COPY HANDLER implementation. */
+typedef struct CopyHandlerOps
+{
+    /* Called when COPY TO is started. This will send a header. */
+    void        (*copy_to_start) (CopyToState cstate, TupleDesc tupDesc);
+
+    /* Copy one row for COPY TO. */
+    void        (*copy_to_one_row) (CopyToState cstate, TupleTableSlot *slot);
+
+    /* Called when COPY TO is ended. This will send a trailer. */
+    void        (*copy_to_end) (CopyToState cstate);
+
+    void        (*copy_from_start) (CopyFromState cstate, TupleDesc tupDesc);
+    bool        (*copy_from_next) (CopyFromState cstate, ExprContext *econtext,
+                                    Datum *values, bool *nulls);
+    void        (*copy_from_error_callback) (CopyFromState cstate);
+    void        (*copy_from_end) (CopyFromState cstate);
+} CopyHandlerOps;
It seems that "copy_" prefix is redundant. Should we use
"to_start" instead of "copy_to_start" and so on?
BTW, it seems that "COPY FROM (FORMAT json)" may not be implemented. [2]
We may need to care about NULL copy_from_* cases.
> I added a hook *copy_from_end* but this might be removed later if not used.
It may be useful to clean up resources for COPY FROM but the
patch doesn't call the copy_from_end. How about removing it
for now? We can add it and call it from EndCopyFrom() later?
Because it's not needed for now.
I think that we should focus on refactoring instead of
adding a new feature in this patch.
[1]: https://www.postgresql.org/message-id/20231204.153548.2126325458835528809.kou%40clear-code.com
[2]: https://www.postgresql.org/message-id/flat/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
Thanks,
-- 
kou
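
As an illustration of the "NULL copy_from_* cases" point above, here is a small standalone sketch of how a combined callback set could be checked before use. It is only an assumption about one possible approach, not code from any patch in this thread: `ToyCopyHandlerOps`, `ToyJsonOps` and the `toy_supports_*()` helpers are hypothetical, and the callback signatures are simplified to take no arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified version of a combined TO/FROM callback set. */
typedef struct ToyCopyHandlerOps
{
    void        (*to_start) (void);
    void        (*to_one_row) (void);
    void        (*to_end) (void);
    void        (*from_start) (void);
    bool        (*from_next) (void);
} ToyCopyHandlerOps;

static void toy_json_to_start(void) {}
static void toy_json_to_one_row(void) {}
static void toy_json_to_end(void) {}

/* A TO-only format: the from_* members are left NULL. */
static const ToyCopyHandlerOps ToyJsonOps = {
    .to_start = toy_json_to_start,
    .to_one_row = toy_json_to_one_row,
    .to_end = toy_json_to_end,
};

/* True if every callback needed for COPY TO is provided. */
static bool
toy_supports_copy_to(const ToyCopyHandlerOps *ops)
{
    assert(ops != NULL);
    return ops->to_start != NULL &&
        ops->to_one_row != NULL &&
        ops->to_end != NULL;
}

/* True if every callback needed for COPY FROM is provided. */
static bool
toy_supports_copy_from(const ToyCopyHandlerOps *ops)
{
    assert(ops != NULL);
    return ops->from_start != NULL && ops->from_next != NULL;
}
```

With a check like this, a caller could report an unsupported direction up front instead of dereferencing a NULL function pointer in the middle of a COPY.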
			
On Thu, Dec 7, 2023 at 8:39 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Dec 06, 2023 at 10:07:51PM +0800, Junwang Zhao wrote:
> > I read the thread[1] you posted and I think Andres's suggestion sounds great.
> >
> > Should we extract both *copy to* and *copy from* for the first step, in that
> > case we can add the pg_copy_handler catalog smoothly later.
> >
> > Attached V4 adds 'extract copy from' and it passed the cirrus ci,
> > please take a look.
> >
> > I added a hook *copy_from_end* but this might be removed later if not used.
> >
> > [1]: https://www.postgresql.org/message-id/20180211211235.5x3jywe5z3lkgcsr%40alap3.anarazel.de
>
> I was looking at the differences between v3 posted by Sutou-san and
> v4 from you, seeing that:
>
> +/* Routines for a COPY HANDLER implementation. */
> +typedef struct CopyHandlerOps
>  {
>      /* Called when COPY TO is started. This will send a header. */
> -    void        (*start) (CopyToState cstate, TupleDesc tupDesc);
> +    void        (*copy_to_start) (CopyToState cstate, TupleDesc tupDesc);
>
>      /* Copy one row for COPY TO. */
> -    void        (*one_row) (CopyToState cstate, TupleTableSlot *slot);
> +    void        (*copy_to_one_row) (CopyToState cstate, TupleTableSlot *slot);
>
>      /* Called when COPY TO is ended. This will send a trailer. */
> -    void        (*end) (CopyToState cstate);
> -} CopyToFormatOps;
> +    void        (*copy_to_end) (CopyToState cstate);
> +
> +    void        (*copy_from_start) (CopyFromState cstate, TupleDesc tupDesc);
> +    bool        (*copy_from_next) (CopyFromState cstate, ExprContext *econtext,
> +                                    Datum *values, bool *nulls);
> +    void        (*copy_from_error_callback) (CopyFromState cstate);
> +    void        (*copy_from_end) (CopyFromState cstate);
> +} CopyHandlerOps;
>
> And we've spent a good deal of time refactoring the copy code so as
> the logic behind TO and FROM is split.  Having a set of routines that
> groups both does not look like a step in the right direction to me,
The point of this refactor (from my view) is to make it possible to add new
copy handlers in extensions, just like access method. As Andres suggested,
a system catalog like *pg_copy_handler*, if we split TO and FROM into two
sets of routines, does that mean we have to create two catalog(
pg_copy_from_handler and pg_copy_to_handler)?
> and v4 is an attempt at solving two problems, while v3 aims to improve
> one case.  It seems to me that each callback portion should be focused
> on staying in its own area of the code, aka copyfrom*.c or copyto*.c.
> --
> Michael
--
Regards
Junwang Zhao
On Thu, Dec 7, 2023 at 1:05 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAEG8a3LSRhK601Bn50u71BgfNWm4q3kv-o-KEq=hrbyLbY_EsA@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 6 Dec 2023 22:07:51 +0800,
>   Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> > Should we extract both *copy to* and *copy from* for the first step, in that
> > case we can add the pg_copy_handler catalog smoothly later.
>
> I don't object it (mixing TO/FROM changes to one patch) but
> it may make review difficult. Is it acceptable?
>
> FYI: I planed that I implement TO part, and then FROM part,
> and then unify TO/FROM parts if needed. [1]
I'm fine with step by step refactoring, let's just wait for more
suggestions.
>
> > Attached V4 adds 'extract copy from' and it passed the cirrus ci,
> > please take a look.
>
> Thanks. Here are my comments:
>
> > +             /*
> > +                     * Error is relevant to a particular line.
> > +                     *
> > +                     * If line_buf still contains the correct line, print it.
> > +                     */
> > +             if (cstate->line_buf_valid)
>
> We need to fix the indentation.
>
> > +CopyFromFormatBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
> > +{
> > +     FmgrInfo   *in_functions;
> > +     Oid                *typioparams;
> > +     Oid                     in_func_oid;
> > +     AttrNumber      num_phys_attrs;
> > +
> > +     /*
> > +      * Pick up the required catalog information for each attribute in the
> > +      * relation, including the input function, the element type (to pass to
> > +      * the input function), and info about defaults and constraints. (Which
> > +      * input function we use depends on text/binary format choice.)
> > +      */
> > +     num_phys_attrs = tupDesc->natts;
> > +     in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
> > +     typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
>
> We need to update the comment because defaults and
> constraints aren't picked up here.
>
> > +CopyFromFormatTextStart(CopyFromState cstate, TupleDesc tupDesc)
> ...
> > +     /*
> > +      * Pick up the required catalog information for each attribute in the
> > +      * relation, including the input function, the element type (to pass to
> > +      * the input function), and info about defaults and constraints. (Which
> > +      * input function we use depends on text/binary format choice.)
> > +      */
> > +     in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
> > +     typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
>
> ditto.
>
>
> > @@ -1716,15 +1776,6 @@ BeginCopyFrom(ParseState *pstate,
> >               ReceiveCopyBinaryHeader(cstate);
> >       }
>
> I think that this block should be moved to
> CopyFromFormatBinaryStart() too. But we need to run it after
> we setup inputs such as data_source_cb, pipe and filename...
>
> +/* Routines for a COPY HANDLER implementation. */
> +typedef struct CopyHandlerOps
> +{
> +       /* Called when COPY TO is started. This will send a header. */
> +       void            (*copy_to_start) (CopyToState cstate, TupleDesc tupDesc);
> +
> +       /* Copy one row for COPY TO. */
> +       void            (*copy_to_one_row) (CopyToState cstate, TupleTableSlot *slot);
> +
> +       /* Called when COPY TO is ended. This will send a trailer. */
> +       void            (*copy_to_end) (CopyToState cstate);
> +
> +       void            (*copy_from_start) (CopyFromState cstate, TupleDesc tupDesc);
> +       bool            (*copy_from_next) (CopyFromState cstate, ExprContext *econtext,
> +                                                                  Datum *values, bool *nulls);
> +       void            (*copy_from_error_callback) (CopyFromState cstate);
> +       void            (*copy_from_end) (CopyFromState cstate);
> +} CopyHandlerOps;
>
> It seems that "copy_" prefix is redundant. Should we use
> "to_start" instead of "copy_to_start" and so on?
>
> BTW, it seems that "COPY FROM (FORMAT json)" may not be implemented. [2]
> We may need to care about NULL copy_from_* cases.
>
>
> > I added a hook *copy_from_end* but this might be removed later if not used.
>
> It may be useful to clean up resources for COPY FROM but the
> patch doesn't call the copy_from_end. How about removing it
> for now? We can add it and call it from EndCopyFrom() later?
> Because it's not needed for now.
>
> I think that we should focus on refactoring instead of
> adding a new feature in this patch.
>
>
> [1]: https://www.postgresql.org/message-id/20231204.153548.2126325458835528809.kou%40clear-code.com
> [2]: https://www.postgresql.org/message-id/flat/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
>
>
> Thanks,
> --
> kou
--
Regards
Junwang Zhao
			
On 2023-12-07 Th 03:37, Junwang Zhao wrote:
>
> The point of this refactor (from my view) is to make it possible to add new
> copy handlers in extensions, just like access method. As Andres suggested,
> a system catalog like *pg_copy_handler*, if we split TO and FROM into two
> sets of routines, does that mean we have to create two catalog(
> pg_copy_from_handler and pg_copy_to_handler)?
Surely not. Either have two fields, one for the TO handler and one for
the FROM handler, or a flag on each row indicating if it's a FROM or TO
handler.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Fri, Dec 8, 2023 at 1:39 AM Andrew Dunstan <andrew@dunslane.net> wrote:
>
>
> On 2023-12-07 Th 03:37, Junwang Zhao wrote:
> >
> > The point of this refactor (from my view) is to make it possible to add new
> > copy handlers in extensions, just like access method. As Andres suggested,
> > a system catalog like *pg_copy_handler*, if we split TO and FROM into two
> > sets of routines, does that mean we have to create two catalog(
> > pg_copy_from_handler and pg_copy_to_handler)?
>
>
>
> Surely not. Either have two fields, one for the TO handler and one for
> the FROM handler, or a flag on each row indicating if it's a FROM or TO
> handler.
True.
But why do we need a system catalog like pg_copy_handler in the first
place? I imagined that an extension can define a handler function
returning a set of callbacks and the parser can lookup the handler
function by name, like FDW and TABLESAMPLE.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Fri, Dec 8, 2023 at 3:27 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Dec 8, 2023 at 1:39 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> >
> >
> > On 2023-12-07 Th 03:37, Junwang Zhao wrote:
> > >
> > > The point of this refactor (from my view) is to make it possible to add new
> > > copy handlers in extensions, just like access method. As Andres suggested,
> > > a system catalog like *pg_copy_handler*, if we split TO and FROM into two
> > > sets of routines, does that mean we have to create two catalog(
> > > pg_copy_from_handler and pg_copy_to_handler)?
> >
> >
> >
> > Surely not. Either have two fields, one for the TO handler and one for
> > the FROM handler, or a flag on each row indicating if it's a FROM or TO
> > handler.
If we wrap the two fields into a single structure, that will still be in
copy.h, which I think is not necessary. A single routing wrapper should
be enough, the actual implementation still stays in separate
copy_[to/from].c files.
>
> True.
>
> But why do we need a system catalog like pg_copy_handler in the first
> place? I imagined that an extension can define a handler function
> returning a set of callbacks and the parser can lookup the handler
> function by name, like FDW and TABLESAMPLE.
>
I can see FDW related utility commands but no TABLESAMPLE related,
and there is a pg_foreign_data_wrapper system catalog which has
a *fdwhandler* field.
If we want extensions to create a new copy handler, I think
something like pg_copy_handler should be necessary.
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
I go one step further to implement the pg_copy_handler, attached V5 is
the implementation with some changes suggested by Kou.
You can also review this on this github pull request [1].
[1]: https://github.com/zhjwpku/postgres/pull/1/files
--
Regards
Junwang Zhao
On Fri, Dec 08, 2023 at 10:32:27AM +0800, Junwang Zhao wrote:
> I can see FDW related utility commands but no TABLESAMPLE related,
> and there is a pg_foreign_data_wrapper system catalog which has
> a *fdwhandler* field.
+ */
+CATALOG(pg_copy_handler,4551,CopyHandlerRelationId)
Using a catalog is an over-engineered design.  Others have provided
hints about that upthread, but it would be enough to have one or two
handler types that are wrapped around one or two SQL *functions*, like
tablesamples.  It seems like you've missed it, but feel free to read
about tablesample-method.sgml, that explains how this is achieved for
tablesamples.
> If we want extensions to create a new copy handler, I think
> something like pg_copy_handler should be necessary.
A catalog is not necessary, that's the point, because it can be
replaced by a scan of pg_proc with the function name defined in a COPY
query (be it through a FORMAT, or different option in a DefElem).
An example of extension with tablesamples is contrib/tsm_system_rows/,
that just uses a function returning a tsm_handler:
CREATE FUNCTION system_rows(internal)
RETURNS tsm_handler
AS 'MODULE_PATHNAME', 'tsm_system_rows_handler'
LANGUAGE C STRICT;
Then SELECT queries rely on the contents of the TABLESAMPLE clause to
find the set of callbacks it should use by calling the function.
+/* Routines for a COPY HANDLER implementation. */
+typedef struct CopyRoutine
+{
FWIW, I find weird the concept of having one handler for both COPY
FROM and COPY TO as each one of them has callbacks that are mutually
exclusive to the other, but I'm OK if there is a consensus of only
one.  So I'd suggest to use *two* NodeTags instead for a cleaner
split, meaning that we'd need two functions for each method.  My point
is that a custom COPY handler could just define a COPY TO handler or a
COPY FROM handler, though it mostly comes down to a matter of taste
regarding how clean the error handling becomes if one tries to use a
set of callbacks with a COPY type (TO or FROM) not matching it.
--
Michael
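
The lookup described above (find a handler function by the name given in FORMAT, call it, and use the callbacks it returns) can be sketched in standalone C as follows. This is only a toy model under stated assumptions: the static table `toy_procs` stands in for a scan of pg_proc, and `ToyCopyToRoutine`, `toy_arrow_handler()` and `toy_lookup_copy_to_routine()` are hypothetical names, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback set returned by a handler function. */
typedef struct ToyCopyToRoutine
{
    const char *description;
} ToyCopyToRoutine;

/* A handler function takes no arguments here and returns its callbacks. */
typedef const ToyCopyToRoutine *(*ToyHandlerFn) (void);

static const ToyCopyToRoutine toy_arrow_routine = {"arrow output callbacks"};

static const ToyCopyToRoutine *
toy_arrow_handler(void)
{
    return &toy_arrow_routine;
}

/* Stand-in for pg_proc: function name -> C entry point. */
typedef struct ToyProcEntry
{
    const char *proname;
    ToyHandlerFn fn;
} ToyProcEntry;

static const ToyProcEntry toy_procs[] = {
    {"arrow", toy_arrow_handler},
};

/*
 * Resolve the FORMAT string to a callback set by looking up a function of
 * the same name and calling it.  Returns NULL if no such function exists.
 */
static const ToyCopyToRoutine *
toy_lookup_copy_to_routine(const char *format)
{
    assert(format != NULL);
    for (size_t i = 0; i < sizeof(toy_procs) / sizeof(toy_procs[0]); i++)
    {
        if (strcmp(toy_procs[i].proname, format) == 0)
            return toy_procs[i].fn();
    }
    return NULL;
}
```

In this model an unknown format simply yields NULL. A real implementation would presumably also check the function's return type before calling it, as the `tsm_handler` return type does for tablesample methods.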
			
On Fri, Dec 8, 2023 at 2:17 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Dec 08, 2023 at 10:32:27AM +0800, Junwang Zhao wrote:
> > I can see FDW related utility commands but no TABLESAMPLE related,
> > and there is a pg_foreign_data_wrapper system catalog which has
> > a *fdwhandler* field.
>
> + */
> +CATALOG(pg_copy_handler,4551,CopyHandlerRelationId)
>
> Using a catalog is an over-engineered design.  Others have provided
> hints about that upthread, but it would be enough to have one or two
> handler types that are wrapped around one or two SQL *functions*, like
> tablesamples.  It seems like you've missed it, but feel free to read
> about tablesample-method.sgml, that explains how this is achieved for
> tablesamples.
Agreed. My previous example of FDW was not a good one, I missed something.
>
> > If we want extensions to create a new copy handler, I think
> > something like pg_copy_handler should be necessary.
>
> A catalog is not necessary, that's the point, because it can be
> replaced by a scan of pg_proc with the function name defined in a COPY
> query (be it through a FORMAT, or different option in a DefElem).
> An example of extension with tablesamples is contrib/tsm_system_rows/,
> that just uses a function returning a tsm_handler:
> CREATE FUNCTION system_rows(internal)
> RETURNS tsm_handler
> AS 'MODULE_PATHNAME', 'tsm_system_rows_handler'
> LANGUAGE C STRICT;
>
> Then SELECT queries rely on the contents of the TABLESAMPLE clause to
> find the set of callbacks it should use by calling the function.
>
> +/* Routines for a COPY HANDLER implementation. */
> +typedef struct CopyRoutine
> +{
>
> FWIW, I find weird the concept of having one handler for both COPY
> FROM and COPY TO as each one of them has callbacks that are mutually
> exclusive to the other, but I'm OK if there is a consensus of only
> one.  So I'd suggest to use *two* NodeTags instead for a cleaner
> split, meaning that we'd need two functions for each method.  My point
> is that a custom COPY handler could just define a COPY TO handler or a
> COPY FROM handler, though it mostly comes down to a matter of taste
> regarding how clean the error handling becomes if one tries to use a
> set of callbacks with a COPY type (TO or FROM) not matching it.
I tend to agree with having two separate functions for each method. But
given that we implement it the tablesample way, I think we need to make
it clear how to call one of the two functions depending on COPY TO and
FROM.
IIUC in the tablesample case, we scan pg_proc to find the handler
function like system_rows(internal) by the method name specified in
the query. On the other hand, in the COPY case, the queries would
look like:
COPY tab TO stdout WITH (format = 'arrow');
and
COPY tab FROM stdin WITH (format = 'arrow');
So a custom COPY extension would not be able to define SQL functions
just like arrow(internal) for example. We might need to define a rule
like the function returning copy_in/out_handler must be defined as
<method name>_to(internal) and <method_name>_from(internal).
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
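
The naming rule suggested above can be sketched as a small helper that derives the handler function name from the FORMAT value and the COPY direction. This is a hypothetical illustration only: `toy_copy_handler_name()` is not an existing PostgreSQL function, and the `_to`/`_from` suffix convention is just the proposal from this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build "<format>_to" or "<format>_from" into out.  Returns false if the
 * result does not fit into outlen bytes (including the terminating NUL).
 */
static bool
toy_copy_handler_name(const char *format, bool is_from,
                      char *out, size_t outlen)
{
    int         n;

    assert(format != NULL && out != NULL);
    n = snprintf(out, outlen, "%s_%s", format, is_from ? "from" : "to");
    return n >= 0 && (size_t) n < outlen;
}
```

Under this rule, `COPY tab TO stdout WITH (format = 'arrow')` would look up a function named `arrow_to`, and the corresponding `COPY ... FROM` would look up `arrow_from`.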
			
On Fri, Dec 08, 2023 at 03:42:06PM +0900, Masahiko Sawada wrote:
> So a custom COPY extension would not be able to define SQL functions
> just like arrow(internal) for example.  We might need to define a rule
> like the function returning copy_in/out_handler must be defined as
> <method name>_to(internal) and <method_name>_from(internal).
Yeah, I was wondering if there was a trick to avoid the input internal
argument conflict, but cannot recall something elegant on the top of
my mind.  Anyway, I'd be OK with any approach as long as it plays
nicely with the query integration, and that's FORMAT's DefElem with
its string value to do the function lookups.
--
Michael
RE: Make COPY format extendable: Extract COPY TO format implementations
From: "Hayato Kuroda (Fujitsu)"
Dear Junwang, Sutou-san,
Basically I agree with your point: improving extensibility is good.
(I remember that this theme was talked about at Japan PostgreSQL conference)
Below are my comments for your patch.
01. General
Just to confirm: is it OK to implement the APIs only partially? E.g., only COPY TO
is available. The patch currently does not seem to consider the case where one
direction is not implemented.
02. General
It might be trivial, but could you please clarify how users can add a format? Would
the steps below be right?
1. Create a handler function, via CREATE FUNCTION,
2. Register a handler, via new SQL (CREATE COPY HANDLER),
3. Specify the added handler in the COPY ... FORMAT clause.
03. General
Could you please add documentation tasks to your TODO? I imagine something like
fdwhandler.sgml.
04. General - copyright
For newly added files, the below copyright seems sufficient. See applyparallelworker.c.
```
 * Copyright (c) 2023, PostgreSQL Global Development Group
```
05. src/include/catalog/* files
IIUC, 8000 or higher OIDs should be used while developing a patch. src/include/catalog/unused_oids
would suggest a candidate which you can use.
06. copy.c
I feel that we could create one file per copy format, like copy_{text|csv|binary}.c,
as is done for indexes.
What do others think?
07. fmt_to_name()
I'm not sure the function is really needed. Can we follow the approach of
get_foreign_data_wrapper_oid() and remove the function?
08. GetCopyRoutineByName()
Should we use the syscache for searching the catalog?
09. CopyToFormatTextSendEndOfRow(), CopyToFormatBinaryStart()
The comments still refer to CopyHandlerOps, although it was renamed.
10. copy.h
Per foreign.h and fdwapi.h, should we add a new header file and move some APIs?
11. copy.h
```
-/* These are private in commands/copy[from|to].c */
-typedef struct CopyFromStateData *CopyFromState;
-typedef struct CopyToStateData *CopyToState;
```
Are above changes really needed?
12. CopyFormatOptions
Can we remove `bool binary` in future?
13. external functions
```
+extern void CopyToFormatTextStart(CopyToState cstate, TupleDesc tupDesc);
+extern void CopyToFormatTextOneRow(CopyToState cstate, TupleTableSlot *slot);
+extern void CopyToFormatTextEnd(CopyToState cstate);
+extern void CopyFromFormatTextStart(CopyFromState cstate, TupleDesc tupDesc);
+extern bool CopyFromFormatTextNext(CopyFromState cstate, ExprContext *econtext,
+                                   Datum *values, bool *nulls);
+extern void CopyFromFormatTextErrorCallback(CopyFromState cstate);
+
+extern void CopyToFormatBinaryStart(CopyToState cstate, TupleDesc tupDesc);
+extern void CopyToFormatBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
+extern void CopyToFormatBinaryEnd(CopyToState cstate);
+extern void CopyFromFormatBinaryStart(CopyFromState cstate, TupleDesc tupDesc);
+extern bool CopyFromFormatBinaryNext(CopyFromState cstate, ExprContext *econtext,
+                                     Datum *values, bool *nulls);
+extern void CopyFromFormatBinaryErrorCallback(CopyFromState cstate);
```
FYI - If you add files for {text|csv|binary}, these declarations can be removed.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
			
		On Sat, Dec 9, 2023 at 10:43 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Junagn, Sutou-san,
>
> Basically I agree your point - improving a extendibility is good.
> (I remember that this theme was talked at Japan PostgreSQL conference)
> Below are my comments for your patch.
>
> 01. General
>
> Just to confirm - is it OK to partially implement APIs? E.g., only COPY TO is
> available. Currently it seems not to consider a case which is not implemented.
>
For partial implementations, we can leave the hook NULL, check for NULL
in ProcessCopyOptions(), and report an error if the requested direction
is not supported.
> 02. General
>
> It might be trivial, but could you please clarify how users can extend? Is it OK
> to do below steps?
>
> 1. Create a handler function, via CREATE FUNCTION,
> 2. Register a handler, via new SQL (CREATE COPY HANDLER),
> 3. Specify the added handler as COPY ... FORMAT clause.
>
My original thought was option 2, but as Michael pointed out, option 1 is
the right way to go.
> 03. General
>
> Could you please add document-related tasks to your TODO? I imagined like
> fdwhandler.sgml.
>
> 04. General - copyright
>
> For newly added files, the below copyright seems sufficient. See applyparallelworker.c.
>
> ```
>  * Copyright (c) 2023, PostgreSQL Global Development Group
> ```
>
> 05. src/include/catalog/* files
>
> IIUC, 8000 or higher OIDs should be used while developing a patch. src/include/catalog/unused_oids
> would suggest a candidate which you can use.
Yeah, I will run renumber_oids.pl at the end.
>
> 06. copy.c
>
> I felt that we can create files per copying methods, like copy_{text|csv|binary}.c,
> like indexes.
> How do other think?
I'm not sure about this; it seems others have put a lot of effort into
splitting COPY TO and COPY FROM.
I'd also like to hear from others.
>
> 07. fmt_to_name()
>
> I'm not sure the function is really needed. Can we follow like get_foreign_data_wrapper_oid()
> and remove the funciton?
I borrowed this from some Greenplum code; I will remove it.
Thanks for all the valuable suggestions.
--
Regards
Junwang Zhao
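
The answer to comment 01 in the message above (leave an unimplemented callback NULL and reject the unsupported direction while processing options) could look roughly like the following sketch. The struct, its fields and check_copy_direction() are hypothetical stand-ins, not the patch's actual API, and real backend code would presumably report the problem with ereport(ERROR, ...) rather than return a message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a per-format callback table. */
typedef struct CopyFormatRoutineSketch
{
    void        (*to_start) (void);     /* NULL if COPY TO is not implemented */
    void        (*from_start) (void);   /* NULL if COPY FROM is not implemented */
} CopyFormatRoutineSketch;

/* Placeholder callback used by the example format below. */
static void
noop_start(void)
{
}

/* An example format that implements only COPY TO. */
static const CopyFormatRoutineSketch to_only_format = {noop_start, NULL};

/*
 * Returns NULL if the requested direction is supported; otherwise returns
 * the message the caller would pass on to its error reporting.
 */
static const char *
check_copy_direction(const CopyFormatRoutineSketch *routine, bool is_from)
{
    if (is_from && routine->from_start == NULL)
        return "COPY FROM is not supported by this format";
    if (!is_from && routine->to_start == NULL)
        return "COPY TO is not supported by this format";
    return NULL;
}
```

Doing the check once at option-processing time means the per-row code paths never have to test for a missing callback.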
			
Hi Junwang,
Please also see my presentation slides from last year's PostgreSQL
Conference in Berlin (attached).
The main idea is to make not just "format", but also "transport" and
"stream processing" extendable via virtual function tables.
Btw, will any of you here be in Prague next week?
It would be a good opportunity to discuss this in person.
Best Regards
Hannu
Attachments
Hi,
Thanks for reviewing our latest patch!
In 
 <TY3PR01MB9889C9234CD220A3A7075F0DF589A@TY3PR01MB9889.jpnprd01.prod.outlook.com>
  "RE: Make COPY format extendable: Extract COPY TO format implementations" on Sat, 9 Dec 2023 02:43:49 +0000,
  "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote:
> (I remember that this theme was talked at Japan PostgreSQL conference)
Yes. I should have talked to you more at the conference...
I will do it next time!
Can we discuss how to proceed with this improvement?
There are 2 approaches for it:
1. Do the following concurrently:
   a. Implement small changes that have reached consensus and
      merge them step by step
      (e.g. We reached consensus that we need to extract the
      current format-related routines.)
   b. Discuss the design
   (The v1-v3 patches use this approach.)
2. Implement one (large) complete patch set together with the
   design discussion and merge it
   (The v4- patches use this approach.)
Which approach is preferred? (Or should we choose another
approach?)
I thought that 1. is preferred because it will reduce review
cost. So I chose 1.
If 2. is preferred, I'll use 2. (I'll add more changes to
the latest patch.)
Thanks,
-- 
kou
			
Hi,
In <CAD21AoDkoGL6yJ_HjNOg9cU=aAdW8uQ3rSQOeRS0SX85LPPNwQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 8 Dec 2023 15:42:06 +0900,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> So a custom COPY extension would not be able to define SQL functions
> just like arrow(internal) for example. We might need to define a rule
> like the function returning copy_in/out_handler must be defined as
> <method name>_to(internal) and <method_name>_from(internal).
We may not need the "_to"/"_from" suffix if we check both the
argument type and the return type, because copy_in/out_handler
use different return types.
But the current LookupFuncName() family doesn't check the return
type. If we use this approach, we need to improve the
current LookupFuncName() family too.
Thanks,
-- 
kou

Hi,
In <CAMT0RQRqVo4fGDWHqOn+wr_eoiXQVfyC=8-c=H=y6VcNxi6BvQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Sat, 9 Dec 2023 12:38:46 +0100,
  Hannu Krosing <hannuk@google.com> wrote:
> Please also see my presentation slides from last years PostgreSQL
> Conference in Berlin (attached)
Thanks for sharing your idea here.
> The main Idea is to make not just "format", but also "transport" and
> "stream processing" extendable via virtual function tables.
"Transport" and "stream processing" are out of scope in this
thread. How about starting new threads for them and
discussing them there?
> Btw, will any of you here be in Prague next week ?
Sorry. No.
Thanks,
-- 
kou

On Sun, Dec 10, 2023 at 4:44 AM Sutou Kouhei <kou@clear-code.com> wrote:
> Which approach is preferred? (Or should we choose another
> approach?)
>
> I thought that 1. is preferred because it will reduce review
> cost. So I chose 1.
>
> If 2. is preferred, I'll use 2. (I'll add more changes to
> the latest patch.)
I'm OK with both, and I'd like to work with you on the Parquet
extension. I'm excited about this new feature; thanks for bringing it up.
Forgive me for making so much noise about approach 2; I just wanted to
hear more suggestions about the final shape of this feature.
--
Regards
Junwang Zhao

On Sun, Dec 10, 2023 at 5:44 AM Sutou Kouhei <kou@clear-code.com> wrote:
> Can we discuss how to proceed this improvement?
>
> There are 2 approaches for it:
>
> 1. Do the followings concurrently:
>    a. Implementing small changes that got a consensus and
>       merge them step-by-step
>       (e.g. We got a consensus that we need to extract the
>       current format related routines.)
>    b. Discuss design
It's preferable to keep patches small for easy review. We can merge
them anytime before commit if necessary.
I think we need to discuss the overall design of the callbacks, how
extensions define a custom copy handler, and so on. It may require some
PoC patches. Once we have a consensus on the overall design, we can polish
the patches, including the documentation changes and regression tests.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

On Sun, Dec 10, 2023 at 5:55 AM Sutou Kouhei <kou@clear-code.com> wrote:
> We may not need to add "_to"/"_from" suffix by checking both
> of argument type and return type. Because we use different
> return type for copy_in/out_handler.
>
> But the current LookupFuncName() family doesn't check return
> type. If we use this approach, we need to improve the
> current LookupFuncName() family too.
IIUC we cannot create two functions with the same name and the same
arguments but different return types in the first place. It seems
like overkill to me to change such a design.
Another idea is to encapsulate copy_to/from_handler in a super class
like copy_handler. The handler function is called with an argument,
say copyto, and returns a copy_handler encapsulating either
copy_to_handler or copy_from_handler depending on the argument.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

On Mon, Dec 11, 2023 at 10:57:15AM +0900, Masahiko Sawada wrote:
> IIUC we cannot create two same name functions with the same arguments
> but a different return value type in the first place. It seems to me
> to be an overkill to change such a design.
Agreed to not touch the logistics of LookupFuncName() for the sake of
this thread. I have not checked the SQL specification, but I recall
that there are a few assumptions from the spec embedded in the lookup
logic, particularly when it comes to specifying a procedure name without
arguments.
> Another idea is to encapsulate copy_to/from_handler by a super class
> like copy_handler. The handler function is called with an argument,
> say copyto, and returns copy_handler encapsulating either
> copy_to/from_handler depending on the argument.
Yep, that's possible as well and can work as a cross-check between the
argument and the NodeTag assigned to the handler structure returned by
the function.
In the end, the final result of the patch should IMO include:
- Documentation about how one can register a custom copy_handler.
- Something in src/test/modules/, minimalistic yet still useful, that can
  be used as a template when one wants to implement their own handler.
  The documentation should mention this module.
- No need for SQL functions for all the in-core handlers: let's just
  return pointers to them based on the options given.
It would probably be cleaner to split the patch so that the code is
refactored and evaluated with the in-core handlers first, and then
extended with the pluggable facilities and the function lookups.
--
Michael
Attachments
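
The "super class" idea and the tag cross-check discussed in the two messages above can be sketched as follows. This is a minimal, self-contained illustration and not code from any patch: all type and function names are hypothetical, and a plain enum stands in for PostgreSQL's NodeTag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tags; in PostgreSQL these would be NodeTag values. */
typedef enum CopyHandlerTagSketch
{
    TAG_COPY_TO_HANDLER,
    TAG_COPY_FROM_HANDLER
} CopyHandlerTagSketch;

/* The "super class": both handler kinds start with this tag. */
typedef struct CopyHandlerSketch
{
    CopyHandlerTagSketch tag;
} CopyHandlerSketch;

static const CopyHandlerSketch arrow_to_handler = {TAG_COPY_TO_HANDLER};
static const CopyHandlerSketch arrow_from_handler = {TAG_COPY_FROM_HANDLER};

/* One handler function per format; the argument selects the direction. */
static const CopyHandlerSketch *
arrow_handler(bool is_from)
{
    return is_from ? &arrow_from_handler : &arrow_to_handler;
}

/* A buggy handler that ignores its argument, to exercise the check. */
static const CopyHandlerSketch *
broken_handler(bool is_from)
{
    (void) is_from;
    return &arrow_to_handler;
}

typedef const CopyHandlerSketch *(*CopyHandlerFn) (bool is_from);

/*
 * Call the handler and cross-check the returned tag against the requested
 * direction. Returns NULL on a mismatch, where real code would raise an
 * error.
 */
static const CopyHandlerSketch *
get_copy_handler(CopyHandlerFn fn, bool is_from)
{
    const CopyHandlerSketch *h = fn(is_from);
    CopyHandlerTagSketch expected =
        is_from ? TAG_COPY_FROM_HANDLER : TAG_COPY_TO_HANDLER;

    if (h == NULL || h->tag != expected)
        return NULL;
    return h;
}
```

With a single handler function per format, the format name alone is enough for the function lookup, and the tag check catches a handler that returns callbacks for the wrong direction.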
On Sat, Dec 9, 2023 at 7:38 PM Hannu Krosing <hannuk@google.com> wrote:
>
> Hi Junwang
>
> Please also see my presentation slides from last years PostgreSQL
> Conference in Berlin (attached)
I read through the slides. The ideas are really promising; it will be
great if we can get there eventually.
>
> The main Idea is to make not just "format", but also "transport" and
> "stream processing" extendable via virtual function tables.
The code is tightly coupled, so it is not easy to do all of this in one
round. It would be great if you had a PoC patch.
>
> Btw, will any of you here be in Prague next week ?
> Would be a good opportunity to discuss this in person.
Sorry, no.
--
Regards
Junwang Zhao
			
On Mon, Dec 11, 2023 at 7:19 PM Michael Paquier <michael@paquier.xyz> wrote:
> At the end, the final result of the patch should IMO include:
> - Documentation about how one can register a custom copy_handler.
> - Something in src/test/modules/, minimalistic still useful that can
> be used as a template when one wants to implement their own handler.
> The documentation should mention about this module.
> - No need for SQL functions for all the in-core handlers: let's just
> return pointers to them based on the options given.
Agreed.
> It would be probably cleaner to split the patch so as the code is
> refactored and evaluated with the in-core handlers first, and then
> extended with the pluggable facilities and the function lookups.
Agreed.
I've sketched the above idea, including a test module in
src/test/modules/test_copy_format, based on the v2 patch. It's not split
up and is still rough, so it's just for discussion.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments
On Mon, Dec 11, 2023 at 10:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I've sketched the above idea including a test module in
> src/test/module/test_copy_format, based on v2 patch. It's not splitted
> and is dirty so just for discussion.
The test_copy_format extension doesn't use the fields of CopyToState and
CopyFromState in this patch. I think we should move CopyFromStateData
and CopyToStateData to commands/copy.h; what do you think?
The framework in the patch LGTM.
--
Regards
Junwang Zhao

RE: Make COPY format extendable: Extract COPY TO format implementations
From: "Hayato Kuroda (Fujitsu)"
Dear Sutou-san, Junwang,
Sorry for the delayed reply.
> Which approach is preferred? (Or should we choose another
> approach?)
>
> I thought that 1. is preferred because it will reduce review
> cost. So I chose 1.
I'm OK with approach 1, but could you please divide the large patch? E.g.,
0001. defines the infrastructure for the copy API
0002. adjusts the current code to use the API
0003. adds a test module in src/test/modules or contrib.
...
This helps reviewers look at each patch in more depth. The separate
patches can be combined when they are close to committable.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED

On Tue, Dec 12, 2023 at 11:09 AM Junwang Zhao <zhjwpku@gmail.com> wrote:
> The test_copy_format extension doesn't use the fields of CopyToState and
> CopyFromState in this patch, I think we should move CopyFromStateData
> and CopyToStateData to commands/copy.h, what do you think?
Yes, I basically agree with that. Where we move CopyFromStateData
might depend on how we define the COPY FROM APIs, though. I think we can
move CopyToStateData to copy.h at least.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Hi,
In <CAD21AoCvjGserrtEU=UcA3Mfyfe6ftf9OXPHv9fiJ9DmXMJ2nQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 11 Dec 2023 10:57:15 +0900,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> IIUC we cannot create two same name functions with the same arguments
> but a different return value type in the first place. It seems to me
> to be an overkill to change such a design.
Oh, sorry. I didn't notice it.
> Another idea is to encapsulate copy_to/from_handler by a super class
> like copy_handler. The handler function is called with an argument,
> say copyto, and returns copy_handler encapsulating either
> copy_to/from_handler depending on the argument.
It's for using "${copy_format_name}" such as "json" and
"parquet" as a function name, right? If we use the
"${copy_format_name}" approach, we can't use function names
that are already used by tablesample method handlers, such as
"system" and "bernoulli", as COPY FORMAT names, because both
the tablesample method handler function and the COPY FORMAT
handler function take "(internal)" as their argument list.
I think that tablesample method names and COPY FORMAT names
will not conflict in practice, but the limitation (using the
same namespace for tablesample methods and COPY FORMATs) is
an unnecessary one.
How about using prefix ("copy_to_${copy_format_name}" or
something) or suffix ("${copy_format_name}_copy_to" or
something) for function names? For example,
"copy_to_json"/"copy_from_json" for "json" COPY FORMAT.
("copy_${copy_format_name}" that returns copy_handler
encapsulating either copy_to/from_handler depending on the
argument may be an option.)
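As a rough illustration of the prefix idea above, the function name to look up could be derived from the FORMAT value. This is a standalone C sketch, not PostgreSQL code: build_copy_handler_name() and the buffer handling are hypothetical, and a real implementation would presumably pass the resulting name to LookupFuncName() with a single "internal" argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of any patch in this thread): write
 * "copy_to_<format>" or "copy_from_<format>" into buf.
 * Returns false if the name does not fit into buflen bytes.
 */
bool
build_copy_handler_name(const char *format_name, bool is_from,
                        char *buf, size_t buflen)
{
    const char *prefix = is_from ? "copy_from_" : "copy_to_";
    int         n;

    assert(format_name != NULL && buf != NULL);

    n = snprintf(buf, buflen, "%s%s", prefix, format_name);

    /* snprintf reports the length it wanted; reject truncated names. */
    return n >= 0 && (size_t) n < buflen;
}
```

With this scheme, FORMAT 'json' maps to "copy_to_json" for COPY TO and "copy_from_json" for COPY FROM, so neither can collide with a tablesample handler named after the format itself.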
Thanks,
-- 
kou
			
		On Thu, Dec 14, 2023 at 6:44 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoCvjGserrtEU=UcA3Mfyfe6ftf9OXPHv9fiJ9DmXMJ2nQ@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 11 Dec 2023 10:57:15 +0900,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > IIUC we cannot create two same name functions with the same arguments
> > but a different return value type in the first place. It seems to me
> > to be an overkill to change such a design.
>
> Oh, sorry. I didn't notice it.
>
> > Another idea is to encapsulate copy_to/from_handler by a super class
> > like copy_handler. The handler function is called with an argument,
> > say copyto, and returns copy_handler encapsulating either
> > copy_to/from_handler depending on the argument.
>
> It's for using "${copy_format_name}" such as "json" and
> "parquet" as a function name, right?
Right.
> If we use the
> "${copy_format_name}" approach, we can't use function names
> that are already used by tablesample method handler such as
> "system" and "bernoulli" for COPY FORMAT name. Because both
> of tablesample method handler function and COPY FORMAT
> handler function use "(internal)" as arguments.
>
> I think that tablesample method names and COPY FORMAT names
> will not be conflicted but the limitation (using the same
> namespace for tablesample method and COPY FORMAT) is
> unnecessary limitation.
Presumably, such function name collisions are not limited to
tablesample and copy, but apply to all functions that have an
"internal" argument. To avoid collisions, extensions can be created in
a different schema than public. And note that built-in format copy
handler doesn't need to declare its handler function.
>
> How about using prefix ("copy_to_${copy_format_name}" or
> something) or suffix ("${copy_format_name}_copy_to" or
> something) for function names? For example,
> "copy_to_json"/"copy_from_json" for "json" COPY FORMAT.
>
> ("copy_${copy_format_name}" that returns copy_handler
> encapsulating either copy_to/from_handler depending on the
> argument may be an option.)
While there is a way to avoid collision as I mentioned above, I can
see the point that we might want to avoid using a generic function
name such as "arrow" and "parquet" as custom copy handler functions.
Adding a prefix or suffix would be one option but to give extensions
more flexibility, another option would be to support format = 'custom'
and add the "handler" option to specify a copy handler function name
to call. For example, COPY ... FROM ... WITH (FORMAT = 'custom',
HANDLER = 'arrow_copy_handler').
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
Hi,
In <CAD21AoCZv3cVU+NxR2s9J_dWvjrS350GFFr2vMgCH8wWxQ5hTQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 15 Dec 2023 05:19:43 +0900,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> To avoid collisions, extensions can be created in a
> different schema than public.
Thanks. I didn't notice it.
> And note that built-in format copy handler doesn't need to
> declare its handler function.
Right. I know it.
> Adding a prefix or suffix would be one option but to give extensions
> more flexibility, another option would be to support format = 'custom'
> and add the "handler" option to specify a copy handler function name
> to call. For example, COPY ... FROM ... WITH (FORMAT = 'custom',
> HANDLER = 'arrow_copy_handler').
Interesting. If we use this option, users can choose a COPY
FORMAT implementation they like from multiple
implementations. For example, a developer may implement a
COPY FROM FORMAT = 'json' handler with PostgreSQL's JSON
related API and another developer may implement a handler
with simdjson[1] which is a fast JSON parser. Users can
choose whichever they like.
But specifying HANDLER = '...' explicitly is a bit
inconvenient, because only one handler will be installed in
most use cases. In that case, users don't need to choose one
handler.
If we choose this option, it may be better that we also
provide a mechanism that can work without HANDLER. Searching
a function by name like tablesample method does is an option.
[1]: https://github.com/simdjson/simdjson
Thanks,
--
kou
Hi,
In 
 <OS3PR01MB9882F023300EDC5AFD8A8339F58EA@OS3PR01MB9882.jpnprd01.prod.outlook.com>
  "RE: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 12 Dec 2023 02:31:53 +0000,
  "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote:
>> Can we discuss how to proceed this improvement?
>> 
>> There are 2 approaches for it:
>> 
>> 1. Do the followings concurrently:
>>    a. Implementing small changes that got a consensus and
>>       merge them step-by-step
>>       (e.g. We got a consensus that we need to extract the
>>       current format related routines.)
>>    b. Discuss design
>> 
>>    (v1-v3 patches use this approach.)
>> 
>> 2. Implement one (large) complete patch set with design
>>    discussion and merge it
>> 
>>    (v4- patches use this approach.)
>> 
>> Which approach is preferred? (Or should we choose another
>> approach?)
>> 
>> I thought that 1. is preferred because it will reduce review
>> cost. So I chose 1.
> 
> I'm ok to use approach 1, but could you please divide a large patch? E.g.,
> 
> 0001. defines an infrastructure for copy-API
> 0002. adjusts current codes to use APIs
> 0003. adds a test module in src/test/modules or contrib.
> ...
> 
> This approach helps reviewers to see patches deeper. Separated patches can be
> combined when they are close to committable.
It seems that I should have chosen another approach based on
comments so far:
3. Do the following in order:
   a. Implement a workable (but maybe dirty and/or incomplete)
      implementation to discuss design like [1], discuss
      design with it and get a consensus on design
   b. Implement small patches based on the design
[1]: https://www.postgresql.org/message-id/CAD21AoCunywHird3GaPzWe6s9JG1wzxj3Cr6vGN36DDheGjOjA%40mail.gmail.com 
I'll implement a custom COPY FORMAT handler with [1] and
provide feedback based on the experience. (It's for a.)
Thanks,
-- 
kou
			
On Fri, Dec 15, 2023 at 8:53 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoCZv3cVU+NxR2s9J_dWvjrS350GFFr2vMgCH8wWxQ5hTQ@mail.gmail.com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 15 Dec 2023 05:19:43 +0900,
> Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > To avoid collisions, extensions can be created in a
> > different schema than public.
>
> Thanks. I didn't notice it.
>
> > And note that built-in format copy handler doesn't need to
> > declare its handler function.
>
> Right. I know it.
>
> > Adding a prefix or suffix would be one option but to give extensions
> > more flexibility, another option would be to support format = 'custom'
> > and add the "handler" option to specify a copy handler function name
> > to call. For example, COPY ... FROM ... WITH (FORMAT = 'custom',
> > HANDLER = 'arrow_copy_handler').
>
I like the prefix/suffix idea; it's easy to implement. *custom* is not a
FORMAT, and the user has to know the specific handler names, which is not
intuitive.
> Interesting. If we use this option, users can choose an COPY
> FORMAT implementation they like from multiple
> implementations. For example, a developer may implement a
> COPY FROM FORMAT = 'json' handler with PostgreSQL's JSON
> related API and another developer may implement a handler
> with simdjson[1] which is a fast JSON parser. Users can
> choose whichever they like.
Not sure about this. Why not move the JSON copy handler to contrib
as an example for others, so that any extensions share the same format
function name and users just install one? Nobody would implement
another CSV or TEXT copy handler IMHO.
>
> But specifying HANDLER = '...' explicitly is a bit
> inconvenient. Because only one handler will be installed in
> most use cases. In the case, users don't need to choose one
> handler.
>
> If we choose this option, it may be better that we also
> provide a mechanism that can work without HANDLER. Searching
> a function by name like tablesample method does is an option.
>
>
> [1]: https://github.com/simdjson/simdjson
>
>
> Thanks,
> --
> kou
--
Regards
Junwang Zhao
On Fri, Dec 15, 2023 at 9:53 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoCZv3cVU+NxR2s9J_dWvjrS350GFFr2vMgCH8wWxQ5hTQ@mail.gmail.com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 15 Dec 2023 05:19:43 +0900,
> Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > To avoid collisions, extensions can be created in a
> > different schema than public.
>
> Thanks. I didn't notice it.
>
> > And note that built-in format copy handler doesn't need to
> > declare its handler function.
>
> Right. I know it.
>
> > Adding a prefix or suffix would be one option but to give extensions
> > more flexibility, another option would be to support format = 'custom'
> > and add the "handler" option to specify a copy handler function name
> > to call. For example, COPY ... FROM ... WITH (FORMAT = 'custom',
> > HANDLER = 'arrow_copy_handler').
>
> Interesting. If we use this option, users can choose an COPY
> FORMAT implementation they like from multiple
> implementations. For example, a developer may implement a
> COPY FROM FORMAT = 'json' handler with PostgreSQL's JSON
> related API and another developer may implement a handler
> with simdjson[1] which is a fast JSON parser. Users can
> choose whichever they like.
>
> But specifying HANDLER = '...' explicitly is a bit
> inconvenient. Because only one handler will be installed in
> most use cases. In the case, users don't need to choose one
> handler.
>
> If we choose this option, it may be better that we also
> provide a mechanism that can work without HANDLER. Searching
> a function by name like tablesample method does is an option.
Agreed. We can search the function by format name by default, and the
user can optionally specify the handler function name in case the
names of the installed custom copy handlers collide. Probably the
handler option stuff could be a follow-up patch.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
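The lookup order agreed on in the message above (format name by default, explicit HANDLER when given) could be sketched roughly as follows. This is a standalone C illustration under stated assumptions: CopyHandlerEntry and resolve_copy_handler() are hypothetical names, and the array stands in for a function lookup such as LookupFuncName(), which a real patch would use instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the set of installed handler functions. */
typedef struct CopyHandlerEntry
{
    const char *func_name;
} CopyHandlerEntry;

/*
 * Return the handler function name to call, or NULL if none matches.
 * An explicit HANDLER option wins; otherwise the format name itself
 * is used as the function name.
 */
const char *
resolve_copy_handler(const CopyHandlerEntry *installed, size_t ninstalled,
                     const char *format_name, const char *handler_option)
{
    const char *wanted;
    size_t      i;

    assert(format_name != NULL);

    wanted = (handler_option != NULL) ? handler_option : format_name;

    for (i = 0; i < ninstalled; i++)
    {
        if (strcmp(installed[i].func_name, wanted) == 0)
            return installed[i].func_name;
    }

    /* The caller would report "COPY format not recognized" here. */
    return NULL;
}
```

Keeping the HANDLER branch as a plain override means it can be added later without changing the default path, which matches the suggestion that the option could be a follow-up patch.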
Hi,
In <CAEG8a3JuShA6g19Nt_Ejk15BrNA6PmeCbK7p81izZi71muGq3g@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 15 Dec 2023 11:27:30 +0800,
  Junwang Zhao <zhjwpku@gmail.com> wrote:
>> > Adding a prefix or suffix would be one option but to give extensions
>> > more flexibility, another option would be to support format = 'custom'
>> > and add the "handler" option to specify a copy handler function name
>> > to call. For example, COPY ... FROM ... WITH (FORMAT = 'custom',
>> > HANDLER = 'arrow_copy_handler').
>>
> I like the prefix/suffix idea, easy to implement. *custom* is not a FORMAT,
> and user has to know the name of the specific handler names, not
> intuitive.
Ah! I misunderstood this idea. "custom" is the special
format to use "HANDLER". I thought that we could use it like
   (FORMAT = 'arrow', HANDLER = 'arrow_copy_handler_impl1')
and
   (FORMAT = 'arrow', HANDLER = 'arrow_copy_handler_impl2')
.
>> Interesting. If we use this option, users can choose an COPY
>> FORMAT implementation they like from multiple
>> implementations. For example, a developer may implement a
>> COPY FROM FORMAT = 'json' handler with PostgreSQL's JSON
>> related API and another developer may implement a handler
>> with simdjson[1] which is a fast JSON parser. Users can
>> choose whichever they like.
> Not sure about this, why not move Json copy handler to contrib
> as an example for others, any extensions share the same format
> function name and just install one? No bound would implement
> another CSV or TEXT copy handler IMHO.
I should have used a format other than JSON as an example
to make it easier to understand. I just wanted to say that
extension developers can implement another implementation
without conflicting with an existing implementation.
Thanks,
--
kou
On Fri, Dec 15, 2023 at 12:45 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAEG8a3JuShA6g19Nt_Ejk15BrNA6PmeCbK7p81izZi71muGq3g@mail.gmail.com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 15 Dec 2023 11:27:30 +0800,
> Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> >> > Adding a prefix or suffix would be one option but to give extensions
> >> > more flexibility, another option would be to support format = 'custom'
> >> > and add the "handler" option to specify a copy handler function name
> >> > to call. For example, COPY ... FROM ... WITH (FORMAT = 'custom',
> >> > HANDLER = 'arrow_copy_handler').
> >> >
> > I like the prefix/suffix idea, easy to implement. *custom* is not a FORMAT,
> > and user has to know the name of the specific handler names, not
> > intuitive.
>
> Ah! I misunderstood this idea. "custom" is the special
> format to use "HANDLER". I thought that we can use it like
>
> (FORMAT = 'arrow', HANDLER = 'arrow_copy_handler_impl1')
>
> and
>
> (FORMAT = 'arrow', HANDLER = 'arrow_copy_handler_impl2')
>
> .
>
> >> Interesting. If we use this option, users can choose an COPY
> >> FORMAT implementation they like from multiple
> >> implementations. For example, a developer may implement a
> >> COPY FROM FORMAT = 'json' handler with PostgreSQL's JSON
> >> related API and another developer may implement a handler
> >> with simdjson[1] which is a fast JSON parser. Users can
> >> choose whichever they like.
> > Not sure about this, why not move Json copy handler to contrib
> > as an example for others, any extensions share the same format
> > function name and just install one? No bound would implement
> > another CSV or TEXT copy handler IMHO.
>
> I should have used a different format not JSON as an example
> for easy to understand. I just wanted to say that extension
> developers can implement another implementation without
> conflicting another implementation.
Yeah, I can see the value of the HANDLER option now. The possibility
of two extensions for the same format using the same handler name
should be rare I guess ;)
>
>
> Thanks,
> --
> kou
--
Regards
Junwang Zhao
Hi,
In <CAD21AoCunywHird3GaPzWe6s9JG1wzxj3Cr6vGN36DDheGjOjA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 11 Dec 2023 23:31:29 +0900,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I've sketched the above idea including a test module in
> src/test/module/test_copy_format, based on v2 patch. It's not splitted
> and is dirty so just for discussion.
I implemented a sample COPY TO handler for Apache Arrow that
supports only integer and text.
I needed to extend the patch:
1. Add an opaque space for custom COPY TO handler
   * Add CopyToState{Get,Set}Opaque()
   https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
2. Export CopyToState::attnumlist
   * Add CopyToStateGetAttNumList()
   https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
3. Export CopySend*()
   * Rename CopySend*() to CopyToStateSend*() and export them
   * Exception: CopySendEndOfRow() to CopyToStateFlush() because
     it just flushes the internal buffer now.
   https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e
The attached patch is based on Sawada-san's patch and
includes the above changes. Note that this patch is also
dirty, so it's just for discussion.
My suggestions from this experience:
1. Split COPY handler to COPY TO handler and COPY FROM handler
   * CopyFormatRoutine is a bit tricky. An extension needs
     to create a CopyFormatRoutine node and
     a CopyToFormatRoutine node.
   * If we just require "copy_to_${FORMAT}(internal)"
     function and "copy_from_${FORMAT}(internal)" function,
     we can remove the tricky approach. It also avoids
     name collisions with other handlers such as the
     tablesample handler.
     See also:
https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9
2. Need an opaque space like IndexScanDesc::opaque does
   * A custom COPY TO handler needs to keep its data
3. Export CopySend*()
   * If we prefer a minimal API, we just need to export
     CopySendData() and CopySendEndOfRow(). But
     CopySend{String,Char,Int32,Int16}() will be convenient
     for custom COPY TO handlers. (A custom COPY TO handler
     for Apache Arrow doesn't need them.)
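To make suggestion 2 concrete, here is a minimal standalone sketch of how a custom COPY TO handler might keep its own data in the opaque space across its start / one-row / end callbacks. Everything prefixed with Mock, plus RowCounter and the counter_*() callbacks, is a hypothetical stand-in: the real CopyToState is a backend type, the callback signatures in the patch take more arguments, and backend code would allocate with palloc() rather than calloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for CopyToStateData; only the opaque slot is modelled. */
typedef struct MockCopyToStateData
{
    void       *opaque;         /* private space for the format handler */
} MockCopyToStateData;
typedef MockCopyToStateData *MockCopyToState;

void *
MockCopyToStateGetOpaque(MockCopyToState cstate)
{
    return cstate->opaque;
}

void
MockCopyToStateSetOpaque(MockCopyToState cstate, void *opaque)
{
    cstate->opaque = opaque;
}

/* Hypothetical per-COPY state of a custom format: it only counts rows. */
typedef struct RowCounter
{
    long        nrows;
} RowCounter;

/* "Start" callback: allocate the private state and attach it. */
void
counter_start(MockCopyToState cstate)
{
    RowCounter *rc = calloc(1, sizeof(RowCounter));

    assert(rc != NULL);
    MockCopyToStateSetOpaque(cstate, rc);
}

/* "OneRow" callback: fetch the private state and update it. */
void
counter_one_row(MockCopyToState cstate)
{
    RowCounter *rc = MockCopyToStateGetOpaque(cstate);

    rc->nrows++;
}

/* "End" callback: report the result and release the private state. */
long
counter_end(MockCopyToState cstate)
{
    RowCounter *rc = MockCopyToStateGetOpaque(cstate);
    long        nrows = rc->nrows;

    free(rc);
    MockCopyToStateSetOpaque(cstate, NULL);
    return nrows;
}
```

A format like Apache Arrow would store its builder objects in the same slot instead of a counter, which is why the getter and setter need to be exported to extensions.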
Questions:
1. What value should be used for "format" in
   PqMsg_CopyOutResponse message?
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/copyto.c;h=c66a047c4a79cc614784610f385f1cd0935350f3;hb=9ca6e7b9411e36488ef539a2c1f6846ac92a7072#l144
   It's 1 for binary format and 0 for text/csv format.
   Should we make it customizable by custom COPY TO handler?
   If so, what value should be used for this?
2. Do we need more design-discussion iterations before the first
   implementation? If so, what should we try?
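For context on question 1, the body of a CopyOutResponse message carries one overall format byte, a 16-bit column count, and one 16-bit format code per column. The standalone sketch below builds that body in a plain byte buffer; build_copy_out_response_body() is a hypothetical helper (the backend uses its pq_* message routines instead), and it assumes only the two format values mentioned above, 0 and 1, are valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write the CopyOutResponse body (everything after the 'H' type byte and
 * the int32 length word) into buf. Returns the number of bytes written,
 * or 0 if buf is too small.
 */
size_t
build_copy_out_response_body(uint8_t *buf, size_t buflen,
                             uint8_t format, uint16_t natts)
{
    size_t      needed = 1 + 2 + (size_t) natts * 2;
    size_t      pos = 0;
    uint16_t    i;

    assert(buf != NULL);
    assert(format == 0 || format == 1); /* 0 = text/csv, 1 = binary */

    if (buflen < needed)
        return 0;

    buf[pos++] = format;                    /* overall format */
    buf[pos++] = (uint8_t) (natts >> 8);    /* column count, network order */
    buf[pos++] = (uint8_t) (natts & 0xFF);
    for (i = 0; i < natts; i++)
    {
        buf[pos++] = 0;                     /* per-column format code */
        buf[pos++] = format;
    }

    return pos;
}
```

Because clients interpret this byte, a custom handler that emits something like Apache Arrow would most plausibly report 1 (binary), but that is exactly the open question raised above.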
Thanks,
-- 
kou
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b562..e7597894bf 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -23,6 +23,7 @@
 #include "access/xact.h"
 #include "catalog/pg_authid.h"
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/defrem.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
@@ -32,6 +33,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "rewrite/rewriteHandler.h"
 #include "utils/acl.h"
@@ -427,6 +429,8 @@ ProcessCopyOptions(ParseState *pstate,
 
     opts_out->file_encoding = -1;
 
+    /* Text is the default format. */
+    opts_out->to_ops = &CopyToTextFormatRoutine;
     /* Extract options from the statement node tree */
     foreach(option, options)
     {
@@ -442,9 +446,26 @@ ProcessCopyOptions(ParseState *pstate,
             if (strcmp(fmt, "text") == 0)
                  /* default format */ ;
             else if (strcmp(fmt, "csv") == 0)
+            {
                 opts_out->csv_mode = true;
+                opts_out->to_ops = &CopyToCSVFormatRoutine;
+            }
             else if (strcmp(fmt, "binary") == 0)
+            {
                 opts_out->binary = true;
+                opts_out->to_ops = &CopyToBinaryFormatRoutine;
+            }
+            else if (!is_from)
+            {
+                /*
+                 * XXX: Currently we support custom COPY format only for COPY
+                 * TO.
+                 *
+                 * XXX: need to check the combination of the existing options
+                 * and a custom format (e.g., FREEZE)?
+                 */
+                opts_out->to_ops = GetCopyToFormatRoutine(fmt);
+            }
             else
                 ereport(ERROR,
                         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -864,3 +885,62 @@ CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist)
 
     return attnums;
 }
+
+static CopyFormatRoutine *
+GetCopyFormatRoutine(char *format_name, bool is_from)
+{
+    Oid            handlerOid;
+    Oid            funcargtypes[1];
+    CopyFormatRoutine *cp;
+    Datum        datum;
+
+    funcargtypes[0] = INTERNALOID;
+    handlerOid = LookupFuncName(list_make1(makeString(format_name)), 1,
+                                funcargtypes, true);
+
+    if (!OidIsValid(handlerOid))
+        ereport(ERROR,
+                (errcode(ERRCODE_UNDEFINED_OBJECT),
+                 errmsg("COPY format \"%s\" not recognized", format_name)));
+
+    datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
+
+    cp = (CopyFormatRoutine *) DatumGetPointer(datum);
+
+    if (cp == NULL || !IsA(cp, CopyFormatRoutine))
+        elog(ERROR, "copy handler function %u did not return a CopyFormatRoutine struct",
+             handlerOid);
+
+    if (!IsA(cp->routine, CopyToFormatRoutine) &&
+        !IsA(cp->routine, CopyFromFormatRoutine))
+        elog(ERROR, "copy handler function %u returned invalid CopyFormatRoutine struct",
+             handlerOid);
+
+    if (!cp->is_from && !IsA(cp->routine, CopyToFormatRoutine))
+        elog(ERROR, "copy handler function %u returned COPY FROM routines but expected COPY TO routines",
+             handlerOid);
+
+    if (cp->is_from && !IsA(cp->routine, CopyFromFormatRoutine))
+        elog(ERROR, "copy handler function %u returned COPY TO routines but expected COPY FROM routines",
+             handlerOid);
+
+    return cp;
+}
+
+CopyToFormatRoutine *
+GetCopyToFormatRoutine(char *format_name)
+{
+    CopyFormatRoutine *cp;
+
+    cp = GetCopyFormatRoutine(format_name, false);
+    return (CopyToFormatRoutine *) castNode(CopyToFormatRoutine, cp->routine);
+}
+
+CopyFromFormatRoutine *
+GetCopyFromFormatRoutine(char *format_name)
+{
+    CopyFormatRoutine *cp;
+
+    cp = GetCopyFormatRoutine(format_name, true);
+    return (CopyFromFormatRoutine *) castNode(CopyFromFormatRoutine, cp->routine);
+}
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c66a047c4a..3b1c2a277c 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -99,6 +99,9 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;    /* lookup info for output functions */
     MemoryContext rowcontext;    /* per-row evaluation context */
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void *opaque; /* private space */
 } CopyToStateData;
 
 /* DestReceiver for COPY (query) TO */
@@ -124,13 +127,229 @@ static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
 /* Low-level communications functions */
 static void SendCopyBegin(CopyToState cstate);
 static void SendCopyEnd(CopyToState cstate);
-static void CopySendData(CopyToState cstate, const void *databuf, int datasize);
-static void CopySendString(CopyToState cstate, const char *str);
-static void CopySendChar(CopyToState cstate, char c);
-static void CopySendEndOfRow(CopyToState cstate);
-static void CopySendInt32(CopyToState cstate, int32 val);
-static void CopySendInt16(CopyToState cstate, int16 val);
 
+/* Exported functions that are used by custom format routines. */
+
+/* TODO: Document */
+void *CopyToStateGetOpaque(CopyToState cstate)
+{
+    return cstate->opaque;
+}
+
+/* TODO: Document */
+void CopyToStateSetOpaque(CopyToState cstate, void *opaque)
+{
+     cstate->opaque = opaque;
+}
+
+/* TODO: Document */
+List *CopyToStateGetAttNumList(CopyToState cstate)
+{
+    return cstate->attnumlist;
+}
+
+/*
+ * CopyToFormatOps implementations.
+ */
+
+/*
+ * CopyToFormatOps implementation for "text" and "csv". CopyToFormatText*()
+ * refer cstate->opts.csv_mode and change their behavior. We can split this
+ * implementation and stop referring cstate->opts.csv_mode later.
+ */
+
+static void
+CopyToFormatTextSendEndOfRow(CopyToState cstate)
+{
+    switch (cstate->copy_dest)
+    {
+        case COPY_FILE:
+            /* Default line termination depends on platform */
+#ifndef WIN32
+            CopyToStateSendChar(cstate, '\n');
+#else
+            CopyToStateSendString(cstate, "\r\n");
+#endif
+            break;
+        case COPY_FRONTEND:
+            /* The FE/BE protocol uses \n as newline for all platforms */
+            CopyToStateSendChar(cstate, '\n');
+            break;
+        default:
+            break;
+    }
+    CopyToStateFlush(cstate);
+}
+
+static void
+CopyToFormatTextStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    int            num_phys_attrs;
+    ListCell   *cur;
+
+    num_phys_attrs = tupDesc->natts;
+    /* Get info about the columns we need to process. */
+    cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Oid            out_func_oid;
+        bool        isvarlena;
+        Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+        getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+    }
+
+    /*
+     * For non-binary copy, we need to convert null_print to file encoding,
+     * because it will be sent directly with CopyToStateSendString.
+     */
+    if (cstate->need_transcoding)
+        cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+                                                          cstate->opts.null_print_len,
+                                                          cstate->file_encoding);
+
+    /* if a header has been requested send the line */
+    if (cstate->opts.header_line)
+    {
+        bool        hdr_delim = false;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            char       *colname;
+
+            if (hdr_delim)
+                CopyToStateSendChar(cstate, cstate->opts.delim[0]);
+            hdr_delim = true;
+
+            colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, colname, false,
+                                    list_length(cstate->attnumlist) == 1);
+            else
+                CopyAttributeOutText(cstate, colname);
+        }
+
+        CopyToFormatTextSendEndOfRow(cstate);
+    }
+}
+
+static void
+CopyToFormatTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    bool        need_delim = false;
+    FmgrInfo   *out_functions = cstate->out_functions;
+    ListCell   *cur;
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (need_delim)
+            CopyToStateSendChar(cstate, cstate->opts.delim[0]);
+        need_delim = true;
+
+        if (isnull)
+            CopyToStateSendString(cstate, cstate->opts.null_print_client);
+        else
+        {
+            char       *string;
+
+            string = OutputFunctionCall(&out_functions[attnum - 1], value);
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, string,
+                                    cstate->opts.force_quote_flags[attnum - 1],
+                                    list_length(cstate->attnumlist) == 1);
+            else
+                CopyAttributeOutText(cstate, string);
+        }
+    }
+
+    CopyToFormatTextSendEndOfRow(cstate);
+}
+
+static void
+CopyToFormatTextEnd(CopyToState cstate)
+{
+}
+
+/*
+ * CopyToFormatOps implementation for "binary".
+ */
+
+static void
+CopyToFormatBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    int            num_phys_attrs;
+    ListCell   *cur;
+
+    num_phys_attrs = tupDesc->natts;
+    /* Get info about the columns we need to process. */
+    cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Oid            out_func_oid;
+        bool        isvarlena;
+        Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+        getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+    }
+
+    /* Generate header for a binary copy */
+    /* Signature */
+    CopyToStateSendData(cstate, BinarySignature, 11);
+    /* Flags field */
+    CopyToStateSendInt32(cstate, 0);
+    /* No header extension */
+    CopyToStateSendInt32(cstate, 0);
+}
+
+static void
+CopyToFormatBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    FmgrInfo   *out_functions = cstate->out_functions;
+    ListCell   *cur;
+
+    /* Binary per-tuple header */
+    CopyToStateSendInt16(cstate, list_length(cstate->attnumlist));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (isnull)
+            CopyToStateSendInt32(cstate, -1);
+        else
+        {
+            bytea       *outputbytes;
+
+            outputbytes = SendFunctionCall(&out_functions[attnum - 1], value);
+            CopyToStateSendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+            CopyToStateSendData(cstate, VARDATA(outputbytes),
+                         VARSIZE(outputbytes) - VARHDRSZ);
+        }
+    }
+
+    CopyToStateFlush(cstate);
+}
+
+static void
+CopyToFormatBinaryEnd(CopyToState cstate)
+{
+    /* Generate trailer for a binary copy */
+    CopyToStateSendInt16(cstate, -1);
+    /* Need to flush out the trailer */
+    CopyToStateFlush(cstate);
+}
 
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
@@ -163,51 +382,41 @@ SendCopyEnd(CopyToState cstate)
 }
 
 /*----------
- * CopySendData sends output data to the destination (file or frontend)
- * CopySendString does the same for null-terminated strings
- * CopySendChar does the same for single characters
- * CopySendEndOfRow does the appropriate thing at end of each data row
- *    (data is not actually flushed except by CopySendEndOfRow)
+ * CopyToStateSendData sends output data to the destination (file or frontend)
+ * CopyToStateSendString does the same for null-terminated strings
+ * CopyToStateSendChar does the same for single characters
+ * CopyToStateFlush does the appropriate thing at end of each data row
+ *    (data is not actually flushed except by CopyToStateFlush)
  *
  * NB: no data conversion is applied by these functions
  *----------
  */
-static void
-CopySendData(CopyToState cstate, const void *databuf, int datasize)
+void
+CopyToStateSendData(CopyToState cstate, const void *databuf, int datasize)
 {
     appendBinaryStringInfo(cstate->fe_msgbuf, databuf, datasize);
 }
 
-static void
-CopySendString(CopyToState cstate, const char *str)
+void
+CopyToStateSendString(CopyToState cstate, const char *str)
 {
     appendBinaryStringInfo(cstate->fe_msgbuf, str, strlen(str));
 }
 
-static void
-CopySendChar(CopyToState cstate, char c)
+void
+CopyToStateSendChar(CopyToState cstate, char c)
 {
     appendStringInfoCharMacro(cstate->fe_msgbuf, c);
 }
 
-static void
-CopySendEndOfRow(CopyToState cstate)
+void
+CopyToStateFlush(CopyToState cstate)
 {
     StringInfo    fe_msgbuf = cstate->fe_msgbuf;
 
     switch (cstate->copy_dest)
     {
         case COPY_FILE:
-            if (!cstate->opts.binary)
-            {
-                /* Default line termination depends on platform */
-#ifndef WIN32
-                CopySendChar(cstate, '\n');
-#else
-                CopySendString(cstate, "\r\n");
-#endif
-            }
-
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -242,10 +451,6 @@ CopySendEndOfRow(CopyToState cstate)
             }
             break;
         case COPY_FRONTEND:
-            /* The FE/BE protocol uses \n as newline for all platforms */
-            if (!cstate->opts.binary)
-                CopySendChar(cstate, '\n');
-
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
@@ -266,27 +471,27 @@ CopySendEndOfRow(CopyToState cstate)
  */
 
 /*
- * CopySendInt32 sends an int32 in network byte order
+ * CopyToStateSendInt32 sends an int32 in network byte order
  */
-static inline void
-CopySendInt32(CopyToState cstate, int32 val)
+void
+CopyToStateSendInt32(CopyToState cstate, int32 val)
 {
     uint32        buf;
 
     buf = pg_hton32((uint32) val);
-    CopySendData(cstate, &buf, sizeof(buf));
+    CopyToStateSendData(cstate, &buf, sizeof(buf));
 }
 
 /*
- * CopySendInt16 sends an int16 in network byte order
+ * CopyToStateSendInt16 sends an int16 in network byte order
  */
-static inline void
-CopySendInt16(CopyToState cstate, int16 val)
+void
+CopyToStateSendInt16(CopyToState cstate, int16 val)
 {
     uint16        buf;
 
     buf = pg_hton16((uint16) val);
-    CopySendData(cstate, &buf, sizeof(buf));
+    CopyToStateSendData(cstate, &buf, sizeof(buf));
 }
 
 /*
@@ -748,8 +953,6 @@ DoCopyTo(CopyToState cstate)
     bool        pipe = (cstate->filename == NULL && cstate->data_dest_cb == NULL);
     bool        fe_copy = (pipe && whereToSendOutput == DestRemote);
     TupleDesc    tupDesc;
-    int            num_phys_attrs;
-    ListCell   *cur;
     uint64        processed;
 
     if (fe_copy)
@@ -759,32 +962,11 @@ DoCopyTo(CopyToState cstate)
         tupDesc = RelationGetDescr(cstate->rel);
     else
         tupDesc = cstate->queryDesc->tupDesc;
-    num_phys_attrs = tupDesc->natts;
     cstate->opts.null_print_client = cstate->opts.null_print;    /* default */
 
     /* We use fe_msgbuf as a per-row buffer regardless of copy_dest */
     cstate->fe_msgbuf = makeStringInfo();
 
-    /* Get info about the columns we need to process. */
-    cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
-    foreach(cur, cstate->attnumlist)
-    {
-        int            attnum = lfirst_int(cur);
-        Oid            out_func_oid;
-        bool        isvarlena;
-        Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
-
-        if (cstate->opts.binary)
-            getTypeBinaryOutputInfo(attr->atttypid,
-                                    &out_func_oid,
-                                    &isvarlena);
-        else
-            getTypeOutputInfo(attr->atttypid,
-                              &out_func_oid,
-                              &isvarlena);
-        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
-    }
-
     /*
      * Create a temporary memory context that we can reset once per row to
      * recover palloc'd memory.  This avoids any problems with leaks inside
@@ -795,57 +977,7 @@ DoCopyTo(CopyToState cstate)
                                                "COPY TO",
                                                ALLOCSET_DEFAULT_SIZES);
 
-    if (cstate->opts.binary)
-    {
-        /* Generate header for a binary copy */
-        int32        tmp;
-
-        /* Signature */
-        CopySendData(cstate, BinarySignature, 11);
-        /* Flags field */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-        /* No header extension */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-    }
-    else
-    {
-        /*
-         * For non-binary copy, we need to convert null_print to file
-         * encoding, because it will be sent directly with CopySendString.
-         */
-        if (cstate->need_transcoding)
-            cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
-                                                              cstate->opts.null_print_len,
-                                                              cstate->file_encoding);
-
-        /* if a header has been requested send the line */
-        if (cstate->opts.header_line)
-        {
-            bool        hdr_delim = false;
-
-            foreach(cur, cstate->attnumlist)
-            {
-                int            attnum = lfirst_int(cur);
-                char       *colname;
-
-                if (hdr_delim)
-                    CopySendChar(cstate, cstate->opts.delim[0]);
-                hdr_delim = true;
-
-                colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, colname, false,
-                                        list_length(cstate->attnumlist) == 1);
-                else
-                    CopyAttributeOutText(cstate, colname);
-            }
-
-            CopySendEndOfRow(cstate);
-        }
-    }
+    cstate->opts.to_ops->start_fn(cstate, tupDesc);
 
     if (cstate->rel)
     {
@@ -884,13 +1016,7 @@ DoCopyTo(CopyToState cstate)
         processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
     }
 
-    if (cstate->opts.binary)
-    {
-        /* Generate trailer for a binary copy */
-        CopySendInt16(cstate, -1);
-        /* Need to flush out the trailer */
-        CopySendEndOfRow(cstate);
-    }
+    cstate->opts.to_ops->end_fn(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -906,71 +1032,15 @@ DoCopyTo(CopyToState cstate)
 static void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-    bool        need_delim = false;
-    FmgrInfo   *out_functions = cstate->out_functions;
     MemoryContext oldcontext;
-    ListCell   *cur;
-    char       *string;
 
     MemoryContextReset(cstate->rowcontext);
     oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-    if (cstate->opts.binary)
-    {
-        /* Binary per-tuple header */
-        CopySendInt16(cstate, list_length(cstate->attnumlist));
-    }
-
     /* Make sure the tuple is fully deconstructed */
     slot_getallattrs(slot);
 
-    foreach(cur, cstate->attnumlist)
-    {
-        int            attnum = lfirst_int(cur);
-        Datum        value = slot->tts_values[attnum - 1];
-        bool        isnull = slot->tts_isnull[attnum - 1];
-
-        if (!cstate->opts.binary)
-        {
-            if (need_delim)
-                CopySendChar(cstate, cstate->opts.delim[0]);
-            need_delim = true;
-        }
-
-        if (isnull)
-        {
-            if (!cstate->opts.binary)
-                CopySendString(cstate, cstate->opts.null_print_client);
-            else
-                CopySendInt32(cstate, -1);
-        }
-        else
-        {
-            if (!cstate->opts.binary)
-            {
-                string = OutputFunctionCall(&out_functions[attnum - 1],
-                                            value);
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, string,
-                                        cstate->opts.force_quote_flags[attnum - 1],
-                                        list_length(cstate->attnumlist) == 1);
-                else
-                    CopyAttributeOutText(cstate, string);
-            }
-            else
-            {
-                bytea       *outputbytes;
-
-                outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-                                               value);
-                CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-                CopySendData(cstate, VARDATA(outputbytes),
-                             VARSIZE(outputbytes) - VARHDRSZ);
-            }
-        }
-    }
-
-    CopySendEndOfRow(cstate);
+    cstate->opts.to_ops->onerow_fn(cstate, slot);
 
     MemoryContextSwitchTo(oldcontext);
 }
@@ -981,7 +1051,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 #define DUMPSOFAR() \
     do { \
         if (ptr > start) \
-            CopySendData(cstate, start, ptr - start); \
+            CopyToStateSendData(cstate, start, ptr - start); \
     } while (0)
 
 static void
@@ -1000,7 +1070,7 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
     /*
      * We have to grovel through the string searching for control characters
      * and instances of the delimiter character.  In most cases, though, these
-     * are infrequent.  To avoid overhead from calling CopySendData once per
+     * are infrequent.  To avoid overhead from calling CopyToStateSendData once per
      * character, we dump out all characters between escaped characters in a
      * single call.  The loop invariant is that the data from "start" to "ptr"
      * can be sent literally, but hasn't yet been.
@@ -1055,14 +1125,14 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
                 }
                 /* if we get here, we need to convert the control char */
                 DUMPSOFAR();
-                CopySendChar(cstate, '\\');
-                CopySendChar(cstate, c);
+                CopyToStateSendChar(cstate, '\\');
+                CopyToStateSendChar(cstate, c);
                 start = ++ptr;    /* do not include char in next run */
             }
             else if (c == '\\' || c == delimc)
             {
                 DUMPSOFAR();
-                CopySendChar(cstate, '\\');
+                CopyToStateSendChar(cstate, '\\');
                 start = ptr++;    /* we include char in next run */
             }
             else if (IS_HIGHBIT_SET(c))
@@ -1115,14 +1185,14 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
                 }
                 /* if we get here, we need to convert the control char */
                 DUMPSOFAR();
-                CopySendChar(cstate, '\\');
-                CopySendChar(cstate, c);
+                CopyToStateSendChar(cstate, '\\');
+                CopyToStateSendChar(cstate, c);
                 start = ++ptr;    /* do not include char in next run */
             }
             else if (c == '\\' || c == delimc)
             {
                 DUMPSOFAR();
-                CopySendChar(cstate, '\\');
+                CopyToStateSendChar(cstate, '\\');
                 start = ptr++;    /* we include char in next run */
             }
             else
@@ -1189,7 +1259,7 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 
     if (use_quote)
     {
-        CopySendChar(cstate, quotec);
+        CopyToStateSendChar(cstate, quotec);
 
         /*
          * We adopt the same optimization strategy as in CopyAttributeOutText
@@ -1200,7 +1270,7 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
             if (c == quotec || c == escapec)
             {
                 DUMPSOFAR();
-                CopySendChar(cstate, escapec);
+                CopyToStateSendChar(cstate, escapec);
                 start = ptr;    /* we include char in next run */
             }
             if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii)
@@ -1210,12 +1280,12 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
         }
         DUMPSOFAR();
 
-        CopySendChar(cstate, quotec);
+        CopyToStateSendChar(cstate, quotec);
     }
     else
     {
         /* If it doesn't need quoting, we can just dump it as-is */
-        CopySendString(cstate, ptr);
+        CopyToStateSendString(cstate, ptr);
     }
 }
 
@@ -1284,3 +1354,33 @@ CreateCopyDestReceiver(void)
 
     return (DestReceiver *) self;
 }
+
+CopyToFormatRoutine CopyToTextFormatRoutine =
+{
+    .type = T_CopyToFormatRoutine,
+    .start_fn = CopyToFormatTextStart,
+    .onerow_fn = CopyToFormatTextOneRow,
+    .end_fn = CopyToFormatTextEnd,
+};
+
+/*
+ * We can use the same callbacks for both "text" and "csv" because
+ * CopyToFormatText*() refer to cstate->opts.csv_mode and change their
+ * behavior accordingly. We can split the implementations and stop
+ * referring to cstate->opts.csv_mode later.
+ */
+CopyToFormatRoutine CopyToCSVFormatRoutine =
+{
+    .type = T_CopyToFormatRoutine,
+    .start_fn = CopyToFormatTextStart,
+    .onerow_fn = CopyToFormatTextOneRow,
+    .end_fn = CopyToFormatTextEnd,
+};
+
+CopyToFormatRoutine CopyToBinaryFormatRoutine =
+{
+    .type = T_CopyToFormatRoutine,
+    .start_fn = CopyToFormatBinaryStart,
+    .onerow_fn = CopyToFormatBinaryOneRow,
+    .end_fn = CopyToFormatBinaryEnd,
+};
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 66bbad8e6e..173ee11811 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -49,6 +49,7 @@ node_headers = \
     access/sdir.h \
     access/tableam.h \
     access/tsmapi.h \
+    commands/copyapi.h \
     commands/event_trigger.h \
     commands/trigger.h \
     executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 72c7963578..c48015a612 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -61,6 +61,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -85,6 +86,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index 3ba8cb192c..4391e5cefc 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -373,6 +373,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 77e8b13764..9e0f33ad9e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7602,6 +7602,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index f6110a850d..4fe5c17818 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -632,6 +632,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a copy to/from method function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   typname => 'table_am_handler',
   descr => 'pseudo-type for the result of a table AM handler function',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2cca0b90b..cd081bd925 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -18,6 +18,7 @@
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
 #include "tcop/dest.h"
+#include "commands/copyapi.h"
 
 /*
  * Represents whether a header line should be present, and whether it must
@@ -63,12 +64,9 @@ typedef struct CopyFormatOptions
     bool       *force_null_flags;    /* per-column CSV FN flags */
     bool        convert_selectively;    /* do selective binary conversion? */
     List       *convert_select; /* list of column names (can be NIL) */
+    CopyToFormatRoutine *to_ops;    /* callback routines for COPY TO */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
-typedef struct CopyFromStateData *CopyFromState;
-typedef struct CopyToStateData *CopyToState;
-
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
 typedef void (*copy_data_dest_cb) (void *data, int len);
 
@@ -102,4 +100,9 @@ extern uint64 DoCopyTo(CopyToState cstate);
 extern List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
                             List *attnamelist);
 
+/* built-in COPY TO format routines */
+extern CopyToFormatRoutine CopyToTextFormatRoutine;
+extern CopyToFormatRoutine CopyToCSVFormatRoutine;
+extern CopyToFormatRoutine CopyToBinaryFormatRoutine;
+
 #endif                            /* COPY_H */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 0000000000..2a38d72ce7
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,71 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *      API for COPY TO/FROM
+ *
+ * Copyright (c) 2015-2023, PostgreSQL Global Development Group
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+
+typedef struct CopyToStateData *CopyToState;
+extern void *CopyToStateGetOpaque(CopyToState cstate);
+extern void CopyToStateSetOpaque(CopyToState cstate, void *opaque);
+extern List *CopyToStateGetAttNumList(CopyToState cstate);
+
+extern void CopyToStateSendData(CopyToState cstate, const void *databuf, int datasize);
+extern void CopyToStateSendString(CopyToState cstate, const char *str);
+extern void CopyToStateSendChar(CopyToState cstate, char c);
+extern void CopyToStateSendInt32(CopyToState cstate, int32 val);
+extern void CopyToStateSendInt16(CopyToState cstate, int16 val);
+extern void CopyToStateFlush(CopyToState cstate);
+
+typedef struct CopyFromStateData *CopyFromState;
+
+
+typedef void (*CopyToStart_function) (CopyToState cstate, TupleDesc tupDesc);
+typedef void (*CopyToOneRow_function) (CopyToState cstate, TupleTableSlot *slot);
+typedef void (*CopyToEnd_function) (CopyToState cstate);
+
+/* XXX: just copied from COPY TO routines */
+typedef void (*CopyFromStart_function) (CopyFromState cstate, TupleDesc tupDesc);
+typedef void (*CopyFromOneRow_function) (CopyFromState cstate, TupleTableSlot *slot);
+typedef void (*CopyFromEnd_function) (CopyFromState cstate);
+
+typedef struct CopyFormatRoutine
+{
+    NodeTag        type;
+
+    bool        is_from;
+    Node       *routine;
+}            CopyFormatRoutine;
+
+typedef struct CopyToFormatRoutine
+{
+    NodeTag        type;
+
+    CopyToStart_function start_fn;
+    CopyToOneRow_function onerow_fn;
+    CopyToEnd_function end_fn;
+}            CopyToFormatRoutine;
+
+/* XXX: just copied from COPY TO routines */
+typedef struct CopyFromFormatRoutine
+{
+    NodeTag        type;
+
+    CopyFromStart_function start_fn;
+    CopyFromOneRow_function onerow_fn;
+    CopyFromEnd_function end_fn;
+}            CopyFromFormatRoutine;
+
+extern CopyToFormatRoutine * GetCopyToFormatRoutine(char *format_name);
+extern CopyFromFormatRoutine * GetCopyFromFormatRoutine(char *format_name);
+
+#endif                            /* COPYAPI_H */
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index 626dc696d5..53b262568c 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -11,6 +11,7 @@ node_support_input_i = [
   'access/sdir.h',
   'access/tableam.h',
   'access/tsmapi.h',
+  'commands/copyapi.h',
   'commands/event_trigger.h',
   'commands/trigger.h',
   'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 5d33fa6a9a..204cfd3f49 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
           spgist_name_ops \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index b76f588559..2fbe1abd4a 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -12,6 +12,7 @@ subdir('spgist_name_ops')
 subdir('ssl_passphrase_callback')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 0000000000..f2b89b56a1
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY format"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 0000000000..8becdb6369
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,14 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a INT, b INT, c INT);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (format 'testfmt');
+ERROR:  COPY format "testfmt" not recognized
+LINE 1: COPY public.test FROM stdin WITH (format 'testfmt');
+                                          ^
+COPY public.test TO stdout WITH (format 'testfmt');
+NOTICE:  testfmt_handler called with is_from 0
+NOTICE:  testfmt_copyto_start called
+NOTICE:  testfmt_copyto_onerow called
+NOTICE:  testfmt_copyto_onerow called
+NOTICE:  testfmt_copyto_onerow called
+NOTICE:  testfmt_copyto_end called
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 0000000000..4adf048280
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2022-2023, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test COPY format routines',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+  'test_copy_format.control',
+  'test_copy_format--1.0.sql',
+)
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'test_copy_format',
+    ],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 0000000000..1052135252
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,5 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a INT, b INT, c INT);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (format 'testfmt');
+COPY public.test TO stdout WITH (format 'testfmt');
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 0000000000..2749924831
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,9 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+
+CREATE FUNCTION testfmt(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'copy_testfmt_handler' LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 0000000000..8a584f4814
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,70 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for custom COPY format.
+ *
+ * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "access/table.h"
+#include "commands/copyapi.h"
+#include "fmgr.h"
+#include "utils/rel.h"
+
+PG_MODULE_MAGIC;
+
+static void
+testfmt_copyto_start(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE,
+            (errmsg("testfmt_copyto_start called")));
+}
+
+static void
+testfmt_copyto_onerow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE,
+            (errmsg("testfmt_copyto_onerow called")));
+}
+
+static void
+testfmt_copyto_end(CopyToState cstate)
+{
+    ereport(NOTICE,
+            (errmsg("testfmt_copyto_end called")));
+}
+
+PG_FUNCTION_INFO_V1(copy_testfmt_handler);
+Datum
+copy_testfmt_handler(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+    CopyFormatRoutine *cp = makeNode(CopyFormatRoutine);
+
+    ereport(NOTICE,
+            (errmsg("testfmt_handler called with is_from %d", is_from)));
+
+    cp->is_from = is_from;
+    if (!is_from)
+    {
+        CopyToFormatRoutine *cpt = makeNode(CopyToFormatRoutine);
+
+        cpt->start_fn = testfmt_copyto_start;
+        cpt->onerow_fn = testfmt_copyto_onerow;
+        cpt->end_fn = testfmt_copyto_end;
+
+        cp->routine = (Node *) cpt;
+    }
+    else
+        elog(ERROR, "custom COPY format \"testfmt\" does not support COPY FROM");
+
+    PG_RETURN_POINTER(cp);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 0000000000..57e0ef9d91
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
			
On Thu, Dec 21, 2023 at 06:35:04PM +0900, Sutou Kouhei wrote:
>    * If we just require "copy_to_${FORMAT}(internal)"
>      function and "copy_from_${FORMAT}(internal)" function,
>      we can remove the tricky approach. And it also avoid
>      name collisions with other handler such as tablesample
>      handler.
>      See also:
>
https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9
Hmm.  I prefer the unique name approach for the COPY portions without
enforcing any naming policy on the function names returning the
handlers, actually, though I can see your point.
> 2. Need an opaque space like IndexScanDesc::opaque does
>
>    * A custom COPY TO handler needs to keep its data
Sounds useful to me to have a private area passed down to the
callbacks.
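A minimal standalone sketch of the private-area idea, modeled on the CopyToStateGetOpaque()/CopyToStateSetOpaque() declarations in the PoC's copyapi.h. MockCopyToState, the mock_* accessors and the row-counting format are hypothetical stand-ins rather than PostgreSQL code, and malloc-family calls are used where a real handler would use palloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for CopyToStateData; only the opaque slot matters. */
typedef struct MockCopyToState
{
    void       *opaque;         /* private area owned by the format */
} MockCopyToState;

/* Private state of an imaginary format that counts the rows it emits. */
typedef struct CountingFormatState
{
    long        nrows;
} CountingFormatState;

static void *
mock_get_opaque(MockCopyToState *cstate)
{
    return cstate->opaque;
}

static void
mock_set_opaque(MockCopyToState *cstate, void *opaque)
{
    cstate->opaque = opaque;
}

/* Start callback: allocate the private state and attach it. */
static void
counting_start(MockCopyToState *cstate)
{
    CountingFormatState *state = calloc(1, sizeof(CountingFormatState));

    assert(state != NULL);
    mock_set_opaque(cstate, state);
}

/* Per-row callback: find the private state again through the opaque slot. */
static void
counting_onerow(MockCopyToState *cstate)
{
    CountingFormatState *state = mock_get_opaque(cstate);

    state->nrows++;
}

/* End callback: report the count and release the private state. */
static long
counting_end(MockCopyToState *cstate)
{
    CountingFormatState *state = mock_get_opaque(cstate);
    long        nrows = state->nrows;

    free(state);
    mock_set_opaque(cstate, NULL);
    return nrows;
}
```

The point of the slot is that the three callbacks share state without the core COPY code knowing its layout.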
> 3. Export CopySend*()
>
>    * If we like minimum API, we just need to export
>      CopySendData() and CopySendEndOfRow(). But
>      CopySend{String,Char,Int32,Int16}() will be convenient
>      custom COPY TO handlers. (A custom COPY TO handler for
>      Apache Arrow doesn't need them.)
Hmm.  Not sure on this one.  This may come down to externalizing the
manipulation of fe_msgbuf.  In particular, could some custom formats
not care at all about network byte order?
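To make the byte-order question concrete: in the patch, CopyToStateSendInt32() converts with pg_hton32() before appending, while CopyToStateSendData() appends bytes unchanged. The standalone sketch below mimics that split with a mock buffer. MockBuf, mock_send_data(), mock_send_int32() and custom_send_int32_le() are hypothetical names, and the byte swap is spelled out by hand instead of calling pg_hton32().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for fe_msgbuf. */
typedef struct MockBuf
{
    unsigned char data[64];
    size_t      len;
} MockBuf;

/* Counterpart of CopyToStateSendData(): append bytes unchanged. */
static void
mock_send_data(MockBuf *buf, const void *src, size_t n)
{
    assert(buf->len + n <= sizeof(buf->data));
    memcpy(buf->data + buf->len, src, n);
    buf->len += n;
}

/* Counterpart of CopyToStateSendInt32(): always big-endian (network order). */
static void
mock_send_int32(MockBuf *buf, int32_t val)
{
    uint32_t    u = (uint32_t) val;
    unsigned char be[4] = {
        (unsigned char) (u >> 24), (unsigned char) (u >> 16),
        (unsigned char) (u >> 8), (unsigned char) u
    };

    mock_send_data(buf, be, sizeof(be));
}

/* A format with its own little-endian layout needs only mock_send_data(). */
static void
custom_send_int32_le(MockBuf *buf, int32_t val)
{
    uint32_t    u = (uint32_t) val;
    unsigned char le[4] = {
        (unsigned char) u, (unsigned char) (u >> 8),
        (unsigned char) (u >> 16), (unsigned char) (u >> 24)
    };

    mock_send_data(buf, le, sizeof(le));
}
```

Under this reading, the raw-data primitive plus a flush would be the minimum a custom format needs; the Int16/Int32 helpers are a convenience for formats that happen to want network order.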
> Questions:
>
> 1. What value should be used for "format" in
>    PgMsg_CopyOutResponse message?
>
>    It's 1 for binary format and 0 for text/csv format.
>
>    Should we make it customizable by custom COPY TO handler?
>    If so, what value should be used for this?
Interesting point.  It looks very tempting to give more flexibility to
people who'd like to use their own code as we have one byte in the
protocol but just use 0/1.  Hence it feels natural to have a callback
for that.
It also means that we may want to think harder about copy_is_binary in
libpq in a future step.  Now, having a backend implementation does
not need any libpq bits, either, because a client stack may just want
to speak the Postgres protocol directly.  Perhaps a custom COPY
implementation would be OK with how things are in libpq, as well,
tweaking its way through with just text or binary.
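A rough sketch of what such a callback could look like. The CopyOutResponse body is an Int8 overall format code, an Int16 column count, and one Int16 format code per column; today the backend fills in 0 or 1 from opts.binary. The format_code_fn member and the Mock* names below are hypothetical and appear nowhere in the patch; the sketch only shows where a handler-supplied value would land in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routine struct carrying only the proposed callback. */
typedef struct MockCopyToRoutine
{
    /* overall format code to advertise (0 = text, 1 = binary today) */
    int8_t      (*format_code_fn) (void);
} MockCopyToRoutine;

static int8_t
text_format_code(void)
{
    return 0;
}

static int8_t
binary_format_code(void)
{
    return 1;
}

/*
 * Fill "out" with a CopyOutResponse body: Int8 format, Int16 natts, then
 * one Int16 format code per column, all in network byte order.
 * Returns the number of bytes written.
 */
static size_t
build_copy_out_response_body(const MockCopyToRoutine *routine, int16_t natts,
                             unsigned char *out, size_t outsize)
{
    int8_t      format = routine->format_code_fn();
    uint16_t    colfmt = (uint16_t) (int16_t) format;
    size_t      len = 0;

    assert(natts >= 0);
    assert(1 + 2 + 2 * (size_t) natts <= outsize);

    out[len++] = (unsigned char) format;
    out[len++] = (unsigned char) ((uint16_t) natts >> 8);
    out[len++] = (unsigned char) ((uint16_t) natts & 0xFF);
    for (int16_t i = 0; i < natts; i++)
    {
        out[len++] = (unsigned char) (colfmt >> 8);
        out[len++] = (unsigned char) (colfmt & 0xFF);
    }
    return len;
}
```

Whether clients such as libpq would accept a value other than 0 or 1 here is exactly the open question raised above; the sketch takes no position on it.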
> 2. Do we need more tries for design discussion for the first
>    implementation? If we need, what should we try?
A makeNode() is used with an allocation in the current memory context
in the function returning the handler.  I would have assumed that this
stuff returns a handler as a const struct like table AMs.
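For comparison, the table AM pattern keeps the routine in a static const struct and has the handler return its address, so nothing is allocated per call. A standalone sketch of that shape, with mock types standing in for CopyToFormatRoutine (all Mock* names and the int *calls parameter are illustrative assumptions, not the PoC's signatures):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CopyToFormatRoutine. */
typedef struct MockCopyToRoutine
{
    void        (*start_fn) (int *calls);
    void        (*onerow_fn) (int *calls);
    void        (*end_fn) (int *calls);
} MockCopyToRoutine;

static void
testfmt_start(int *calls)
{
    (*calls)++;
}

static void
testfmt_onerow(int *calls)
{
    (*calls)++;
}

static void
testfmt_end(int *calls)
{
    (*calls)++;
}

/* One immutable instance for the lifetime of the loaded module. */
static const MockCopyToRoutine testfmt_routine = {
    .start_fn = testfmt_start,
    .onerow_fn = testfmt_onerow,
    .end_fn = testfmt_end,
};

/* Handler: no makeNode()-style allocation, just hand out the address. */
static const MockCopyToRoutine *
testfmt_handler(void)
{
    return &testfmt_routine;
}
```

Because every call returns the same pointer, the result does not depend on which memory context is current when the handler runs.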
--
Michael
			
On Fri, Dec 22, 2023 at 10:00 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Dec 21, 2023 at 06:35:04PM +0900, Sutou Kouhei wrote:
> >    * If we just require "copy_to_${FORMAT}(internal)"
> >      function and "copy_from_${FORMAT}(internal)" function,
> >      we can remove the tricky approach. And it also avoid
> >      name collisions with other handler such as tablesample
> >      handler.
> >      See also:
> >
https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9
>
> Hmm.  I prefer the unique name approach for the COPY portions without
> enforcing any naming policy on the function names returning the
> handlers, actually, though I can see your point.
Yeah, another idea is to provide support functions to return a
CopyFormatRoutine wrapping either CopyToFormatRoutine or
CopyFromFormatRoutine. For example:
extern CopyFormatRoutine *MakeCopyToFormatRoutine(const CopyToFormatRoutine *routine);
Extensions can then do something like:
static const CopyToFormatRoutine testfmt_handler = {
    .type = T_CopyToFormatRoutine,
    .start_fn = testfmt_copyto_start,
    .onerow_fn = testfmt_copyto_onerow,
    .end_fn = testfmt_copyto_end
};
Datum
copy_testfmt_handler(PG_FUNCTION_ARGS)
{
    CopyFormatRoutine *routine = MakeCopyToFormatRoutine(&testfmt_handler);
    :
>
> > 2. Need an opaque space like IndexScanDesc::opaque does
> >
> >    * A custom COPY TO handler needs to keep its data
>
> Sounds useful to me to have a private area passed down to the
> callbacks.
>
+1
>
> > Questions:
> >
> > 1. What value should be used for "format" in
> >    PgMsg_CopyOutResponse message?
> >
> >    It's 1 for binary format and 0 for text/csv format.
> >
> >    Should we make it customizable by custom COPY TO handler?
> >    If so, what value should be used for this?
>
> Interesting point.  It looks very tempting to give more flexibility to
> people who'd like to use their own code as we have one byte in the
> protocol but just use 0/1.  Hence it feels natural to have a callback
> for that.
+1
>
> It also means that we may want to think harder about copy_is_binary in
> libpq in the future step.  Now, having a backend implementation does
> not need any libpq bits, either, because a client stack may just want
> to speak the Postgres protocol directly.  Perhaps a custom COPY
> implementation would be OK with how things are in libpq, as well,
> tweaking its way through with just text or binary.
>
> > 2. Do we need more tries for design discussion for the first
> >    implementation? If we need, what should we try?
>
> A makeNode() is used with an allocation in the current memory context
> in the function returning the handler.  I would have assumed that this
> stuff returns a handler as a const struct like table AMs.
+1
The example I mentioned above does that.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		On Thu, Dec 21, 2023 at 6:35 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoCunywHird3GaPzWe6s9JG1wzxj3Cr6vGN36DDheGjOjA@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 11 Dec 2023 23:31:29 +0900,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > I've sketched the above idea including a test module in
> > src/test/module/test_copy_format, based on v2 patch. It's not split
> > and is dirty so just for discussion.
>
> I implemented a sample COPY TO handler for Apache Arrow that
> supports only integer and text.
>
> I needed to extend the patch:
>
> 1. Add an opaque space for custom COPY TO handler
>    * Add CopyToState{Get,Set}Opaque()
>    https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
>
> 2. Export CopyToState::attnumlist
>    * Add CopyToStateGetAttNumList()
>    https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
I think we can move CopyToState to copy.h and we don't need to have
set/get functions for its fields.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		On Thu, Dec 21, 2023 at 5:35 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoCunywHird3GaPzWe6s9JG1wzxj3Cr6vGN36DDheGjOjA@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 11 Dec 2023 23:31:29 +0900,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > I've sketched the above idea including a test module in
> > src/test/module/test_copy_format, based on v2 patch. It's not split
> > and is dirty so just for discussion.
>
> I implemented a sample COPY TO handler for Apache Arrow that
> supports only integer and text.
>
> I needed to extend the patch:
>
> 1. Add an opaque space for custom COPY TO handler
>    * Add CopyToState{Get,Set}Opaque()
>    https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
>
> 2. Export CopyToState::attnumlist
>    * Add CopyToStateGetAttNumList()
>    https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
>
> 3. Export CopySend*()
>    * Rename CopySend*() to CopyToStateSend*() and export them
>    * Exception: CopySendEndOfRow() to CopyToStateFlush() because
>      it just flushes the internal buffer now.
>    https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e
>
I guess the purpose of these helpers is to avoid exposing CopyToState in
copy.h, but I think exposing CopyToState to users might make life easier.
Users might want to use the memory contexts of the structure (though I
agree not all the fields are necessary for extension handlers).
> The attached patch is based on the Sawada-san's patch and
> includes the above changes. Note that this patch is also
> dirty so just for discussion.
>
> My suggestions from this experience:
>
> 1. Split COPY handler to COPY TO handler and COPY FROM handler
>
>    * CopyFormatRoutine is a bit tricky. An extension needs
>      to create a CopyFormatRoutine node and
>      a CopyToFormatRoutine node.
>
>    * If we just require "copy_to_${FORMAT}(internal)"
>      function and "copy_from_${FORMAT}(internal)" function,
>      we can remove the tricky approach. And it also avoid
>      name collisions with other handler such as tablesample
>      handler.
>      See also:
>
https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9
>
> 2. Need an opaque space like IndexScanDesc::opaque does
>
>    * A custom COPY TO handler needs to keep its data
I once thought users might want to parse their own options, maybe this
is a use case for this opaque space.
For the name, I thought private_data might be a better candidate than
opaque, but I do not insist.
>
> 3. Export CopySend*()
>
>    * If we like minimum API, we just need to export
>      CopySendData() and CopySendEndOfRow(). But
>      CopySend{String,Char,Int32,Int16}() will be convenient
>      custom COPY TO handlers. (A custom COPY TO handler for
>      Apache Arrow doesn't need them.)
Do you use the Arrow library to control the memory? Is there a way that
we can let Arrow use Postgres' memory context? I'm not sure this
is necessary; I'm just raising the question for discussion.
>
> Questions:
>
> 1. What value should be used for "format" in
>    PgMsg_CopyOutResponse message?
>
>
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/copyto.c;h=c66a047c4a79cc614784610f385f1cd0935350f3;hb=9ca6e7b9411e36488ef539a2c1f6846ac92a7072#l144
>
>    It's 1 for binary format and 0 for text/csv format.
>
>    Should we make it customizable by custom COPY TO handler?
>    If so, what value should be used for this?
>
> 2. Do we need more tries for design discussion for the first
>    implementation? If we need, what should we try?
>
>
> Thanks,
> --
> kou
+PG_FUNCTION_INFO_V1(copy_testfmt_handler);
+Datum
+copy_testfmt_handler(PG_FUNCTION_ARGS)
+{
+ bool is_from = PG_GETARG_BOOL(0);
+ CopyFormatRoutine *cp = makeNode(CopyFormatRoutine);;
+
extra semicolon.
--
Regards
Junwang Zhao
			
		Hi,
In <ZYTfqGppMc9e_w2k@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Dec 2023 10:00:24 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:
>> 3. Export CopySend*()
>> 
>>    * If we like minimum API, we just need to export
>>      CopySendData() and CopySendEndOfRow(). But
>>      CopySend{String,Char,Int32,Int16}() will be convenient
>>      custom COPY TO handlers. (A custom COPY TO handler for
>>      Apache Arrow doesn't need them.)
> 
> Hmm.  Not sure on this one.  This may come down to externalize the
> manipulation of fe_msgbuf.  Particularly, could it be possible that
> some custom formats don't care at all about the network order?
It means that all custom formats should control byte order
by themselves instead of using CopySendInt*() that always
use network byte order, right? It makes sense. Let's export
only CopySendData() and CopySendEndOfRow().
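A minimal sketch of what "controlling byte order by themselves" could look like, assuming a format that stores integers little-endian. `demo_put_int32_le()` and `demo_put_int32_be()` are hypothetical helpers written for this illustration; a handler would fill a buffer like this and then hand the raw bytes to CopySendData():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write value in network (big-endian) order; returns bytes written. */
static size_t
demo_put_int32_be(uint8_t *buf, int32_t value)
{
    uint32_t v = (uint32_t) value;

    buf[0] = (uint8_t) (v >> 24);
    buf[1] = (uint8_t) (v >> 16);
    buf[2] = (uint8_t) (v >> 8);
    buf[3] = (uint8_t) v;
    return 4;
}

/* Write value in little-endian order; returns bytes written. */
static size_t
demo_put_int32_le(uint8_t *buf, int32_t value)
{
    uint32_t v = (uint32_t) value;

    buf[0] = (uint8_t) v;
    buf[1] = (uint8_t) (v >> 8);
    buf[2] = (uint8_t) (v >> 16);
    buf[3] = (uint8_t) (v >> 24);
    return 4;
}
```

Both helpers use shifts instead of copying host memory, so the output does not depend on the endianness of the machine running the server.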
>> 1. What value should be used for "format" in
>>    PgMsg_CopyOutResponse message?
>> 
>>    It's 1 for binary format and 0 for text/csv format.
>> 
>>    Should we make it customizable by custom COPY TO handler?
>>    If so, what value should be used for this?
> 
> Interesting point.  It looks very tempting to give more flexibility to
> people who'd like to use their own code as we have one byte in the
> protocol but just use 0/1.  Hence it feels natural to have a callback
> for that.
OK. Let's add a callback something like:
typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
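A minimal sketch of how such a callback could be consulted, assuming a stand-in state struct and a fallback to the current 0/1 behavior when no callback is set. All `Demo*`/`demo_*` names are hypothetical, and the value returned by the custom handler is an arbitrary choice for the example, since which value to use is still an open question here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the COPY TO state; only the binary flag is modeled. */
typedef struct DemoCopyToState
{
    int binary;
} DemoCopyToState;

/* Same shape as the proposed CopyToGetFormat_function typedef. */
typedef int16_t (*DemoCopyToGetFormat_function) (DemoCopyToState *cstate);

/* Built-in behavior: 1 for binary, 0 for text/csv. */
static int16_t
demo_builtin_get_format(DemoCopyToState *cstate)
{
    return cstate->binary ? 1 : 0;
}

/* Hypothetical custom format that declares itself binary. */
static int16_t
demo_custom_get_format(DemoCopyToState *cstate)
{
    (void) cstate;
    return 1;
}

/* Pick the "format" value to report, preferring the handler's callback. */
static int16_t
demo_copy_out_format(DemoCopyToState *cstate,
                     DemoCopyToGetFormat_function get_format)
{
    if (get_format == NULL)
        return demo_builtin_get_format(cstate);
    return get_format(cstate);
}
```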
> It also means that we may want to think harder about copy_is_binary in
> libpq in the future step.  Now, having a backend implementation does
> not need any libpq bits, either, because a client stack may just want
> to speak the Postgres protocol directly.  Perhaps a custom COPY
> implementation would be OK with how things are in libpq, as well,
> tweaking its way through with just text or binary.
Can we defer this discussion until after we commit a basic custom
COPY format handler mechanism?
>> 2. Do we need more tries for design discussion for the first
>>    implementation? If we need, what should we try?
> 
> A makeNode() is used with an allocation in the current memory context
> in the function returning the handler.  I would have assumed that this
> stuff returns a handler as a const struct like table AMs.
If we use this approach, we can't use the Sawada-san's
idea[1] that provides a convenient API to hide
CopyFormatRoutine internal. The idea provides
MakeCopy{To,From}FormatRoutine(). They return a new
CopyFormatRoutine* with suitable is_from member. They can't
use static const CopyFormatRoutine because they may be called
multiple times in the same process.
We can use the static const struct approach by choosing one
of the following:
1. Use separated function for COPY {TO,FROM} format handlers
   as I suggested.
2. Don't provide convenient API. Developers construct
   CopyFormatRoutine by themselves. But it may be a bit
   tricky.
3. Similar to 2. but don't use a bit tricky approach (don't
   embed Copy{To,From}FormatRoutine nodes into
   CopyFormatRoutine).
   Use unified function for COPY {TO,FROM} format handlers
   but CopyFormatRoutine always have both of COPY {TO,FROM}
   format routines and these routines aren't nodes:
   typedef struct CopyToFormatRoutine
   {
           CopyToStart_function start_fn;
           CopyToOneRow_function onerow_fn;
           CopyToEnd_function end_fn;
   } CopyToFormatRoutine;
   /* XXX: just copied from COPY TO routines */
   typedef struct CopyFromFormatRoutine
   {
           CopyFromStart_function start_fn;
           CopyFromOneRow_function onerow_fn;
           CopyFromEnd_function end_fn;
   } CopyFromFormatRoutine;
   typedef struct CopyFormatRoutine
   {
           NodeTag        type;
           CopyToFormatRoutine       to_routine;
           CopyFromFormatRoutine       from_routine;
   } CopyFormatRoutine;
   ----
   static const CopyFormatRoutine testfmt_handler = {
       .type = T_CopyFormatRoutine,
       .to_routine = {
           .start_fn = testfmt_copyto_start,
           .onerow_fn = testfmt_copyto_onerow,
           .end_fn = testfmt_copyto_end,
       },
       .from_routine = {
           .start_fn = testfmt_copyfrom_start,
           .onerow_fn = testfmt_copyfrom_onerow,
           .end_fn = testfmt_copyfrom_end,
       },
   };
   PG_FUNCTION_INFO_V1(copy_testfmt_handler);
   Datum
   copy_testfmt_handler(PG_FUNCTION_ARGS)
   {
           PG_RETURN_POINTER(&testfmt_handler);
   }
4. ... other idea?
[1]
https://www.postgresql.org/message-id/flat/CAD21AoDs9cOjuVbA_krGizAdc50KE%2BFjAuEXWF0NZwbMnc7F3Q%40mail.gmail.com#71bb03d9237252382b245dd33e705a3a
Thanks,
-- 
kou
			
		Hi,
In <CAD21AoD=UapH4Wh06G6H5XAzPJ0iJg9YcW8r7E2UEJkZ8QsosA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Dec 2023 10:48:18 +0900,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> I needed to extend the patch:
>>
>> 1. Add an opaque space for custom COPY TO handler
>>    * Add CopyToState{Get,Set}Opaque()
>>    https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
>>
>> 2. Export CopyToState::attnumlist
>>    * Add CopyToStateGetAttNumList()
>>    https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
> 
> I think we can move CopyToState to copy.h and we don't need to have
> set/get functions for its fields.
I don't object to the idea if other PostgreSQL developers
prefer the approach. Is there any PostgreSQL developer who
objects to exporting Copy{To,From}StateData as public API?
Thanks,
-- 
kou
			
		Hi,
In <CAEG8a3+jG_NKOUmcxDyEX2xSggBXReZ4H=e3RFsUtedY88A03w@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Dec 2023 10:58:05 +0800,
  Junwang Zhao <zhjwpku@gmail.com> wrote:
>> 1. Add an opaque space for custom COPY TO handler
>>    * Add CopyToState{Get,Set}Opaque()
>>    https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
>>
>> 2. Export CopyToState::attnumlist
>>    * Add CopyToStateGetAttNumList()
>>    https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
>>
>> 3. Export CopySend*()
>>    * Rename CopySend*() to CopyToStateSend*() and export them
>>    * Exception: CopySendEndOfRow() to CopyToStateFlush() because
>>      it just flushes the internal buffer now.
>>    https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e
>>
> I guess the purpose of these helpers is to avoid expose CopyToState to
> copy.h,
Yes.
>         but I
> think expose CopyToState to user might make life easier, users might want to use
> the memory contexts of the structure (though I agree not all the
> fields are necessary
> for extension handlers).
OK. I don't object to it, as I said in another e-mail:
https://www.postgresql.org/message-id/flat/20240110.120644.1876591646729327180.kou%40clear-code.com#d923173e9625c20319750155083cbd72
>> 2. Need an opaque space like IndexScanDesc::opaque does
>>
>>    * A custom COPY TO handler needs to keep its data
> 
> I once thought users might want to parse their own options, maybe this
> is a use case for this opaque space.
Good catch! I forgot to suggest a callback for custom format
options. How about the following API?
----
...
typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
...
typedef bool (*CopyFromProcessOption_function) (CopyFromState cstate, DefElem *defel);
typedef struct CopyToFormatRoutine
{
    ...
    CopyToProcessOption_function process_option_fn;
} CopyToFormatRoutine;
typedef struct CopyFromFormatRoutine
{
    ...
    CopyFromProcessOption_function process_option_fn;
} CopyFromFormatRoutine;
----
----
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e7597894bf..1aa8b62551 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -416,6 +416,7 @@ void
 ProcessCopyOptions(ParseState *pstate,
                    CopyFormatOptions *opts_out,
                    bool is_from,
+                   void *cstate, /* CopyToState* for !is_from, CopyFromState* for is_from */
                    List *options)
 {
     bool        format_specified = false;
@@ -593,11 +594,19 @@ ProcessCopyOptions(ParseState *pstate,
                          parser_errposition(pstate, defel->location)));
         }
         else
-            ereport(ERROR,
-                    (errcode(ERRCODE_SYNTAX_ERROR),
-                     errmsg("option \"%s\" not recognized",
-                            defel->defname),
-                     parser_errposition(pstate, defel->location)));
+        {
+            bool processed;
+            if (is_from)
+                processed = opts_out->from_ops->process_option_fn(cstate, defel);
+            else
+                processed = opts_out->to_ops->process_option_fn(cstate, defel);
+            if (!processed)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("option \"%s\" not recognized",
+                                defel->defname),
+                         parser_errposition(pstate, defel->location)));
+        }
     }
 
     /*
----
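A minimal sketch of what a handler-side option callback could look like under this API, assuming simplified stand-ins for DefElem and the COPY TO state. `DemoDefElem`, `DemoCopyToState`, `DemoOptions` and the `batch_size` option are all hypothetical and exist only for this illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for DefElem: an option name and its raw value. */
typedef struct DemoDefElem
{
    const char *defname;
    const char *arg;
} DemoDefElem;

/* Stand-in for the COPY TO state; only the private area is modeled. */
typedef struct DemoCopyToState
{
    void *opaque;
} DemoCopyToState;

/* Hypothetical options of a custom format, kept in the private area. */
typedef struct DemoOptions
{
    int batch_size;
} DemoOptions;

/*
 * Return true if the option belongs to this format and was consumed.
 * Returning false lets the caller raise "option not recognized".
 */
static bool
demo_copyto_process_option(DemoCopyToState *cstate, const DemoDefElem *defel)
{
    DemoOptions *opts = cstate->opaque;

    if (strcmp(defel->defname, "batch_size") == 0)
    {
        opts->batch_size = (int) strtol(defel->arg, NULL, 10);
        return true;
    }
    return false;
}
```

Keeping the error report in the caller means every format rejects unknown options with the same message.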
> For the name, I thought private_data might be a better candidate than
> opaque, but I do not insist.
I don't have a strong opinion for this. Here are the number
of headers that use "private_data" and "opaque":
$ grep -r private_data --files-with-matches src/include | wc -l
6
$ grep -r opaque --files-with-matches src/include | wc -l
38
It seems that we use "opaque" more often than "private_data" in general.
> Do you use the arrow library to control the memory?
Yes.
>                                                     Is there a way that
> we can let the arrow use postgres' memory context?
Yes. Apache Arrow C++ provides a memory pool feature and we
can implement PostgreSQL's memory context based memory pool
for this. (But this is a custom COPY TO/FROM handler's
implementation details.)
>                                                    I'm not sure this
> is necessary, just raise the question for discussion.
Could you clarify what we should discuss? Should we require
that COPY TO/FROM handlers use PostgreSQL's memory
context for all internal memory allocations?
> +PG_FUNCTION_INFO_V1(copy_testfmt_handler);
> +Datum
> +copy_testfmt_handler(PG_FUNCTION_ARGS)
> +{
> + bool is_from = PG_GETARG_BOOL(0);
> + CopyFormatRoutine *cp = makeNode(CopyFormatRoutine);;
> +
> 
> extra semicolon.
I noticed it too :-)
But I ignored it because the current implementation is only
for discussion. We know that it may be dirty.
Thanks,
-- 
kou
			
		On Wed, Jan 10, 2024 at 12:00 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <ZYTfqGppMc9e_w2k@paquier.xyz>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Dec 2023 10:00:24 +0900,
>   Michael Paquier <michael@paquier.xyz> wrote:
>
> >> 3. Export CopySend*()
> >>
> >>    * If we like minimum API, we just need to export
> >>      CopySendData() and CopySendEndOfRow(). But
> >>      CopySend{String,Char,Int32,Int16}() will be convenient
> >>      custom COPY TO handlers. (A custom COPY TO handler for
> >>      Apache Arrow doesn't need them.)
> >
> > Hmm.  Not sure on this one.  This may come down to externalize the
> > manipulation of fe_msgbuf.  Particularly, could it be possible that
> > some custom formats don't care at all about the network order?
>
> It means that all custom formats should control byte order
> by themselves instead of using CopySendInt*() that always
> use network byte order, right? It makes sense. Let's export
> only CopySendData() and CopySendEndOfRow().
>
>
> >> 1. What value should be used for "format" in
> >>    PgMsg_CopyOutResponse message?
> >>
> >>    It's 1 for binary format and 0 for text/csv format.
> >>
> >>    Should we make it customizable by custom COPY TO handler?
> >>    If so, what value should be used for this?
> >
> > Interesting point.  It looks very tempting to give more flexibility to
> > people who'd like to use their own code as we have one byte in the
> > protocol but just use 0/1.  Hence it feels natural to have a callback
> > for that.
>
> OK. Let's add a callback something like:
>
> typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
>
> > It also means that we may want to think harder about copy_is_binary in
> > libpq in the future step.  Now, having a backend implementation does
> > not need any libpq bits, either, because a client stack may just want
> > to speak the Postgres protocol directly.  Perhaps a custom COPY
> > implementation would be OK with how things are in libpq, as well,
> > tweaking its way through with just text or binary.
>
> Can we defer this discussion after we commit a basic custom
> COPY format handler mechanism?
>
> >> 2. Do we need more tries for design discussion for the first
> >>    implementation? If we need, what should we try?
> >
> > A makeNode() is used with an allocation in the current memory context
> > in the function returning the handler.  I would have assumed that this
> > stuff returns a handler as a const struct like table AMs.
>
> If we use this approach, we can't use the Sawada-san's
> idea[1] that provides a convenient API to hide
> CopyFormatRoutine internal. The idea provides
> MakeCopy{To,From}FormatRoutine(). They return a new
> CopyFormatRoutine* with suitable is_from member. They can't
> use static const CopyFormatRoutine because they may be called
> multiple times in the same process.
>
> We can use the static const struct approach by choosing one
> of the following:
>
> 1. Use separated function for COPY {TO,FROM} format handlers
>    as I suggested.
>
> 2. Don't provide convenient API. Developers construct
>    CopyFormatRoutine by themselves. But it may be a bit
>    tricky.
>
> 3. Similar to 2. but don't use a bit tricky approach (don't
>    embed Copy{To,From}FormatRoutine nodes into
>    CopyFormatRoutine).
>
>    Use unified function for COPY {TO,FROM} format handlers
>    but CopyFormatRoutine always have both of COPY {TO,FROM}
>    format routines and these routines aren't nodes:
>
>    typedef struct CopyToFormatRoutine
>    {
>            CopyToStart_function start_fn;
>            CopyToOneRow_function onerow_fn;
>            CopyToEnd_function end_fn;
>    } CopyToFormatRoutine;
>
>    /* XXX: just copied from COPY TO routines */
>    typedef struct CopyFromFormatRoutine
>    {
>            CopyFromStart_function start_fn;
>            CopyFromOneRow_function onerow_fn;
>            CopyFromEnd_function end_fn;
>    } CopyFromFormatRoutine;
>
>    typedef struct CopyFormatRoutine
>    {
>            NodeTag              type;
>
>            CopyToFormatRoutine     to_routine;
>            CopyFromFormatRoutine           from_routine;
>    } CopyFormatRoutine;
>
>    ----
>
>    static const CopyFormatRoutine testfmt_handler = {
>        .type = T_CopyFormatRoutine,
>        .to_routine = {
>            .start_fn = testfmt_copyto_start,
>            .onerow_fn = testfmt_copyto_onerow,
>            .end_fn = testfmt_copyto_end,
>        },
>        .from_routine = {
>            .start_fn = testfmt_copyfrom_start,
>            .onerow_fn = testfmt_copyfrom_onerow,
>            .end_fn = testfmt_copyfrom_end,
>        },
>    };
>
>    PG_FUNCTION_INFO_V1(copy_testfmt_handler);
>    Datum
>    copy_testfmt_handler(PG_FUNCTION_ARGS)
>    {
>            PG_RETURN_POINTER(&testfmt_handler);
>    }
>
> 4. ... other idea?
It's just an idea, but the fourth idea is to provide a convenient macro
to make it easy to construct the CopyFormatRoutine. For example,
#define COPYTO_ROUTINE(...) (Node *) &(CopyToFormatRoutine) {__VA_ARGS__}
static const CopyFormatRoutine testfmt_copyto_handler = {
    .type = T_CopyFormatRoutine,
    .is_from = false,
    .routine = COPYTO_ROUTINE (
        .start_fn = testfmt_copyto_start,
        .onerow_fn = testfmt_copyto_onerow,
        .end_fn = testfmt_copyto_end
        )
};
Datum
copy_testfmt_handler(PG_FUNCTION_ARGS)
{
    PG_RETURN_POINTER(& testfmt_copyto_handler);
}
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		Hi,
In <CAD21AoC_dhfS97DKwTL+2nvgBOYrmN9XVYrE8w2SuDgghb-yzg@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 10 Jan 2024 15:33:22 +0900,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> We can use the static const struct approach by choosing one
>> of the following:
>>
>> ...
>>
>> 4. ... other idea?
> 
> It's a just idea but the fourth idea is to provide a convenient macro
> to make it easy to construct the CopyFormatRoutine. For example,
> 
> #define COPYTO_ROUTINE(...) (Node *) &(CopyToFormatRoutine) {__VA_ARGS__}
> 
> static const CopyFormatRoutine testfmt_copyto_handler = {
>     .type = T_CopyFormatRoutine,
>     .is_from = false,
>     .routine = COPYTO_ROUTINE (
>         .start_fn = testfmt_copyto_start,
>         .onerow_fn = testfmt_copyto_onerow,
>         .end_fn = testfmt_copyto_end
>         )
> };
> 
> Datum
> copy_testfmt_handler(PG_FUNCTION_ARGS)
> {
>     PG_RETURN_POINTER(& testfmt_copyto_handler);
> }
Interesting. But I feel that it introduces another (a bit)
tricky mechanism...
BTW, we also need to set .type:
     .routine = COPYTO_ROUTINE (
         .type = T_CopyToFormatRoutine,
         .start_fn = testfmt_copyto_start,
         .onerow_fn = testfmt_copyto_onerow,
         .end_fn = testfmt_copyto_end
         )
Thanks,
-- 
kou
			
		On Wed, Jan 10, 2024 at 3:40 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoC_dhfS97DKwTL+2nvgBOYrmN9XVYrE8w2SuDgghb-yzg@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 10 Jan 2024 15:33:22 +0900,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >> We can use the static const struct approach by choosing one
> >> of the following:
> >>
> >> ...
> >>
> >> 4. ... other idea?
> >
> > It's a just idea but the fourth idea is to provide a convenient macro
> > to make it easy to construct the CopyFormatRoutine. For example,
> >
> > #define COPYTO_ROUTINE(...) (Node *) &(CopyToFormatRoutine) {__VA_ARGS__}
> >
> > static const CopyFormatRoutine testfmt_copyto_handler = {
> >     .type = T_CopyFormatRoutine,
> >     .is_from = false,
> >     .routine = COPYTO_ROUTINE (
> >         .start_fn = testfmt_copyto_start,
> >         .onerow_fn = testfmt_copyto_onerow,
> >         .end_fn = testfmt_copyto_end
> >         )
> > };
> >
> > Datum
> > copy_testfmt_handler(PG_FUNCTION_ARGS)
> > {
> >     PG_RETURN_POINTER(& testfmt_copyto_handler);
> > }
>
> Interesting. But I feel that it introduces another (a bit)
> tricky mechanism...
Right. On the other hand, I don't think the idea 3 is good for the
same reason Michael-san pointed out before[1][2].
>
> BTW, we also need to set .type:
>
>      .routine = COPYTO_ROUTINE (
>          .type = T_CopyToFormatRoutine,
>          .start_fn = testfmt_copyto_start,
>          .onerow_fn = testfmt_copyto_onerow,
>          .end_fn = testfmt_copyto_end
>          )
I think it's fine as the same is true for table AM.
[1] https://www.postgresql.org/message-id/ZXEUIy6wl4jHy6Nm%40paquier.xyz
[2] https://www.postgresql.org/message-id/ZXKm9tmnSPIVrqZz%40paquier.xyz
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		Hi,
In <CAD21AoC4HVuxOrsX1fLwj=5hdEmjvZoQw6PJGzxqxHNnYSQUVQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 10 Jan 2024 16:53:48 +0900,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Interesting. But I feel that it introduces another (a bit)
>> tricky mechanism...
> 
> Right. On the other hand, I don't think the idea 3 is good for the
> same reason Michael-san pointed out before[1][2].
> 
> [1] https://www.postgresql.org/message-id/ZXEUIy6wl4jHy6Nm%40paquier.xyz
> [2] https://www.postgresql.org/message-id/ZXKm9tmnSPIVrqZz%40paquier.xyz
I think that the important part of Michael-san's opinion
is "keep COPY TO implementation and COPY FROM implementation
separated for maintainability".
The patch discussed in [1][2] uses one routine for both
COPY TO and COPY FROM. If we use that approach, we need to
change one common routine from copyto.c and copyfrom.c (or
export callbacks from copyto.c and copyfrom.c and use them
in copyto.c to construct one common routine). That's the
problem.
Idea 3 still has separate routines for COPY TO and COPY
FROM. So I think that it still keeps COPY TO implementation
and COPY FROM implementation separated.
>> BTW, we also need to set .type:
>>
>>      .routine = COPYTO_ROUTINE (
>>          .type = T_CopyToFormatRoutine,
>>          .start_fn = testfmt_copyto_start,
>>          .onerow_fn = testfmt_copyto_onerow,
>>          .end_fn = testfmt_copyto_end
>>          )
> 
> I think it's fine as the same is true for table AM.
Ah, sorry. I should have said it explicitly. I don't think
that it's a problem. I just wanted to say that it's missing.
Defining one more static const struct instead of providing a
convenient (but a bit tricky) macro may be straightforward:
static const CopyToFormatRoutine testfmt_copyto_routine = {
    .type = T_CopyToFormatRoutine,
    .start_fn = testfmt_copyto_start,
    .onerow_fn = testfmt_copyto_onerow,
    .end_fn = testfmt_copyto_end
};
static const CopyFormatRoutine testfmt_copyto_handler = {
    .type = T_CopyFormatRoutine,
    .is_from = false,
    .routine = (Node *) &testfmt_copyto_routine
};
Thanks,
-- 
kou
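A minimal, self-contained sketch of the two-struct layout from the message above, assuming mock node tags and a single callback. `DemoNode`, `DemoCopyToFormatRoutine`, `DemoCopyFormatRoutine` and `demo_get_copyto_routine()` are hypothetical stand-ins, not PostgreSQL definitions; the point is only that both structs can be static const and that the caller can validate `.type` and `.is_from` before using the routine:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock node tags standing in for T_CopyFormatRoutine and friends. */
typedef enum DemoNodeTag
{
    T_DemoCopyFormatRoutine = 1,
    T_DemoCopyToFormatRoutine
} DemoNodeTag;

/* Every node starts with its tag, so a pointer can be inspected safely. */
typedef struct DemoNode
{
    DemoNodeTag type;
} DemoNode;

typedef struct DemoCopyToFormatRoutine
{
    DemoNodeTag type;
    int         (*onerow_fn) (int value);
} DemoCopyToFormatRoutine;

typedef struct DemoCopyFormatRoutine
{
    DemoNodeTag     type;
    bool            is_from;
    const DemoNode *routine;
} DemoCopyFormatRoutine;

static int
demo_onerow(int value)
{
    return value + 1;
}

/* Both structs are static const: no makeNode()-style allocation needed. */
static const DemoCopyToFormatRoutine demo_copyto_routine = {
    .type = T_DemoCopyToFormatRoutine,
    .onerow_fn = demo_onerow
};

static const DemoCopyFormatRoutine demo_copyto_handler = {
    .type = T_DemoCopyFormatRoutine,
    .is_from = false,
    .routine = (const DemoNode *) &demo_copyto_routine
};

/* Return the COPY TO routine, or NULL if the handler is not a COPY TO one. */
static const DemoCopyToFormatRoutine *
demo_get_copyto_routine(const DemoCopyFormatRoutine *handler)
{
    if (handler->type != T_DemoCopyFormatRoutine || handler->is_from)
        return NULL;
    if (handler->routine == NULL ||
        handler->routine->type != T_DemoCopyToFormatRoutine)
        return NULL;
    return (const DemoCopyToFormatRoutine *) handler->routine;
}
```

Setting `.type` on the inner struct is what makes the second check possible, which is the field noted as missing from the macro version.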
Hi,
Here is the current summary of this discussion about making
COPY format extendable. It's for reaching consensus and
starting to implement the feature. (I'll start implementing
the feature once we reach consensus.) If you have any
opinion, please share it.
Confirmed:
1.1 Making COPY format extendable will not reduce performance.
    [1]
Decisions:
2.1 Use separated handler for COPY TO and COPY FROM because
    our COPY TO implementation (copyto.c) and COPY FROM
    implementation (copyfrom.c) are separated.
    [2]
2.2 Don't use system catalog for COPY TO/FROM handlers. We can
    just use a function(internal) that returns a handler instead.
    [3]
2.3 The implementation must include documentation.
    [5]
2.4 The implementation must include test.
    [6]
2.5 The implementation should consist of small patches
    so that it is easy to review.
    [6]
2.7 Copy{To,From}State must have an opaque space for
    handlers.
    [8]
2.8 Export CopySendData() and CopySendEndOfRow() for COPY TO
    handlers.
    [8]
2.9 Make "format" in PgMsg_CopyOutResponse message
    extendable.
    [9]
2.10 Make makeNode() call avoidable in function(internal)
     that returns COPY TO/FROM handler.
     [9]
2.11 Custom COPY TO/FROM handlers must be able to parse
     their options.
     [11]
Discussing:
3.1 Should we use one function(internal) for COPY TO/FROM
    handlers or two function(internal)s (one is for COPY TO
    handler and another is for COPY FROM handler)?
    [4]
3.2 If we use separated function(internal) for COPY TO/FROM
    handlers, we need to define naming rule. For example,
    <method_name>_to(internal) for COPY TO handler and
    <method_name>_from(internal) for COPY FROM handler.
    [4]
3.3 Should we use prefix or suffix for function(internal)
    name to avoid name conflict with other handlers such as
    tablesample handlers?
    [7]
3.4 Should we export Copy{To,From}State? Or should we just
    provide getters/setters to access Copy{To,From}State
    internal?
    [10]
[1] https://www.postgresql.org/message-id/flat/20231204.153548.2126325458835528809.kou%40clear-code.com
[2] https://www.postgresql.org/message-id/flat/ZXEUIy6wl4jHy6Nm%40paquier.xyz
[3] https://www.postgresql.org/message-id/flat/CAD21AoAhcZkAp_WDJ4sSv_%2Bg2iCGjfyMFgeu7MxjnjX_FutZAg%40mail.gmail.com
[4] https://www.postgresql.org/message-id/flat/CAD21AoDkoGL6yJ_HjNOg9cU%3DaAdW8uQ3rSQOeRS0SX85LPPNwQ%40mail.gmail.com
[5] https://www.postgresql.org/message-id/flat/TY3PR01MB9889C9234CD220A3A7075F0DF589A%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[6] https://www.postgresql.org/message-id/flat/ZXbiPNriHHyUrcTF%40paquier.xyz
[7] https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com
[8] https://www.postgresql.org/message-id/flat/20231221.183504.1240642084042888377.kou%40clear-code.com
[9] https://www.postgresql.org/message-id/flat/ZYTfqGppMc9e_w2k%40paquier.xyz
[10] https://www.postgresql.org/message-id/flat/CAD21AoD%3DUapH4Wh06G6H5XAzPJ0iJg9YcW8r7E2UEJkZ8QsosA%40mail.gmail.com
[11] https://www.postgresql.org/message-id/flat/20240110.152023.1920937326588672387.kou%40clear-code.com
Thanks,
-- 
kou
			
		Hi,
On Wed, Jan 10, 2024 at 2:20 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAEG8a3+jG_NKOUmcxDyEX2xSggBXReZ4H=e3RFsUtedY88A03w@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Dec 2023 10:58:05 +0800,
>   Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> >> 1. Add an opaque space for custom COPY TO handler
> >>    * Add CopyToState{Get,Set}Opaque()
> >>    https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
> >>
> >> 2. Export CopyToState::attnumlist
> >>    * Add CopyToStateGetAttNumList()
> >>    https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
> >>
> >> 3. Export CopySend*()
> >>    * Rename CopySend*() to CopyToStateSend*() and export them
> >>    * Exception: CopySendEndOfRow() to CopyToStateFlush() because
> >>      it just flushes the internal buffer now.
> >>    https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e
> >>
> > I guess the purpose of these helpers is to avoid exposing CopyToState in
> > copy.h,
>
> Yes.
>
> >         but I
> > think exposing CopyToState to users might make life easier; users might want to use
> > the memory contexts of the structure (though I agree not all the
> > fields are necessary
> > for extension handlers).
>
> OK. I don't object to it, as I said in another e-mail:
>
> https://www.postgresql.org/message-id/flat/20240110.120644.1876591646729327180.kou%40clear-code.com#d923173e9625c20319750155083cbd72
>
> >> 2. Need an opaque space like IndexScanDesc::opaque does
> >>
> >>    * A custom COPY TO handler needs to keep its data
> >
> > I once thought users might want to parse their own options, maybe this
> > is a use case for this opaque space.
>
> Good catch! I forgot to suggest a callback for custom format
> options. How about the following API?
>
> ----
> ...
> typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
>
> ...
> typedef bool (*CopyFromProcessOption_function) (CopyFromState cstate, DefElem *defel);
>
> typedef struct CopyToFormatRoutine
> {
>         ...
>         CopyToProcessOption_function process_option_fn;
> } CopyToFormatRoutine;
>
> typedef struct CopyFromFormatRoutine
> {
>         ...
>         CopyFromProcessOption_function process_option_fn;
> } CopyFromFormatRoutine;
> ----
>
> ----
> diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
> index e7597894bf..1aa8b62551 100644
> --- a/src/backend/commands/copy.c
> +++ b/src/backend/commands/copy.c
> @@ -416,6 +416,7 @@ void
>  ProcessCopyOptions(ParseState *pstate,
>                                    CopyFormatOptions *opts_out,
>                                    bool is_from,
> +                                  void *cstate, /* CopyToState* for !is_from, CopyFromState* for is_from */
>                                    List *options)
>  {
>         bool            format_specified = false;
> @@ -593,11 +594,19 @@ ProcessCopyOptions(ParseState *pstate,
>                                                  parser_errposition(pstate, defel->location)));
>                 }
>                 else
> -                       ereport(ERROR,
> -                                       (errcode(ERRCODE_SYNTAX_ERROR),
> -                                        errmsg("option \"%s\" not recognized",
> -                                                       defel->defname),
> -                                        parser_errposition(pstate, defel->location)));
> +               {
> +                       bool processed;
> +                       if (is_from)
> +                               processed = opts_out->from_ops->process_option_fn(cstate, defel);
> +                       else
> +                               processed = opts_out->to_ops->process_option_fn(cstate, defel);
> +                       if (!processed)
> +                               ereport(ERROR,
> +                                               (errcode(ERRCODE_SYNTAX_ERROR),
> +                                                errmsg("option \"%s\" not recognized",
> +                                                               defel->defname),
> +                                                parser_errposition(pstate, defel->location)));
> +               }
>         }
>
>         /*
> ----
Looks good.
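
The fallback proposed in the quoted diff (ask the format handler about any option the core does not recognize, and raise the error only if the handler also declines) can be sketched in a standalone form. This is a minimal sketch, not the patch's code: DemoDefElem, DemoFormatState, the "batch_size" option and every demo_* function are hypothetical stand-ins, and the real callback takes a CopyToState and a DefElem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PostgreSQL's DefElem: option name plus value. */
typedef struct DemoDefElem
{
	const char *defname;
	const char *value;
} DemoDefElem;

/* Callback shape from the proposal: return true if the option was consumed. */
typedef bool (*DemoProcessOption_function) (void *cstate,
											const DemoDefElem *defel);

/* Hypothetical state kept by a custom format handler. */
typedef struct DemoFormatState
{
	int			batch_size;
} DemoFormatState;

/* A custom handler that understands only the "batch_size" option. */
static bool
demo_process_option(void *cstate, const DemoDefElem *defel)
{
	DemoFormatState *state = cstate;

	assert(state != NULL && defel != NULL);
	if (strcmp(defel->defname, "batch_size") == 0)
	{
		state->batch_size = atoi(defel->value);
		return true;
	}
	return false;
}

/*
 * Mirrors the new else-branch: core options first, then the handler,
 * and an error only when nobody recognized the option.
 */
static bool
demo_process_copy_option(DemoProcessOption_function fn, void *cstate,
						 const DemoDefElem *defel)
{
	if (strcmp(defel->defname, "format") == 0)
		return true;			/* handled by the core */
	if (fn != NULL && fn(cstate, defel))
		return true;			/* handled by the custom format */
	fprintf(stderr, "option \"%s\" not recognized\n", defel->defname);
	return false;
}
```

The boolean return value keeps the error reporting in one place: a handler only says whether it consumed the option, and the core decides what to do with leftovers.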
>
> > For the name, I thought private_data might be a better candidate than
> > opaque, but I do not insist.
>
> I don't have a strong opinion for this. Here are the number
> of headers that use "private_data" and "opaque":
>
> $ grep -r private_data --files-with-matches src/include | wc -l
> 6
> $ grep -r opaque --files-with-matches src/include | wc -l
> 38
>
> It seems that we use "opaque" more often than "private_data"
> in our code.
>
> > Do you use the arrow library to control the memory?
>
> Yes.
>
> >                                                     Is there a way that
> > we can let the arrow use postgres' memory context?
>
> Yes. Apache Arrow C++ provides a memory pool feature and we
> can implement PostgreSQL's memory context based memory pool
> for this. (But this is a custom COPY TO/FROM handler's
> implementation details.)
>
> >                                                    I'm not sure this
> > is necessary, just raise the question for discussion.
>
> Could you clarify what we should discuss? Should we require
> that COPY TO/FROM handlers use PostgreSQL's memory
> context for all internal memory allocations?
Yes, handlers should use PostgreSQL's memory context, and I think
creating other memory contexts under CopyToStateData.copycontext
should be suggested to handler authors, so I proposed exporting
CopyToStateData in a public header.
>
> > +PG_FUNCTION_INFO_V1(copy_testfmt_handler);
> > +Datum
> > +copy_testfmt_handler(PG_FUNCTION_ARGS)
> > +{
> > + bool is_from = PG_GETARG_BOOL(0);
> > + CopyFormatRoutine *cp = makeNode(CopyFormatRoutine);;
> > +
> >
> > extra semicolon.
>
> I noticed it too :-)
> But I ignored it because the current implementation is only
> for discussion. We know that it may be dirty.
>
>
> Thanks,
> --
> kou
--
Regards
Junwang Zhao
			
Hi,

In <CAEG8a3J02NzGBxG1rP9C4u7qRLOqUjSOdy3q5_5v__fydS3XcA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 12 Jan 2024 14:40:41 +0800,
  Junwang Zhao <zhjwpku@gmail.com> wrote:

>> Could you clarify what we should discuss? Should we require
>> that COPY TO/FROM handlers use PostgreSQL's memory
>> context for all internal memory allocations?
>
> Yes, handlers should use PostgreSQL's memory context, and I think
> creating other memory contexts under CopyToStateData.copycontext
> should be suggested to handler authors, so I proposed exporting
> CopyToStateData in a public header.

I see.

We can provide a getter for CopyToStateData::copycontext if
we don't want to export CopyToStateData. Note that I don't
have a strong opinion on whether we should export
CopyToStateData or not.

Thanks,
-- 
kou
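
The getter-based alternative mentioned here (keep the state struct private and expose only accessors) can be sketched as follows. This is only an illustration under assumptions: DemoCopyToStateData and the DemoCopyToState* functions are hypothetical names, and a plain string stands in for a real MemoryContext.

```c
#include <assert.h>
#include <stddef.h>

/* What a public header would show: an incomplete type plus accessors. */
typedef struct DemoCopyToStateData *DemoCopyToState;

void	   *DemoCopyToStateGetOpaque(DemoCopyToState cstate);
void		DemoCopyToStateSetOpaque(DemoCopyToState cstate, void *opaque);
const char *DemoCopyToStateGetContextName(DemoCopyToState cstate);

/* What stays private in the implementation file. */
struct DemoCopyToStateData
{
	const char *context_name;	/* stand-in for a real MemoryContext */
	void	   *opaque;			/* handler-private data */
};

void *
DemoCopyToStateGetOpaque(DemoCopyToState cstate)
{
	assert(cstate != NULL);
	return cstate->opaque;
}

void
DemoCopyToStateSetOpaque(DemoCopyToState cstate, void *opaque)
{
	assert(cstate != NULL);
	cstate->opaque = opaque;
}

const char *
DemoCopyToStateGetContextName(DemoCopyToState cstate)
{
	assert(cstate != NULL);
	return cstate->context_name;
}
```

The trade-off discussed in the thread is visible here: accessors keep the struct layout free to change, while exporting the struct saves one function per field that handlers need.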
Hi,
If there are no more comments for the current design, I'll
start implementing this feature with the following
approaches for "Discussing" items:
> 3.1 Should we use one function(internal) for COPY TO/FROM
>     handlers or two function(internal)s (one is for COPY TO
>     handler and another is for COPY FROM handler)?
>     [4]
I'll choose "one function(internal) for COPY TO/FROM handlers".
> 3.4 Should we export Copy{To,From}State? Or should we just
>     provide getters/setters to access Copy{To,From}State
>     internal?
>     [10]
I'll export Copy{To,From}State.
Thanks,
-- 
kou
			
On Thu, Jan 11, 2024 at 10:24 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoC4HVuxOrsX1fLwj=5hdEmjvZoQw6PJGzxqxHNnYSQUVQ@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 10 Jan 2024 16:53:48 +0900,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >> Interesting. But I feel that it introduces another (a bit)
> >> tricky mechanism...
> >
> > Right. On the other hand, I don't think the idea 3 is good for the
> > same reason Michael-san pointed out before[1][2].
> >
> > [1] https://www.postgresql.org/message-id/ZXEUIy6wl4jHy6Nm%40paquier.xyz
> > [2] https://www.postgresql.org/message-id/ZXKm9tmnSPIVrqZz%40paquier.xyz
>
> I think that the important part of the Michael-san's opinion
> is "keep COPY TO implementation and COPY FROM implementation
> separated for maintainability".
>
> The patch focused in [1][2] uses one routine for both of
> COPY TO and COPY FROM. If we use the approach, we need to
> change one common routine from copyto.c and copyfrom.c (or
> export callbacks from copyto.c and copyfrom.c and use them
> in copyto.c to construct one common routine). It's
> the problem.
>
> The idea 3 still has separated routines for COPY TO and COPY
> FROM. So I think that it still keeps COPY TO implementation
> and COPY FROM implementation separated.
>
> >> BTW, we also need to set .type:
> >>
> >>      .routine = COPYTO_ROUTINE (
> >>          .type = T_CopyToFormatRoutine,
> >>          .start_fn = testfmt_copyto_start,
> >>          .onerow_fn = testfmt_copyto_onerow,
> >>          .end_fn = testfmt_copyto_end
> >>      )
> >
> > I think it's fine as the same is true for table AM.
>
> Ah, sorry. I should have said explicitly. I don't think that
> it's a problem. I just wanted to say that it's missing.

Thank you for pointing it out.

>
>
> Defining one more static const struct instead of providing a
> convenient (but a bit tricky) macro may be straightforward:
>
> static const CopyToFormatRoutine testfmt_copyto_routine = {
>     .type = T_CopyToFormatRoutine,
>     .start_fn = testfmt_copyto_start,
>     .onerow_fn = testfmt_copyto_onerow,
>     .end_fn = testfmt_copyto_end
> };
>
> static const CopyFormatRoutine testfmt_copyto_handler = {
>     .type = T_CopyFormatRoutine,
>     .is_from = false,
>     .routine = (Node *) &testfmt_copyto_routine
> };

Yeah, IIUC this is the option 2 you mentioned[1]. I think we can go
with this idea as it's the simplest. If we find a better way, we can
change it later. That is CopyFormatRoutine will be like:

typedef struct CopyFormatRoutine
{
    NodeTag     type;

    /* either CopyToFormatRoutine or CopyFromFormatRoutine */
    Node       *routine;
}           CopyFormatRoutine;

And the core can check the node type of the 'routine' in the
CopyFormatRoutine returned by extensions.

Regards,

[1] https://www.postgresql.org/message-id/20240110.120034.501385498034538233.kou%40clear-code.com

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi,
In <CAD21AoB5x86TTyer90iSFivnSD8MFRU8V4ALzmQ=rQFw4QqiXQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 15 Jan 2024 16:03:41 +0900,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Defining one more static const struct instead of providing a
>> convenient (but a bit tricky) macro may be straightforward:
>>
>> static const CopyToFormatRoutine testfmt_copyto_routine = {
>>     .type = T_CopyToFormatRoutine,
>>     .start_fn = testfmt_copyto_start,
>>     .onerow_fn = testfmt_copyto_onerow,
>>     .end_fn = testfmt_copyto_end
>> };
>>
>> static const CopyFormatRoutine testfmt_copyto_handler = {
>>     .type = T_CopyFormatRoutine,
>>     .is_from = false,
>>     .routine = (Node *) &testfmt_copyto_routine
>> };
> 
> Yeah, IIUC this is the option 2 you mentioned[1]. I think we can go
> with this idea as it's the simplest.
>
> [1] https://www.postgresql.org/message-id/20240110.120034.501385498034538233.kou%40clear-code.com
Ah, you're right. I forgot it...
>                  That is CopyFormatRoutine will be like:
> 
> typedef struct CopyFormatRoutine
> {
>     NodeTag     type;
> 
>     /* either CopyToFormatRoutine or CopyFromFormatRoutine */
>     Node       *routine;
> }           CopyFormatRoutine;
> 
> And the core can check the node type of the 'routine' in the
> CopyFormatRoutine returned by extensions.
It makes sense.
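
The design agreed on here (static const routine structs that carry a tag, a wrapper that points at one of them, and a core-side check of the tag) can be sketched in standalone C. This is a sketch under assumptions, not PostgreSQL code: the DemoNodeTag enum, the Demo* structs and the demo_* functions are hypothetical stand-ins for NodeTag, CopyToFormatRoutine, CopyFromFormatRoutine and CopyFormatRoutine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for PostgreSQL's NodeTag values. */
typedef enum DemoNodeTag
{
	T_DemoInvalid = 0,
	T_DemoCopyFormatRoutine,
	T_DemoCopyToRoutine,
	T_DemoCopyFromRoutine
} DemoNodeTag;

/* Every struct starts with its tag, as PostgreSQL nodes do. */
typedef struct DemoCopyToRoutine
{
	DemoNodeTag type;
	const char *name;
} DemoCopyToRoutine;

typedef struct DemoCopyFromRoutine
{
	DemoNodeTag type;
	const char *name;
} DemoCopyFromRoutine;

typedef struct DemoCopyFormatRoutine
{
	DemoNodeTag type;
	const void *routine;	/* DemoCopyToRoutine or DemoCopyFromRoutine */
} DemoCopyFormatRoutine;

/* Static const instances: the handler needs no allocation at call time. */
static const DemoCopyToRoutine demo_to_routine = {
	.type = T_DemoCopyToRoutine,
	.name = "testfmt (to)"
};
static const DemoCopyFromRoutine demo_from_routine = {
	.type = T_DemoCopyFromRoutine,
	.name = "testfmt (from)"
};
static const DemoCopyFormatRoutine demo_to_handler = {
	.type = T_DemoCopyFormatRoutine,
	.routine = &demo_to_routine
};
static const DemoCopyFormatRoutine demo_from_handler = {
	.type = T_DemoCopyFormatRoutine,
	.routine = &demo_from_routine
};

/* One handler function serving both directions. */
static const DemoCopyFormatRoutine *
demo_handler(bool is_from)
{
	return is_from ? &demo_from_handler : &demo_to_handler;
}

/* Core-side check: accept the routine only if both tags match. */
static const DemoCopyToRoutine *
demo_get_copy_to_routine(const DemoCopyFormatRoutine *handler)
{
	assert(handler != NULL);
	if (handler->type != T_DemoCopyFormatRoutine || handler->routine == NULL)
		return NULL;
	/* A struct pointer may be read through its first member's type. */
	if (*(const DemoNodeTag *) handler->routine != T_DemoCopyToRoutine)
		return NULL;
	return handler->routine;
}
```

Because the tag is always the first member, the core can reject a COPY FROM routine handed to COPY TO without knowing anything else about the struct.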
If there are no more comments about the current design, I'll
start implementing this feature based on it.
Thanks,
-- 
kou
			
Hi,

I've implemented the custom COPY format feature based on the
current design discussion. See the attached patches for
details.

I also implemented a PoC COPY format handler for Apache
Arrow with this implementation and it worked.
https://github.com/kou/pg-copy-arrow

The patches implement not only the custom COPY TO format
feature but also the custom COPY FROM format feature.

0001-0004 are for COPY TO and 0005-0008 are for COPY FROM.

For COPY TO:

0001: This adds CopyToRoutine and uses it for text/csv/binary
formats. No implementation change. This just moves code.

0002: This adds support for adding a custom COPY TO format by
"CREATE FUNCTION ${FORMAT_NAME}". This uses the same
approach provided by Sawada-san[1] but this doesn't
introduce a wrapper CopyRoutine struct for
Copy{To,From}Routine, because I noticed that a wrapper
CopyRoutine struct is needless. A copy handler can just return
CopyToRoutine or CopyFromRoutine because both of them have
NodeTag. We can distinguish a returned struct by the NodeTag.

[1] https://www.postgresql.org/message-id/CAD21AoCunywHird3GaPzWe6s9JG1wzxj3Cr6vGN36DDheGjOjA@mail.gmail.com

0003: This exports CopyToStateData. No implementation change
except CopyDest enum values. I changed the COPY_ prefix to
COPY_DEST_ to avoid name conflicts with CopySource enum
values. This just moves code.

0004: This adds CopyToState::opaque and exports
CopySendEndOfRow(). CopySendEndOfRow() is renamed to
CopyToStateFlush().

For COPY FROM:

0005: Same as 0001 but for COPY FROM. This adds
CopyFromRoutine and uses it for text/csv/binary formats. No
implementation change. This just moves code.

0006: Same as 0002 but for COPY FROM. This adds support for
adding a custom COPY FROM format by "CREATE FUNCTION
${FORMAT_NAME}".

0007: Same as 0003 but for COPY FROM. This exports
CopyFromStateData. No implementation change except
CopySource enum values. I changed the COPY_ prefix to
COPY_SOURCE_ to align with the CopyDest changes in 0003. This
just moves code.

0008: Same as 0004 but for COPY FROM.
This adds CopyFromState::opaque and exports
CopyReadBinaryData(). CopyReadBinaryData() is renamed to
CopyFromStateRead().

Thanks,
-- 
kou

From 3444b523aa356417f4cb3ec0c78894de65684889 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 4 Dec 2023 12:32:54 +0900
Subject: [PATCH v6 1/8] Extract COPY TO format implementations

This is a part of making COPY format extendable. See also these past
discussions:
* New Copy Formats - avro/orc/parquet:
  https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
* Make COPY extendable in order to support Parquet and other formats:
  https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com

This doesn't change the current behavior.
This just introduces CopyToRoutine, which just has function pointers
of format implementation like TupleTableSlotOps, and use it for
existing "text", "csv" and "binary" format implementations.

Note that CopyToRoutine can't be used from extensions yet because
CopySend*() aren't exported yet. Extensions can't send formatted data
to a destination without CopySend*(). They will be exported by
subsequent patches.

Here is a benchmark result with/without this change because there was
a discussion that we should care about performance regression:

https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us

> I think that step 1 ought to be to convert the existing formats into
> plug-ins, and demonstrate that there's no significant loss of
> performance.

You can see that there is no significant loss of performance:

Data: Random 32 bit integers:

    CREATE TABLE data (int32 integer);
    INSERT INTO data
      SELECT random() * 10000
        FROM generate_series(1, ${n_records});

The number of records: 100K, 1M and 10M

100K without this change:

    format,elapsed time (ms)
    text,11.002
    csv,11.696
    binary,11.352

100K with this change:

    format,elapsed time (ms)
    text,100000,11.562
    csv,100000,11.889
    binary,100000,10.825

1M without this change:

    format,elapsed time (ms)
    text,108.359
    csv,114.233
    binary,111.251

1M with this change:

    format,elapsed time (ms)
    text,111.269
    csv,116.277
    binary,104.765

10M without this change:

    format,elapsed time (ms)
    text,1090.763
    csv,1136.103
    binary,1137.141

10M with this change:

    format,elapsed time (ms)
    text,1082.654
    csv,1196.991
    binary,1069.697
---
 contrib/file_fdw/file_fdw.c     |   2 +-
 src/backend/commands/copy.c     |  43 +++-
 src/backend/commands/copyfrom.c |   2 +-
 src/backend/commands/copyto.c   | 428 ++++++++++++++++++++------------
 src/include/commands/copy.h     |   7 +-
 src/include/commands/copyapi.h  |  59 +++++
 6 files changed, 376 insertions(+), 165 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 249d82d3a0..9e4e819858 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -329,7 +329,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 	/*
 	 * Now apply the core COPY code's validation logic for more checks.
 	 */
-	ProcessCopyOptions(NULL, NULL, true, other_options);
+	ProcessCopyOptions(NULL, NULL, true, NULL, other_options);

 	/*
 	 * Either filename or program option is required for file_fdw foreign
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6f4..5f3697a5f9 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -442,6 +442,9 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
  * a list of options. In that usage, 'opts_out' can be passed as NULL and
  * the collected data is just leaked until CurrentMemoryContext is reset.
  *
+ * 'cstate' is CopyToState* for !is_from, CopyFromState* for is_from. 'cstate'
+ * may be NULL. For example, file_fdw uses NULL.
+ *
  * Note that additional checking, such as whether column names listed in FORCE
  * QUOTE actually exist, has to be applied later. This just checks for
  * self-consistency of the options list.
@@ -450,6 +453,7 @@ void
 ProcessCopyOptions(ParseState *pstate,
 				   CopyFormatOptions *opts_out,
 				   bool is_from,
+				   void *cstate,
 				   List *options)
 {
 	bool		format_specified = false;
@@ -464,7 +468,13 @@ ProcessCopyOptions(ParseState *pstate,

 	opts_out->file_encoding = -1;

-	/* Extract options from the statement node tree */
+	/* Text is the default format. */
+	opts_out->to_routine = &CopyToRoutineText;
+
+	/*
+	 * Extract only the "format" option to detect target routine as the first
+	 * step
+	 */
 	foreach(option, options)
 	{
 		DefElem    *defel = lfirst_node(DefElem, option);
@@ -479,15 +489,29 @@ ProcessCopyOptions(ParseState *pstate,
 			if (strcmp(fmt, "text") == 0)
 				 /* default format */ ;
 			else if (strcmp(fmt, "csv") == 0)
+			{
 				opts_out->csv_mode = true;
+				opts_out->to_routine = &CopyToRoutineCSV;
+			}
 			else if (strcmp(fmt, "binary") == 0)
+			{
 				opts_out->binary = true;
+				opts_out->to_routine = &CopyToRoutineBinary;
+			}
 			else
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("COPY format \"%s\" not recognized", fmt),
 						 parser_errposition(pstate, defel->location)));
 		}
+	}
+	/* Extract options except "format" from the statement node tree */
+	foreach(option, options)
+	{
+		DefElem    *defel = lfirst_node(DefElem, option);
+
+		if (strcmp(defel->defname, "format") == 0)
+			continue;
 		else if (strcmp(defel->defname, "freeze") == 0)
 		{
 			if (freeze_specified)
@@ -616,11 +640,18 @@ ProcessCopyOptions(ParseState *pstate,
 			opts_out->on_error = defGetCopyOnErrorChoice(defel, pstate, is_from);
 		}
 		else
-			ereport(ERROR,
-					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("option \"%s\" not recognized",
-							defel->defname),
-					 parser_errposition(pstate, defel->location)));
+		{
+			bool		processed = false;
+
+			if (!is_from)
+				processed = opts_out->to_routine->CopyToProcessOption(cstate, defel);
+			if (!processed)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("option \"%s\" not recognized",
+								defel->defname),
+						 parser_errposition(pstate, defel->location)));
+		}
 	}

 	/*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 173a736ad5..05b3d13236 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1411,7 +1411,7 @@ BeginCopyFrom(ParseState *pstate,
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);

 	/* Extract options from the statement node tree */
-	ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
+	ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , cstate, options);

 	/* Process the target relation */
 	cstate->rel = rel;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d3dc3fc854..6547b7c654 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -131,6 +131,275 @@ static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);

+/*
+ * CopyToRoutine implementations.
+ */
+
+/*
+ * CopyToRoutine implementation for "text" and "csv". CopyToText*()
+ * refer cstate->opts.csv_mode and change their behavior. We can split this
+ * implementation and stop referring cstate->opts.csv_mode later.
+ */
+
+/* All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
+ * move the code to here later. */
+static bool
+CopyToTextProcessOption(CopyToState cstate, DefElem *defel)
+{
+	return false;
+}
+
+static int16
+CopyToTextGetFormat(CopyToState cstate)
+{
+	return 0;
+}
+
+static void
+CopyToTextSendEndOfRow(CopyToState cstate)
+{
+	switch (cstate->copy_dest)
+	{
+		case COPY_FILE:
+			/* Default line termination depends on platform */
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			break;
+		case COPY_FRONTEND:
+			/* The FE/BE protocol uses \n as newline for all platforms */
+			CopySendChar(cstate, '\n');
+			break;
+		default:
+			break;
+	}
+	CopySendEndOfRow(cstate);
+}
+
+static void
+CopyToTextStart(CopyToState cstate, TupleDesc tupDesc)
+{
+	int			num_phys_attrs;
+	ListCell   *cur;
+
+	num_phys_attrs = tupDesc->natts;
+	/* Get info about the columns we need to process. */
+	cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+	foreach(cur, cstate->attnumlist)
+	{
+		int			attnum = lfirst_int(cur);
+		Oid			out_func_oid;
+		bool		isvarlena;
+		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+		getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+		fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+	}
+
+	/*
+	 * For non-binary copy, we need to convert null_print to file encoding,
+	 * because it will be sent directly with CopySendString.
+	 */
+	if (cstate->need_transcoding)
+		cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+														  cstate->opts.null_print_len,
+														  cstate->file_encoding);
+
+	/* if a header has been requested send the line */
+	if (cstate->opts.header_line)
+	{
+		bool		hdr_delim = false;
+
+		foreach(cur, cstate->attnumlist)
+		{
+			int			attnum = lfirst_int(cur);
+			char	   *colname;
+
+			if (hdr_delim)
+				CopySendChar(cstate, cstate->opts.delim[0]);
+			hdr_delim = true;
+
+			colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+			if (cstate->opts.csv_mode)
+				CopyAttributeOutCSV(cstate, colname, false,
+									list_length(cstate->attnumlist) == 1);
+			else
+				CopyAttributeOutText(cstate, colname);
+		}
+
+		CopyToTextSendEndOfRow(cstate);
+	}
+}
+
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	bool		need_delim = false;
+	FmgrInfo   *out_functions = cstate->out_functions;
+	ListCell   *cur;
+
+	foreach(cur, cstate->attnumlist)
+	{
+		int			attnum = lfirst_int(cur);
+		Datum		value = slot->tts_values[attnum - 1];
+		bool		isnull = slot->tts_isnull[attnum - 1];
+
+		if (need_delim)
+			CopySendChar(cstate, cstate->opts.delim[0]);
+		need_delim = true;
+
+		if (isnull)
+		{
+			CopySendString(cstate, cstate->opts.null_print_client);
+		}
+		else
+		{
+			char	   *string;
+
+			string = OutputFunctionCall(&out_functions[attnum - 1], value);
+			if (cstate->opts.csv_mode)
+				CopyAttributeOutCSV(cstate, string,
+									cstate->opts.force_quote_flags[attnum - 1],
+									list_length(cstate->attnumlist) == 1);
+			else
+				CopyAttributeOutText(cstate, string);
+		}
+	}
+
+	CopyToTextSendEndOfRow(cstate);
+}
+
+static void
+CopyToTextEnd(CopyToState cstate)
+{
+}
+
+/*
+ * CopyToRoutine implementation for "binary".
+ */
+
+/* All "binary" options are parsed in ProcessCopyOptions(). We may move the
+ * code to here later. */
+static bool
+CopyToBinaryProcessOption(CopyToState cstate, DefElem *defel)
+{
+	return false;
+}
+
+static int16
+CopyToBinaryGetFormat(CopyToState cstate)
+{
+	return 1;
+}
+
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+	int			num_phys_attrs;
+	ListCell   *cur;
+
+	num_phys_attrs = tupDesc->natts;
+	/* Get info about the columns we need to process. */
+	cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+	foreach(cur, cstate->attnumlist)
+	{
+		int			attnum = lfirst_int(cur);
+		Oid			out_func_oid;
+		bool		isvarlena;
+		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+		getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+		fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+	}
+
+	{
+		/* Generate header for a binary copy */
+		int32		tmp;
+
+		/* Signature */
+		CopySendData(cstate, BinarySignature, 11);
+		/* Flags field */
+		tmp = 0;
+		CopySendInt32(cstate, tmp);
+		/* No header extension */
+		tmp = 0;
+		CopySendInt32(cstate, tmp);
+	}
+}
+
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	FmgrInfo   *out_functions = cstate->out_functions;
+	ListCell   *cur;
+
+	/* Binary per-tuple header */
+	CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+	foreach(cur, cstate->attnumlist)
+	{
+		int			attnum = lfirst_int(cur);
+		Datum		value = slot->tts_values[attnum - 1];
+		bool		isnull = slot->tts_isnull[attnum - 1];
+
+		if (isnull)
+		{
+			CopySendInt32(cstate, -1);
+		}
+		else
+		{
+			bytea	   *outputbytes;
+
+			outputbytes = SendFunctionCall(&out_functions[attnum - 1], value);
+			CopySendInt32(cstate, VARSIZE(outputbytes) -
VARHDRSZ); + CopySendData(cstate, VARDATA(outputbytes), + VARSIZE(outputbytes) - VARHDRSZ); + } + } + + CopySendEndOfRow(cstate); +} + +static void +CopyToBinaryEnd(CopyToState cstate) +{ + /* Generate trailer for a binary copy */ + CopySendInt16(cstate, -1); + /* Need to flush out the trailer */ + CopySendEndOfRow(cstate); +} + +CopyToRoutine CopyToRoutineText = { + .CopyToProcessOption = CopyToTextProcessOption, + .CopyToGetFormat = CopyToTextGetFormat, + .CopyToStart = CopyToTextStart, + .CopyToOneRow = CopyToTextOneRow, + .CopyToEnd = CopyToTextEnd, +}; + +/* + * We can use the same CopyToRoutine for both of "text" and "csv" because + * CopyToText*() refer cstate->opts.csv_mode and change their behavior. We can + * split the implementations and stop referring cstate->opts.csv_mode later. + */ +CopyToRoutine CopyToRoutineCSV = { + .CopyToProcessOption = CopyToTextProcessOption, + .CopyToGetFormat = CopyToTextGetFormat, + .CopyToStart = CopyToTextStart, + .CopyToOneRow = CopyToTextOneRow, + .CopyToEnd = CopyToTextEnd, +}; + +CopyToRoutine CopyToRoutineBinary = { + .CopyToProcessOption = CopyToBinaryProcessOption, + .CopyToGetFormat = CopyToBinaryGetFormat, + .CopyToStart = CopyToBinaryStart, + .CopyToOneRow = CopyToBinaryOneRow, + .CopyToEnd = CopyToBinaryEnd, +}; /* * Send copy start/stop messages for frontend copies. These have changed @@ -141,7 +410,7 @@ SendCopyBegin(CopyToState cstate) { StringInfoData buf; int natts = list_length(cstate->attnumlist); - int16 format = (cstate->opts.binary ? 
1 : 0); + int16 format = cstate->opts.to_routine->CopyToGetFormat(cstate); int i; pq_beginmessage(&buf, PqMsg_CopyOutResponse); @@ -198,16 +467,6 @@ CopySendEndOfRow(CopyToState cstate) switch (cstate->copy_dest) { case COPY_FILE: - if (!cstate->opts.binary) - { - /* Default line termination depends on platform */ -#ifndef WIN32 - CopySendChar(cstate, '\n'); -#else - CopySendString(cstate, "\r\n"); -#endif - } - if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, cstate->copy_file) != 1 || ferror(cstate->copy_file)) @@ -242,10 +501,6 @@ CopySendEndOfRow(CopyToState cstate) } break; case COPY_FRONTEND: - /* The FE/BE protocol uses \n as newline for all platforms */ - if (!cstate->opts.binary) - CopySendChar(cstate, '\n'); - /* Dump the accumulated row as one CopyData message */ (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len); break; @@ -431,7 +686,7 @@ BeginCopyTo(ParseState *pstate, oldcontext = MemoryContextSwitchTo(cstate->copycontext); /* Extract options from the statement node tree */ - ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options); + ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , cstate, options); /* Process the source/target relation or query */ if (rel) @@ -748,8 +1003,6 @@ DoCopyTo(CopyToState cstate) bool pipe = (cstate->filename == NULL && cstate->data_dest_cb == NULL); bool fe_copy = (pipe && whereToSendOutput == DestRemote); TupleDesc tupDesc; - int num_phys_attrs; - ListCell *cur; uint64 processed; if (fe_copy) @@ -759,32 +1012,11 @@ DoCopyTo(CopyToState cstate) tupDesc = RelationGetDescr(cstate->rel); else tupDesc = cstate->queryDesc->tupDesc; - num_phys_attrs = tupDesc->natts; cstate->opts.null_print_client = cstate->opts.null_print; /* default */ /* We use fe_msgbuf as a per-row buffer regardless of copy_dest */ cstate->fe_msgbuf = makeStringInfo(); - /* Get info about the columns we need to process. 
*/ - cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo)); - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - Oid out_func_oid; - bool isvarlena; - Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1); - - if (cstate->opts.binary) - getTypeBinaryOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - else - getTypeOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); - } - /* * Create a temporary memory context that we can reset once per row to * recover palloc'd memory. This avoids any problems with leaks inside @@ -795,57 +1027,7 @@ DoCopyTo(CopyToState cstate) "COPY TO", ALLOCSET_DEFAULT_SIZES); - if (cstate->opts.binary) - { - /* Generate header for a binary copy */ - int32 tmp; - - /* Signature */ - CopySendData(cstate, BinarySignature, 11); - /* Flags field */ - tmp = 0; - CopySendInt32(cstate, tmp); - /* No header extension */ - tmp = 0; - CopySendInt32(cstate, tmp); - } - else - { - /* - * For non-binary copy, we need to convert null_print to file - * encoding, because it will be sent directly with CopySendString. 
- */ - if (cstate->need_transcoding) - cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, - cstate->opts.null_print_len, - cstate->file_encoding); - - /* if a header has been requested send the line */ - if (cstate->opts.header_line) - { - bool hdr_delim = false; - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - char *colname; - - if (hdr_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - hdr_delim = true; - - colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); - - if (cstate->opts.csv_mode) - CopyAttributeOutCSV(cstate, colname, false, - list_length(cstate->attnumlist) == 1); - else - CopyAttributeOutText(cstate, colname); - } - - CopySendEndOfRow(cstate); - } - } + cstate->opts.to_routine->CopyToStart(cstate, tupDesc); if (cstate->rel) { @@ -884,13 +1066,7 @@ DoCopyTo(CopyToState cstate) processed = ((DR_copy *) cstate->queryDesc->dest)->processed; } - if (cstate->opts.binary) - { - /* Generate trailer for a binary copy */ - CopySendInt16(cstate, -1); - /* Need to flush out the trailer */ - CopySendEndOfRow(cstate); - } + cstate->opts.to_routine->CopyToEnd(cstate); MemoryContextDelete(cstate->rowcontext); @@ -906,71 +1082,15 @@ DoCopyTo(CopyToState cstate) static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) { - bool need_delim = false; - FmgrInfo *out_functions = cstate->out_functions; MemoryContext oldcontext; - ListCell *cur; - char *string; MemoryContextReset(cstate->rowcontext); oldcontext = MemoryContextSwitchTo(cstate->rowcontext); - if (cstate->opts.binary) - { - /* Binary per-tuple header */ - CopySendInt16(cstate, list_length(cstate->attnumlist)); - } - /* Make sure the tuple is fully deconstructed */ slot_getallattrs(slot); - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - Datum value = slot->tts_values[attnum - 1]; - bool isnull = slot->tts_isnull[attnum - 1]; - - if (!cstate->opts.binary) - { - if (need_delim) - CopySendChar(cstate, 
cstate->opts.delim[0]); - need_delim = true; - } - - if (isnull) - { - if (!cstate->opts.binary) - CopySendString(cstate, cstate->opts.null_print_client); - else - CopySendInt32(cstate, -1); - } - else - { - if (!cstate->opts.binary) - { - string = OutputFunctionCall(&out_functions[attnum - 1], - value); - if (cstate->opts.csv_mode) - CopyAttributeOutCSV(cstate, string, - cstate->opts.force_quote_flags[attnum - 1], - list_length(cstate->attnumlist) == 1); - else - CopyAttributeOutText(cstate, string); - } - else - { - bytea *outputbytes; - - outputbytes = SendFunctionCall(&out_functions[attnum - 1], - value); - CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); - CopySendData(cstate, VARDATA(outputbytes), - VARSIZE(outputbytes) - VARHDRSZ); - } - } - } - - CopySendEndOfRow(cstate); + cstate->opts.to_routine->CopyToOneRow(cstate, slot); MemoryContextSwitchTo(oldcontext); } diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index b3da3cb0be..34bea880ca 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -14,6 +14,7 @@ #ifndef COPY_H #define COPY_H +#include "commands/copyapi.h" #include "nodes/execnodes.h" #include "nodes/parsenodes.h" #include "parser/parse_node.h" @@ -74,11 +75,11 @@ typedef struct CopyFormatOptions bool convert_selectively; /* do selective binary conversion? 
*/ CopyOnErrorChoice on_error; /* what to do when error happened */ List *convert_select; /* list of column names (can be NIL) */ + CopyToRoutine *to_routine; /* callback routines for COPY TO */ } CopyFormatOptions; -/* These are private in commands/copy[from|to].c */ +/* This is private in commands/copyfrom.c */ typedef struct CopyFromStateData *CopyFromState; -typedef struct CopyToStateData *CopyToState; typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread); typedef void (*copy_data_dest_cb) (void *data, int len); @@ -87,7 +88,7 @@ extern void DoCopy(ParseState *pstate, const CopyStmt *stmt, int stmt_location, int stmt_len, uint64 *processed); -extern void ProcessCopyOptions(ParseState *pstate, CopyFormatOptions *opts_out, bool is_from, List *options); +extern void ProcessCopyOptions(ParseState *pstate, CopyFormatOptions *opts_out, bool is_from, void *cstate, List *options); extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *whereClause, const char *filename, bool is_program, copy_data_source_cb data_source_cb, List *attnamelist, List *options); diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h new file mode 100644 index 0000000000..eb68f2fb7b --- /dev/null +++ b/src/include/commands/copyapi.h @@ -0,0 +1,59 @@ +/*------------------------------------------------------------------------- + * + * copyapi.h + * API for COPY TO/FROM handlers + * + * + * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/commands/copyapi.h + * + *------------------------------------------------------------------------- + */ +#ifndef COPYAPI_H +#define COPYAPI_H + +#include "executor/tuptable.h" +#include "nodes/parsenodes.h" + +/* This is private in commands/copyto.c */ +typedef struct CopyToStateData *CopyToState; + +typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel); 
+typedef int16 (*CopyToGetFormat_function) (CopyToState cstate); +typedef void (*CopyToStart_function) (CopyToState cstate, TupleDesc tupDesc); +typedef void (*CopyToOneRow_function) (CopyToState cstate, TupleTableSlot *slot); +typedef void (*CopyToEnd_function) (CopyToState cstate); + +/* Routines for a COPY TO format implementation. */ +typedef struct CopyToRoutine +{ + /* + * Called for processing one COPY TO option. This will return false when + * the given option is invalid. + */ + CopyToProcessOption_function CopyToProcessOption; + + /* + * Called when COPY TO is started. This will return a format as int16 + * value. It's used for the CopyOutResponse message. + */ + CopyToGetFormat_function CopyToGetFormat; + + /* Called when COPY TO is started. This will send a header. */ + CopyToStart_function CopyToStart; + + /* Copy one row for COPY TO. */ + CopyToOneRow_function CopyToOneRow; + + /* Called when COPY TO is ended. This will send a trailer. */ + CopyToEnd_function CopyToEnd; +} CopyToRoutine; + +/* Built-in CopyToRoutine for "text", "csv" and "binary". */ +extern CopyToRoutine CopyToRoutineText; +extern CopyToRoutine CopyToRoutineCSV; +extern CopyToRoutine CopyToRoutineBinary; + +#endif /* COPYAPI_H */ -- 2.43.0 From bd5848739465618fd31839b8ed34ad1cb95f6359 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Tue, 23 Jan 2024 13:58:38 +0900 Subject: [PATCH v6 2/8] Add support for adding custom COPY TO format This uses the handler approach like tablesample. The approach creates an internal function that returns an internal struct. In this case, a COPY TO handler returns a CopyToRoutine. We will add support for custom COPY FROM format later. We'll use the same handler for COPY TO and COPY FROM. PostgreSQL calls a COPY TO/FROM handler with "is_from" argument. 
It's true for COPY FROM and false for COPY TO:

    copy_handler(false) returns CopyToRoutine
    copy_handler(true) returns CopyFromRoutine (doesn't exist yet)

We discussed introducing a wrapper struct for it:

    typedef struct CopyRoutine
    {
        NodeTag type;
        /* either CopyToRoutine or CopyFromRoutine */
        Node *routine;
    } CopyRoutine;

    copy_handler(false) returns CopyRoutine with CopyToRoutine
    copy_handler(true) returns CopyRoutine with CopyFromRoutine

See also:
https://www.postgresql.org/message-id/flat/CAD21AoCunywHird3GaPzWe6s9JG1wzxj3Cr6vGN36DDheGjOjA%40mail.gmail.com

But I noticed that we don't need the wrapper struct. We can just return
CopyToRoutine or CopyFromRoutine, because we can distinguish the returned
struct by checking its NodeTag. So I don't use the wrapper struct approach.
---
 src/backend/commands/copy.c | 84 ++++++++++++++-----
 src/backend/nodes/Makefile | 1 +
 src/backend/nodes/gen_node_support.pl | 2 +
 src/backend/utils/adt/pseudotypes.c | 1 +
 src/include/catalog/pg_proc.dat | 6 ++
 src/include/catalog/pg_type.dat | 6 ++
 src/include/commands/copyapi.h | 2 +
 src/include/nodes/meson.build | 1 +
 src/test/modules/Makefile | 1 +
 src/test/modules/meson.build | 1 +
 src/test/modules/test_copy_format/.gitignore | 4 +
 src/test/modules/test_copy_format/Makefile | 23 +++++
 .../expected/test_copy_format.out | 17 ++++
 src/test/modules/test_copy_format/meson.build | 33 ++++++++
 .../test_copy_format/sql/test_copy_format.sql | 8 ++
 .../test_copy_format--1.0.sql | 8 ++
 .../test_copy_format/test_copy_format.c | 77 +++++++++++++++++
 .../test_copy_format/test_copy_format.control | 4 +
 18 files changed, 260 insertions(+), 19 deletions(-)
 mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 
src/test/modules/test_copy_format/sql/test_copy_format.sql create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql create mode 100644 src/test/modules/test_copy_format/test_copy_format.c create mode 100644 src/test/modules/test_copy_format/test_copy_format.control diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 5f3697a5f9..6f0db0ae7c 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -32,6 +32,7 @@ #include "parser/parse_coerce.h" #include "parser/parse_collate.h" #include "parser/parse_expr.h" +#include "parser/parse_func.h" #include "parser/parse_relation.h" #include "rewrite/rewriteHandler.h" #include "utils/acl.h" @@ -430,6 +431,69 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from) return COPY_ON_ERROR_STOP; /* keep compiler quiet */ } +/* + * Process the "format" option. + * + * This function checks whether the option value is a built-in format such as + * "text" and "csv" or not. If the option value isn't a built-in format, this + * function finds a COPY format handler that returns a CopyToRoutine. If no + * COPY format handler is found, this function reports an error. 
+ */ +static void +ProcessCopyOptionCustomFormat(ParseState *pstate, + CopyFormatOptions *opts_out, + bool is_from, + DefElem *defel) +{ + char *format; + Oid funcargtypes[1]; + Oid handlerOid = InvalidOid; + Datum datum; + void *routine; + + format = defGetString(defel); + + /* built-in formats */ + if (strcmp(format, "text") == 0) + /* default format */ return; + else if (strcmp(format, "csv") == 0) + { + opts_out->csv_mode = true; + opts_out->to_routine = &CopyToRoutineCSV; + return; + } + else if (strcmp(format, "binary") == 0) + { + opts_out->binary = true; + opts_out->to_routine = &CopyToRoutineBinary; + return; + } + + /* custom format */ + if (!is_from) + { + funcargtypes[0] = INTERNALOID; + handlerOid = LookupFuncName(list_make1(makeString(format)), 1, + funcargtypes, true); + } + if (!OidIsValid(handlerOid)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY format \"%s\" not recognized", format), + parser_errposition(pstate, defel->location))); + + datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from)); + routine = DatumGetPointer(datum); + if (routine == NULL || !IsA(routine, CopyToRoutine)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function %s(%u) did not return a CopyToRoutine struct", + format, handlerOid), + parser_errposition(pstate, defel->location))); + + opts_out->to_routine = routine; +} + /* * Process the statement option list for COPY. 
* @@ -481,28 +545,10 @@ ProcessCopyOptions(ParseState *pstate, if (strcmp(defel->defname, "format") == 0) { - char *fmt = defGetString(defel); - if (format_specified) errorConflictingDefElem(defel, pstate); format_specified = true; - if (strcmp(fmt, "text") == 0) - /* default format */ ; - else if (strcmp(fmt, "csv") == 0) - { - opts_out->csv_mode = true; - opts_out->to_routine = &CopyToRoutineCSV; - } - else if (strcmp(fmt, "binary") == 0) - { - opts_out->binary = true; - opts_out->to_routine = &CopyToRoutineBinary; - } - else - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("COPY format \"%s\" not recognized", fmt), - parser_errposition(pstate, defel->location))); + ProcessCopyOptionCustomFormat(pstate, opts_out, is_from, defel); } } /* Extract options except "format" from the statement node tree */ diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile index 66bbad8e6e..173ee11811 100644 --- a/src/backend/nodes/Makefile +++ b/src/backend/nodes/Makefile @@ -49,6 +49,7 @@ node_headers = \ access/sdir.h \ access/tableam.h \ access/tsmapi.h \ + commands/copyapi.h \ commands/event_trigger.h \ commands/trigger.h \ executor/tuptable.h \ diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl old mode 100644 new mode 100755 index 2f0a59bc87..bd397f45ac --- a/src/backend/nodes/gen_node_support.pl +++ b/src/backend/nodes/gen_node_support.pl @@ -61,6 +61,7 @@ my @all_input_files = qw( access/sdir.h access/tableam.h access/tsmapi.h + commands/copyapi.h commands/event_trigger.h commands/trigger.h executor/tuptable.h @@ -85,6 +86,7 @@ my @nodetag_only_files = qw( access/sdir.h access/tableam.h access/tsmapi.h + commands/copyapi.h commands/event_trigger.h commands/trigger.h executor/tuptable.h diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c index a3a991f634..d308780c43 100644 --- a/src/backend/utils/adt/pseudotypes.c +++ b/src/backend/utils/adt/pseudotypes.c @@ -373,6 
+373,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler); +PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(internal); PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement); PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray); diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index ad74e07dbb..4772bdc0e4 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -7617,6 +7617,12 @@ { oid => '3312', descr => 'I/O', proname => 'tsm_handler_out', prorettype => 'cstring', proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' }, +{ oid => '8753', descr => 'I/O', + proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler', + proargtypes => 'cstring', prosrc => 'copy_handler_in' }, +{ oid => '8754', descr => 'I/O', + proname => 'copy_handler_out', prorettype => 'cstring', + proargtypes => 'copy_handler', prosrc => 'copy_handler_out' }, { oid => '267', descr => 'I/O', proname => 'table_am_handler_in', proisstrict => 'f', prorettype => 'table_am_handler', proargtypes => 'cstring', diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat index d29194da31..2040d5da83 100644 --- a/src/include/catalog/pg_type.dat +++ b/src/include/catalog/pg_type.dat @@ -632,6 +632,12 @@ typcategory => 'P', typinput => 'tsm_handler_in', typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-', typalign => 'i' }, +{ oid => '8752', + descr => 'pseudo-type for the result of a copy to/from method function', + typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p', + typcategory => 'P', typinput => 'copy_handler_in', + typoutput => 'copy_handler_out', typreceive => '-', typsend => '-', + typalign => 'i' }, { oid => '269', typname => 'table_am_handler', descr => 'pseudo-type for the result of a table AM handler function', diff --git 
a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index eb68f2fb7b..9c25e1c415 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -29,6 +29,8 @@ typedef void (*CopyToEnd_function) (CopyToState cstate); /* Routines for a COPY TO format implementation. */ typedef struct CopyToRoutine { + NodeTag type; + /* * Called for processing one COPY TO option. This will return false when * the given option is invalid. diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build index b665e55b65..103df1a787 100644 --- a/src/include/nodes/meson.build +++ b/src/include/nodes/meson.build @@ -11,6 +11,7 @@ node_support_input_i = [ 'access/sdir.h', 'access/tableam.h', 'access/tsmapi.h', + 'commands/copyapi.h', 'commands/event_trigger.h', 'commands/trigger.h', 'executor/tuptable.h', diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile index e32c8925f6..9d57b868d5 100644 --- a/src/test/modules/Makefile +++ b/src/test/modules/Makefile @@ -15,6 +15,7 @@ SUBDIRS = \ spgist_name_ops \ test_bloomfilter \ test_copy_callbacks \ + test_copy_format \ test_custom_rmgrs \ test_ddl_deparse \ test_dsa \ diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build index 397e0906e6..d76f2a6003 100644 --- a/src/test/modules/meson.build +++ b/src/test/modules/meson.build @@ -13,6 +13,7 @@ subdir('spgist_name_ops') subdir('ssl_passphrase_callback') subdir('test_bloomfilter') subdir('test_copy_callbacks') +subdir('test_copy_format') subdir('test_custom_rmgrs') subdir('test_ddl_deparse') subdir('test_dsa') diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore new file mode 100644 index 0000000000..5dcb3ff972 --- /dev/null +++ b/src/test/modules/test_copy_format/.gitignore @@ -0,0 +1,4 @@ +# Generated subdirectories +/log/ +/results/ +/tmp_check/ diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile new file mode 100644 
index 0000000000..8497f91624 --- /dev/null +++ b/src/test/modules/test_copy_format/Makefile @@ -0,0 +1,23 @@ +# src/test/modules/test_copy_format/Makefile + +MODULE_big = test_copy_format +OBJS = \ + $(WIN32RES) \ + test_copy_format.o +PGFILEDESC = "test_copy_format - test custom COPY FORMAT" + +EXTENSION = test_copy_format +DATA = test_copy_format--1.0.sql + +REGRESS = test_copy_format + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = src/test/modules/test_copy_format +top_builddir = ../../../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out new file mode 100644 index 0000000000..3a24ae7b97 --- /dev/null +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -0,0 +1,17 @@ +CREATE EXTENSION test_copy_format; +CREATE TABLE public.test (a INT, b INT, c INT); +INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +COPY public.test TO stdout WITH ( + option_before 'before', + format 'test_copy_format', + option_after 'after' +); +NOTICE: test_copy_format: is_from=false +NOTICE: CopyToProcessOption: "option_before"="before" +NOTICE: CopyToProcessOption: "option_after"="after" +NOTICE: CopyToGetFormat +NOTICE: CopyToStart: natts=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToEnd diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build new file mode 100644 index 0000000000..4cefe7b709 --- /dev/null +++ b/src/test/modules/test_copy_format/meson.build @@ -0,0 +1,33 @@ +# Copyright (c) 2024, PostgreSQL Global Development Group + +test_copy_format_sources = files( + 'test_copy_format.c', +) + +if host_system == 'windows' + test_copy_format_sources += 
rc_lib_gen.process(win32ver_rc, extra_args: [ + '--NAME', 'test_copy_format', + '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',]) +endif + +test_copy_format = shared_module('test_copy_format', + test_copy_format_sources, + kwargs: pg_test_mod_args, +) +test_install_libs += test_copy_format + +test_install_data += files( + 'test_copy_format.control', + 'test_copy_format--1.0.sql', +) + +tests += { + 'name': 'test_copy_format', + 'sd': meson.current_source_dir(), + 'bd': meson.current_build_dir(), + 'regress': { + 'sql': [ + 'test_copy_format', + ], + }, +} diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql new file mode 100644 index 0000000000..0eb7ed2e11 --- /dev/null +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -0,0 +1,8 @@ +CREATE EXTENSION test_copy_format; +CREATE TABLE public.test (a INT, b INT, c INT); +INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +COPY public.test TO stdout WITH ( + option_before 'before', + format 'test_copy_format', + option_after 'after' +); diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql new file mode 100644 index 0000000000..d24ea03ce9 --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql @@ -0,0 +1,8 @@ +/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */ + +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use "CREATE EXTENSION test_copy_format" to load this file. 
\quit + +CREATE FUNCTION test_copy_format(internal) + RETURNS copy_handler + AS 'MODULE_PATHNAME' LANGUAGE C; diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c new file mode 100644 index 0000000000..a2219afcde --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -0,0 +1,77 @@ +/*-------------------------------------------------------------------------- + * + * test_copy_format.c + * Code for testing custom COPY format. + * + * Portions Copyright (c) 2024, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/test/modules/test_copy_format/test_copy_format.c + * + * ------------------------------------------------------------------------- + */ + +#include "postgres.h" + +#include "commands/copy.h" +#include "commands/defrem.h" + +PG_MODULE_MAGIC; + +static bool +CopyToProcessOption(CopyToState cstate, DefElem *defel) +{ + ereport(NOTICE, + (errmsg("CopyToProcessOption: \"%s\"=\"%s\"", + defel->defname, defGetString(defel)))); + return true; +} + +static int16 +CopyToGetFormat(CopyToState cstate) +{ + ereport(NOTICE, (errmsg("CopyToGetFormat"))); + return 0; +} + +static void +CopyToStart(CopyToState cstate, TupleDesc tupDesc) +{ + ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts))); +} + +static void +CopyToOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", slot->tts_nvalid))); +} + +static void +CopyToEnd(CopyToState cstate) +{ + ereport(NOTICE, (errmsg("CopyToEnd"))); +} + +static const CopyToRoutine CopyToRoutineTestCopyFormat = { + .type = T_CopyToRoutine, + .CopyToProcessOption = CopyToProcessOption, + .CopyToGetFormat = CopyToGetFormat, + .CopyToStart = CopyToStart, + .CopyToOneRow = CopyToOneRow, + .CopyToEnd = CopyToEnd, +}; + +PG_FUNCTION_INFO_V1(test_copy_format); +Datum +test_copy_format(PG_FUNCTION_ARGS) +{ + bool is_from = PG_GETARG_BOOL(0); + + ereport(NOTICE, + 
(errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+    if (is_from)
+        elog(ERROR, "COPY FROM isn't supported yet");
+
+    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 0000000000..f05a636235
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.43.0

From 6480ebddfabed628c79a7c25f42f87b44d76f74f Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jan 2024 14:54:10 +0900
Subject: [PATCH v6 3/8] Export CopyToStateData

This is for custom COPY TO format handlers implemented as extensions.

This only moves code. Nothing changes except the CopyDest enum values.
CopyDest enum values such as COPY_FILE conflict with the CopySource enum
values defined in copyfrom_internal.h, so the COPY_DEST_ prefix is used
instead of the COPY_ prefix. For example, COPY_FILE is renamed to
COPY_DEST_FILE.

Note that this change isn't enough to implement a custom COPY TO format
handler as an extension. We'll do the following in a subsequent commit:

1. Add an opaque space for custom COPY TO format handler
2. 
Export CopySendEndOfRow() to flush buffer --- src/backend/commands/copyto.c | 74 +++----------------- src/include/commands/copy.h | 59 ---------------- src/include/commands/copyapi.h | 120 ++++++++++++++++++++++++++++++++- 3 files changed, 127 insertions(+), 126 deletions(-) diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 6547b7c654..cfc74ee7b1 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -43,64 +43,6 @@ #include "utils/rel.h" #include "utils/snapmgr.h" -/* - * Represents the different dest cases we need to worry about at - * the bottom level - */ -typedef enum CopyDest -{ - COPY_FILE, /* to file (or a piped program) */ - COPY_FRONTEND, /* to frontend */ - COPY_CALLBACK, /* to callback function */ -} CopyDest; - -/* - * This struct contains all the state variables used throughout a COPY TO - * operation. - * - * Multi-byte encodings: all supported client-side encodings encode multi-byte - * characters by having the first byte's high bit set. Subsequent bytes of the - * character can have the high bit not set. When scanning data in such an - * encoding to look for a match to a single-byte (ie ASCII) character, we must - * use the full pg_encoding_mblen() machinery to skip over multibyte - * characters, else we might find a false match to a trailing byte. In - * supported server encodings, there is no possibility of a false match, and - * it's faster to make useless comparisons to trailing bytes than it is to - * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true - * when we have to do it the hard way. - */ -typedef struct CopyToStateData -{ - /* low-level state data */ - CopyDest copy_dest; /* type of copy source/destination */ - FILE *copy_file; /* used if copy_dest == COPY_FILE */ - StringInfo fe_msgbuf; /* used for all dests during COPY TO */ - - int file_encoding; /* file or remote side's character encoding */ - bool need_transcoding; /* file encoding diff from server? 
*/ - bool encoding_embeds_ascii; /* ASCII can be non-first byte? */ - - /* parameters from the COPY command */ - Relation rel; /* relation to copy to */ - QueryDesc *queryDesc; /* executable query to copy from */ - List *attnumlist; /* integer list of attnums to copy */ - char *filename; /* filename, or NULL for STDOUT */ - bool is_program; /* is 'filename' a program to popen? */ - copy_data_dest_cb data_dest_cb; /* function for writing data */ - - CopyFormatOptions opts; - Node *whereClause; /* WHERE condition (or NULL) */ - - /* - * Working state - */ - MemoryContext copycontext; /* per-copy execution context */ - - FmgrInfo *out_functions; /* lookup info for output functions */ - MemoryContext rowcontext; /* per-row evaluation context */ - uint64 bytes_processed; /* number of bytes processed so far */ -} CopyToStateData; - /* DestReceiver for COPY (query) TO */ typedef struct { @@ -160,7 +102,7 @@ CopyToTextSendEndOfRow(CopyToState cstate) { switch (cstate->copy_dest) { - case COPY_FILE: + case COPY_DEST_FILE: /* Default line termination depends on platform */ #ifndef WIN32 CopySendChar(cstate, '\n'); @@ -168,7 +110,7 @@ CopyToTextSendEndOfRow(CopyToState cstate) CopySendString(cstate, "\r\n"); #endif break; - case COPY_FRONTEND: + case COPY_DEST_FRONTEND: /* The FE/BE protocol uses \n as newline for all platforms */ CopySendChar(cstate, '\n'); break; @@ -419,7 +361,7 @@ SendCopyBegin(CopyToState cstate) for (i = 0; i < natts; i++) pq_sendint16(&buf, format); /* per-column formats */ pq_endmessage(&buf); - cstate->copy_dest = COPY_FRONTEND; + cstate->copy_dest = COPY_DEST_FRONTEND; } static void @@ -466,7 +408,7 @@ CopySendEndOfRow(CopyToState cstate) switch (cstate->copy_dest) { - case COPY_FILE: + case COPY_DEST_FILE: if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, cstate->copy_file) != 1 || ferror(cstate->copy_file)) @@ -500,11 +442,11 @@ CopySendEndOfRow(CopyToState cstate) errmsg("could not write to COPY file: %m"))); } break; - case COPY_FRONTEND: + case 
COPY_DEST_FRONTEND: /* Dump the accumulated row as one CopyData message */ (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len); break; - case COPY_CALLBACK: + case COPY_DEST_CALLBACK: cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len); break; } @@ -877,12 +819,12 @@ BeginCopyTo(ParseState *pstate, /* See Multibyte encoding comment above */ cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding); - cstate->copy_dest = COPY_FILE; /* default */ + cstate->copy_dest = COPY_DEST_FILE; /* default */ if (data_dest_cb) { progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK; - cstate->copy_dest = COPY_CALLBACK; + cstate->copy_dest = COPY_DEST_CALLBACK; cstate->data_dest_cb = data_dest_cb; } else if (pipe) diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 34bea880ca..b3f4682f95 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -20,69 +20,10 @@ #include "parser/parse_node.h" #include "tcop/dest.h" -/* - * Represents whether a header line should be present, and whether it must - * match the actual names (which implies "true"). - */ -typedef enum CopyHeaderChoice -{ - COPY_HEADER_FALSE = 0, - COPY_HEADER_TRUE, - COPY_HEADER_MATCH, -} CopyHeaderChoice; - -/* - * Represents where to save input processing errors. More values to be added - * in the future. - */ -typedef enum CopyOnErrorChoice -{ - COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */ - COPY_ON_ERROR_IGNORE, /* ignore errors */ -} CopyOnErrorChoice; - -/* - * A struct to hold COPY options, in a parsed form. All of these are related - * to formatting, except for 'freeze', which doesn't really belong here, but - * it's expedient to parse it along with all the other options. - */ -typedef struct CopyFormatOptions -{ - /* parameters from the COPY command */ - int file_encoding; /* file or remote side's character encoding, - * -1 if not specified */ - bool binary; /* binary format? 
*/ - bool freeze; /* freeze rows on loading? */ - bool csv_mode; /* Comma Separated Value format? */ - CopyHeaderChoice header_line; /* header line? */ - char *null_print; /* NULL marker string (server encoding!) */ - int null_print_len; /* length of same */ - char *null_print_client; /* same converted to file encoding */ - char *default_print; /* DEFAULT marker string */ - int default_print_len; /* length of same */ - char *delim; /* column delimiter (must be 1 byte) */ - char *quote; /* CSV quote char (must be 1 byte) */ - char *escape; /* CSV escape char (must be 1 byte) */ - List *force_quote; /* list of column names */ - bool force_quote_all; /* FORCE_QUOTE *? */ - bool *force_quote_flags; /* per-column CSV FQ flags */ - List *force_notnull; /* list of column names */ - bool force_notnull_all; /* FORCE_NOT_NULL *? */ - bool *force_notnull_flags; /* per-column CSV FNN flags */ - List *force_null; /* list of column names */ - bool force_null_all; /* FORCE_NULL *? */ - bool *force_null_flags; /* per-column CSV FN flags */ - bool convert_selectively; /* do selective binary conversion? 
*/ - CopyOnErrorChoice on_error; /* what to do when error happened */ - List *convert_select; /* list of column names (can be NIL) */ - CopyToRoutine *to_routine; /* callback routines for COPY TO */ -} CopyFormatOptions; - /* This is private in commands/copyfrom.c */ typedef struct CopyFromStateData *CopyFromState; typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread); -typedef void (*copy_data_dest_cb) (void *data, int len); extern void DoCopy(ParseState *pstate, const CopyStmt *stmt, int stmt_location, int stmt_len, diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 9c25e1c415..a869d78d72 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -14,10 +14,10 @@ #ifndef COPYAPI_H #define COPYAPI_H +#include "executor/execdesc.h" #include "executor/tuptable.h" #include "nodes/parsenodes.h" -/* This is private in commands/copyto.c */ typedef struct CopyToStateData *CopyToState; typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel); @@ -58,4 +58,122 @@ extern CopyToRoutine CopyToRoutineText; extern CopyToRoutine CopyToRoutineCSV; extern CopyToRoutine CopyToRoutineBinary; +/* + * Represents whether a header line should be present, and whether it must + * match the actual names (which implies "true"). + */ +typedef enum CopyHeaderChoice +{ + COPY_HEADER_FALSE = 0, + COPY_HEADER_TRUE, + COPY_HEADER_MATCH, +} CopyHeaderChoice; + +/* + * Represents where to save input processing errors. More values to be added + * in the future. + */ +typedef enum CopyOnErrorChoice +{ + COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */ + COPY_ON_ERROR_IGNORE, /* ignore errors */ +} CopyOnErrorChoice; + +/* + * A struct to hold COPY options, in a parsed form. All of these are related + * to formatting, except for 'freeze', which doesn't really belong here, but + * it's expedient to parse it along with all the other options. 
+ */ +typedef struct CopyFormatOptions +{ + /* parameters from the COPY command */ + int file_encoding; /* file or remote side's character encoding, + * -1 if not specified */ + bool binary; /* binary format? */ + bool freeze; /* freeze rows on loading? */ + bool csv_mode; /* Comma Separated Value format? */ + CopyHeaderChoice header_line; /* header line? */ + char *null_print; /* NULL marker string (server encoding!) */ + int null_print_len; /* length of same */ + char *null_print_client; /* same converted to file encoding */ + char *default_print; /* DEFAULT marker string */ + int default_print_len; /* length of same */ + char *delim; /* column delimiter (must be 1 byte) */ + char *quote; /* CSV quote char (must be 1 byte) */ + char *escape; /* CSV escape char (must be 1 byte) */ + List *force_quote; /* list of column names */ + bool force_quote_all; /* FORCE_QUOTE *? */ + bool *force_quote_flags; /* per-column CSV FQ flags */ + List *force_notnull; /* list of column names */ + bool force_notnull_all; /* FORCE_NOT_NULL *? */ + bool *force_notnull_flags; /* per-column CSV FNN flags */ + List *force_null; /* list of column names */ + bool force_null_all; /* FORCE_NULL *? */ + bool *force_null_flags; /* per-column CSV FN flags */ + bool convert_selectively; /* do selective binary conversion? */ + CopyOnErrorChoice on_error; /* what to do when error happened */ + List *convert_select; /* list of column names (can be NIL) */ + CopyToRoutine *to_routine; /* callback routines for COPY TO */ +} CopyFormatOptions; + +/* + * Represents the different dest cases we need to worry about at + * the bottom level + */ +typedef enum CopyDest +{ + COPY_DEST_FILE, /* to file (or a piped program) */ + COPY_DEST_FRONTEND, /* to frontend */ + COPY_DEST_CALLBACK, /* to callback function */ +} CopyDest; + +typedef void (*copy_data_dest_cb) (void *data, int len); + +/* + * This struct contains all the state variables used throughout a COPY TO + * operation. 
+ * + * Multi-byte encodings: all supported client-side encodings encode multi-byte + * characters by having the first byte's high bit set. Subsequent bytes of the + * character can have the high bit not set. When scanning data in such an + * encoding to look for a match to a single-byte (ie ASCII) character, we must + * use the full pg_encoding_mblen() machinery to skip over multibyte + * characters, else we might find a false match to a trailing byte. In + * supported server encodings, there is no possibility of a false match, and + * it's faster to make useless comparisons to trailing bytes than it is to + * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true + * when we have to do it the hard way. + */ +typedef struct CopyToStateData +{ + /* low-level state data */ + CopyDest copy_dest; /* type of copy source/destination */ + FILE *copy_file; /* used if copy_dest == COPY_FILE */ + StringInfo fe_msgbuf; /* used for all dests during COPY TO */ + + int file_encoding; /* file or remote side's character encoding */ + bool need_transcoding; /* file encoding diff from server? */ + bool encoding_embeds_ascii; /* ASCII can be non-first byte? */ + + /* parameters from the COPY command */ + Relation rel; /* relation to copy to */ + QueryDesc *queryDesc; /* executable query to copy from */ + List *attnumlist; /* integer list of attnums to copy */ + char *filename; /* filename, or NULL for STDOUT */ + bool is_program; /* is 'filename' a program to popen? 
*/
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;  /* per-copy execution context */
+
+    FmgrInfo   *out_functions;  /* lookup info for output functions */
+    MemoryContext rowcontext;   /* per-row evaluation context */
+    uint64      bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
 #endif                          /* COPYAPI_H */
-- 
2.43.0

From 636e28b6478a8295469f832fb816a835e9cf24f6 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jan 2024 15:12:43 +0900
Subject: [PATCH v6 4/8] Add support for implementing custom COPY TO format as extension

* Add CopyToStateData::opaque that can be used to keep data for custom
  COPY TO format implementation
* Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf
* Rename CopySendEndOfRow() to CopyToStateFlush() because it's a method
  for CopyToState and it's used for flushing. End-of-row related codes
  were moved to CopyToTextSendEndOfRow().
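
For illustration only, here is a rough sketch of how a custom COPY TO
format might use the private "opaque" pointer together with an explicit
flush call. It is a stand-alone mock, not PostgreSQL code: MockCopyToState,
mock_send_string(), mock_flush(), ExampleFormatState and the example_*()
callbacks are all hypothetical names invented for this sketch. Only the
idea (keep per-format state in opaque, append to a buffer, flush it once
per row) comes from the commit message above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Mock of the COPY TO state; NOT the real CopyToStateData. */
typedef struct MockCopyToState
{
    char    buf[256];       /* stands in for fe_msgbuf */
    size_t  len;            /* bytes currently buffered */
    char    sink[1024];     /* stands in for the file/frontend destination */
    size_t  sink_len;       /* bytes already flushed to the sink */
    void   *opaque;         /* private space for the format implementation */
} MockCopyToState;

/* Hypothetical per-format state kept behind the opaque pointer. */
typedef struct ExampleFormatState
{
    int     rows;           /* number of rows written so far */
} ExampleFormatState;

/* Append a string to the buffer (mock counterpart of a CopySend*() helper). */
static void
mock_send_string(MockCopyToState *cstate, const char *str)
{
    size_t  n = strlen(str);

    assert(cstate->len + n <= sizeof(cstate->buf));
    memcpy(cstate->buf + cstate->len, str, n);
    cstate->len += n;
}

/* Move the buffered bytes to the sink (mock counterpart of a flush call). */
static void
mock_flush(MockCopyToState *cstate)
{
    assert(cstate->sink_len + cstate->len <= sizeof(cstate->sink));
    memcpy(cstate->sink + cstate->sink_len, cstate->buf, cstate->len);
    cstate->sink_len += cstate->len;
    cstate->len = 0;
}

/* Start callback: allocate the private state and hang it on opaque. */
static void
example_start(MockCopyToState *cstate)
{
    cstate->opaque = calloc(1, sizeof(ExampleFormatState));
    assert(cstate->opaque != NULL);
}

/* One-row callback: write the row, terminate it, then flush explicitly. */
static void
example_one_row(MockCopyToState *cstate, const char *row_text)
{
    ExampleFormatState *state = cstate->opaque;

    mock_send_string(cstate, row_text);
    mock_send_string(cstate, "\n");
    state->rows++;
    mock_flush(cstate);
}

/* End callback: release the private state and report the row count. */
static int
example_end(MockCopyToState *cstate)
{
    ExampleFormatState *state = cstate->opaque;
    int     rows = state->rows;

    free(state);
    cstate->opaque = NULL;
    return rows;
}
```

In this sketch the end-of-row terminator is written by the format itself
and the flush is a separate step, which mirrors the split between
CopyToTextSendEndOfRow() and CopyToStateFlush() described above.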
---
 src/backend/commands/copyto.c  | 15 +++++++--------
 src/include/commands/copyapi.h |  5 +++++
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cfc74ee7b1..b5d8678394 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -69,7 +69,6 @@ static void SendCopyEnd(CopyToState cstate);
 static void CopySendData(CopyToState cstate, const void *databuf, int datasize);
 static void CopySendString(CopyToState cstate, const char *str);
 static void CopySendChar(CopyToState cstate, char c);
-static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);

@@ -117,7 +116,7 @@ CopyToTextSendEndOfRow(CopyToState cstate)
         default:
             break;
     }
-    CopySendEndOfRow(cstate);
+    CopyToStateFlush(cstate);
 }

 static void
@@ -302,7 +301,7 @@ CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
         }
     }

-    CopySendEndOfRow(cstate);
+    CopyToStateFlush(cstate);
 }

 static void
@@ -311,7 +310,7 @@ CopyToBinaryEnd(CopyToState cstate)
     /* Generate trailer for a binary copy */
     CopySendInt16(cstate, -1);
     /* Need to flush out the trailer */
-    CopySendEndOfRow(cstate);
+    CopyToStateFlush(cstate);
 }

 CopyToRoutine CopyToRoutineText = {
@@ -377,8 +376,8 @@ SendCopyEnd(CopyToState cstate)
  * CopySendData sends output data to the destination (file or frontend)
  * CopySendString does the same for null-terminated strings
  * CopySendChar does the same for single characters
- * CopySendEndOfRow does the appropriate thing at end of each data row
- *  (data is not actually flushed except by CopySendEndOfRow)
+ * CopyToStateFlush flushes the buffered data
+ *  (data is not actually flushed except by CopyToStateFlush)
  *
  * NB: no data conversion is applied by these functions
  *----------
@@ -401,8 +400,8 @@ CopySendChar(CopyToState cstate, char c)
     appendStringInfoCharMacro(cstate->fe_msgbuf, c);
 }

-static void
-CopySendEndOfRow(CopyToState cstate)
+void
+CopyToStateFlush(CopyToState cstate)
 {
     StringInfo  fe_msgbuf = cstate->fe_msgbuf;

diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index a869d78d72..ffad433a21 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -174,6 +174,11 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;  /* lookup info for output functions */
     MemoryContext rowcontext;   /* per-row evaluation context */
     uint64      bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;         /* private space */
 } CopyToStateData;

+extern void CopyToStateFlush(CopyToState cstate);
+
 #endif                          /* COPYAPI_H */
-- 
2.43.0

From 53b120ef11a10563fb9f12ad40042adf039bd18c Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jan 2024 17:21:23 +0900
Subject: [PATCH v6 5/8] Extract COPY FROM format implementations

This doesn't change the current behavior. This just introduces
CopyFromRoutine, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.

Note that CopyFromRoutine can't be used from extensions yet because
CopyRead*() aren't exported yet. Extensions can't read data from a
source without CopyRead*(). They will be exported by subsequent
patches.
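
As a rough illustration of the "struct of function pointers" pattern
described above, the following stand-alone mock shows a routine table
with get-format/start/one-row/end callbacks, a lookup by format name, and
a driver loop that only talks to the table. Everything here is
hypothetical and simplified (MockFromState, MockFromRoutine, the "lines"
format, lookup_routine() and copy_from_all() do not exist in PostgreSQL);
it only sketches the dispatch idea, assuming input is an in-memory
string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mock of the COPY FROM state; NOT the real CopyFromStateData. */
typedef struct MockFromState
{
    const char *input;      /* remaining unread input */
    int         rows_read;  /* rows returned so far */
} MockFromState;

/* Mock routine table, loosely modelled on the callbacks named above. */
typedef struct MockFromRoutine
{
    int16_t (*get_format) (MockFromState *cstate);
    void    (*start) (MockFromState *cstate);
    /* Returns 0 when there are no more rows, 1 otherwise. */
    int     (*one_row) (MockFromState *cstate, char *out, size_t out_size);
    void    (*end) (MockFromState *cstate);
} MockFromRoutine;

/* A toy "lines" format: every newline-terminated line is one row. */
static int16_t
lines_get_format(MockFromState *cstate)
{
    (void) cstate;
    return 0;               /* 0 = textual, as in the text/csv routines */
}

static void
lines_start(MockFromState *cstate)
{
    cstate->rows_read = 0;
}

static int
lines_one_row(MockFromState *cstate, char *out, size_t out_size)
{
    const char *nl;
    size_t      n;

    if (*cstate->input == '\0')
        return 0;           /* no more rows */

    nl = strchr(cstate->input, '\n');
    n = nl ? (size_t) (nl - cstate->input) : strlen(cstate->input);
    assert(n < out_size);
    memcpy(out, cstate->input, n);
    out[n] = '\0';
    cstate->input += nl ? n + 1 : n;    /* skip the newline if present */
    cstate->rows_read++;
    return 1;
}

static void
lines_end(MockFromState *cstate)
{
    (void) cstate;          /* nothing to release in this toy format */
}

static const MockFromRoutine MockFromRoutineLines = {
    .get_format = lines_get_format,
    .start = lines_start,
    .one_row = lines_one_row,
    .end = lines_end,
};

/* Pick a routine table by format name; NULL means "unknown format". */
static const MockFromRoutine *
lookup_routine(const char *format)
{
    if (strcmp(format, "lines") == 0)
        return &MockFromRoutineLines;
    return NULL;
}

/* Driver loop: knows nothing about the format beyond the routine table. */
static int
copy_from_all(const MockFromRoutine *routine, const char *input)
{
    MockFromState cstate = {input, 0};
    char        row[64];

    routine->start(&cstate);
    while (routine->one_row(&cstate, row, sizeof(row)))
        ;                   /* a real caller would store the row here */
    routine->end(&cstate);
    return cstate.rows_read;
}
```

The point of the pattern is that the driver never branches on the format
itself, which is what replacing the "if (cstate->opts.binary)" checks
with calls through from_routine achieves in the patch.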
--- src/backend/commands/copy.c | 3 + src/backend/commands/copyfrom.c | 216 ++++++++++---- src/backend/commands/copyfromparse.c | 346 ++++++++++++----------- src/include/commands/copy.h | 3 - src/include/commands/copyapi.h | 44 +++ src/include/commands/copyfrom_internal.h | 4 + 6 files changed, 401 insertions(+), 215 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 6f0db0ae7c..ec6dfff8ab 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -459,12 +459,14 @@ ProcessCopyOptionCustomFormat(ParseState *pstate, else if (strcmp(format, "csv") == 0) { opts_out->csv_mode = true; + opts_out->from_routine = &CopyFromRoutineCSV; opts_out->to_routine = &CopyToRoutineCSV; return; } else if (strcmp(format, "binary") == 0) { opts_out->binary = true; + opts_out->from_routine = &CopyFromRoutineBinary; opts_out->to_routine = &CopyToRoutineBinary; return; } @@ -533,6 +535,7 @@ ProcessCopyOptions(ParseState *pstate, opts_out->file_encoding = -1; /* Text is the default format. */ + opts_out->from_routine = &CopyFromRoutineText; opts_out->to_routine = &CopyToRoutineText; /* diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 05b3d13236..de85e4e9f1 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -108,6 +108,170 @@ static char *limit_printout_length(const char *str); static void ClosePipeFromProgram(CopyFromState cstate); + +/* + * CopyFromRoutine implementations. + */ + +/* + * CopyFromRoutine implementation for "text" and "csv". CopyFromText*() + * refer cstate->opts.csv_mode and change their behavior. We can split this + * implementation and stop referring cstate->opts.csv_mode later. + */ + +/* All "text" and "csv" options are parsed in ProcessCopyOptions(). We may + * move the code to here later. 
*/ +static bool +CopyFromTextProcessOption(CopyFromState cstate, DefElem *defel) +{ + return false; +} + +static int16 +CopyFromTextGetFormat(CopyFromState cstate) +{ + return 0; +} + +static void +CopyFromTextStart(CopyFromState cstate, TupleDesc tupDesc) +{ + AttrNumber num_phys_attrs = tupDesc->natts; + AttrNumber attr_count; + + /* + * If encoding conversion is needed, we need another buffer to hold the + * converted input data. Otherwise, we can just point input_buf to the + * same buffer as raw_buf. + */ + if (cstate->need_transcoding) + { + cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); + cstate->input_buf_index = cstate->input_buf_len = 0; + } + else + cstate->input_buf = cstate->raw_buf; + cstate->input_reached_eof = false; + + initStringInfo(&cstate->line_buf); + + /* + * Pick up the required catalog information for each attribute in the + * relation, including the input function, the element type (to pass to + * the input function). + */ + cstate->in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo)); + cstate->typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid)); + for (int attnum = 1; attnum <= num_phys_attrs; attnum++) + { + Form_pg_attribute att = TupleDescAttr(tupDesc, attnum - 1); + Oid in_func_oid; + + /* We don't need info for dropped attributes */ + if (att->attisdropped) + continue; + + /* Fetch the input function and typioparam info */ + getTypeInputInfo(att->atttypid, + &in_func_oid, &cstate->typioparams[attnum - 1]); + fmgr_info(in_func_oid, &cstate->in_functions[attnum - 1]); + } + + /* create workspace for CopyReadAttributes results */ + attr_count = list_length(cstate->attnumlist); + cstate->max_fields = attr_count; + cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *)); +} + +static void +CopyFromTextEnd(CopyFromState cstate) +{ +} + +/* + * CopyFromRoutine implementation for "binary". + */ + +/* All "binary" options are parsed in ProcessCopyOptions(). We may move the + * code to here later. 
*/ +static bool +CopyFromBinaryProcessOption(CopyFromState cstate, DefElem *defel) +{ + return false; +} + +static int16 +CopyFromBinaryGetFormat(CopyFromState cstate) +{ + return 1; +} + +static void +CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc) +{ + AttrNumber num_phys_attrs = tupDesc->natts; + + /* + * Pick up the required catalog information for each attribute in the + * relation, including the input function, the element type (to pass to + * the input function). + */ + cstate->in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo)); + cstate->typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid)); + for (int attnum = 1; attnum <= num_phys_attrs; attnum++) + { + Form_pg_attribute att = TupleDescAttr(tupDesc, attnum - 1); + Oid in_func_oid; + + /* We don't need info for dropped attributes */ + if (att->attisdropped) + continue; + + /* Fetch the input function and typioparam info */ + getTypeBinaryInputInfo(att->atttypid, + &in_func_oid, &cstate->typioparams[attnum - 1]); + fmgr_info(in_func_oid, &cstate->in_functions[attnum - 1]); + } + + /* Read and verify binary header */ + ReceiveCopyBinaryHeader(cstate); +} + +static void +CopyFromBinaryEnd(CopyFromState cstate) +{ +} + +CopyFromRoutine CopyFromRoutineText = { + .CopyFromProcessOption = CopyFromTextProcessOption, + .CopyFromGetFormat = CopyFromTextGetFormat, + .CopyFromStart = CopyFromTextStart, + .CopyFromOneRow = CopyFromTextOneRow, + .CopyFromEnd = CopyFromTextEnd, +}; + +/* + * We can use the same CopyFromRoutine for both of "text" and "csv" because + * CopyFromText*() refer cstate->opts.csv_mode and change their behavior. We can + * split the implementations and stop referring cstate->opts.csv_mode later. 
+ */ +CopyFromRoutine CopyFromRoutineCSV = { + .CopyFromProcessOption = CopyFromTextProcessOption, + .CopyFromGetFormat = CopyFromTextGetFormat, + .CopyFromStart = CopyFromTextStart, + .CopyFromOneRow = CopyFromTextOneRow, + .CopyFromEnd = CopyFromTextEnd, +}; + +CopyFromRoutine CopyFromRoutineBinary = { + .CopyFromProcessOption = CopyFromBinaryProcessOption, + .CopyFromGetFormat = CopyFromBinaryGetFormat, + .CopyFromStart = CopyFromBinaryStart, + .CopyFromOneRow = CopyFromBinaryOneRow, + .CopyFromEnd = CopyFromBinaryEnd, +}; + + /* * error context callback for COPY FROM * @@ -1379,9 +1543,6 @@ BeginCopyFrom(ParseState *pstate, TupleDesc tupDesc; AttrNumber num_phys_attrs, num_defaults; - FmgrInfo *in_functions; - Oid *typioparams; - Oid in_func_oid; int *defmap; ExprState **defexprs; MemoryContext oldcontext; @@ -1566,25 +1727,6 @@ BeginCopyFrom(ParseState *pstate, cstate->raw_buf_index = cstate->raw_buf_len = 0; cstate->raw_reached_eof = false; - if (!cstate->opts.binary) - { - /* - * If encoding conversion is needed, we need another buffer to hold - * the converted input data. Otherwise, we can just point input_buf - * to the same buffer as raw_buf. - */ - if (cstate->need_transcoding) - { - cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); - cstate->input_buf_index = cstate->input_buf_len = 0; - } - else - cstate->input_buf = cstate->raw_buf; - cstate->input_reached_eof = false; - - initStringInfo(&cstate->line_buf); - } - initStringInfo(&cstate->attribute_buf); /* Assign range table and rteperminfos, we'll need them in CopyFrom. */ @@ -1603,8 +1745,6 @@ BeginCopyFrom(ParseState *pstate, * the input function), and info about defaults and constraints. (Which * input function we use depends on text/binary format choice.) 
*/ - in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo)); - typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid)); defmap = (int *) palloc(num_phys_attrs * sizeof(int)); defexprs = (ExprState **) palloc(num_phys_attrs * sizeof(ExprState *)); @@ -1616,15 +1756,6 @@ BeginCopyFrom(ParseState *pstate, if (att->attisdropped) continue; - /* Fetch the input function and typioparam info */ - if (cstate->opts.binary) - getTypeBinaryInputInfo(att->atttypid, - &in_func_oid, &typioparams[attnum - 1]); - else - getTypeInputInfo(att->atttypid, - &in_func_oid, &typioparams[attnum - 1]); - fmgr_info(in_func_oid, &in_functions[attnum - 1]); - /* Get default info if available */ defexprs[attnum - 1] = NULL; @@ -1684,8 +1815,6 @@ BeginCopyFrom(ParseState *pstate, cstate->bytes_processed = 0; /* We keep those variables in cstate. */ - cstate->in_functions = in_functions; - cstate->typioparams = typioparams; cstate->defmap = defmap; cstate->defexprs = defexprs; cstate->volatile_defexprs = volatile_defexprs; @@ -1758,20 +1887,7 @@ BeginCopyFrom(ParseState *pstate, pgstat_progress_update_multi_param(3, progress_cols, progress_vals); - if (cstate->opts.binary) - { - /* Read and verify binary header */ - ReceiveCopyBinaryHeader(cstate); - } - - /* create workspace for CopyReadAttributes results */ - if (!cstate->opts.binary) - { - AttrNumber attr_count = list_length(cstate->attnumlist); - - cstate->max_fields = attr_count; - cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *)); - } + cstate->opts.from_routine->CopyFromStart(cstate, tupDesc); MemoryContextSwitchTo(oldcontext); @@ -1784,6 +1900,8 @@ BeginCopyFrom(ParseState *pstate, void EndCopyFrom(CopyFromState cstate) { + cstate->opts.from_routine->CopyFromEnd(cstate); + /* No COPY FROM related resources except memory. 
*/ if (cstate->is_program) { diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 7cacd0b752..49632f75e4 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -172,7 +172,7 @@ ReceiveCopyBegin(CopyFromState cstate) { StringInfoData buf; int natts = list_length(cstate->attnumlist); - int16 format = (cstate->opts.binary ? 1 : 0); + int16 format = cstate->opts.from_routine->CopyFromGetFormat(cstate); int i; pq_beginmessage(&buf, PqMsg_CopyInResponse); @@ -840,6 +840,185 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) return true; } +bool +CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +{ + TupleDesc tupDesc; + AttrNumber attr_count; + FmgrInfo *in_functions = cstate->in_functions; + Oid *typioparams = cstate->typioparams; + ExprState **defexprs = cstate->defexprs; + char **field_strings; + ListCell *cur; + int fldct; + int fieldno; + char *string; + + tupDesc = RelationGetDescr(cstate->rel); + attr_count = list_length(cstate->attnumlist); + + /* read raw fields in the next line */ + if (!NextCopyFromRawFields(cstate, &field_strings, &fldct)) + return false; + + /* check for overflowing fields */ + if (attr_count > 0 && fldct > attr_count) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("extra data after last expected column"))); + + fieldno = 0; + + /* Loop to read the user attributes on the line. 
*/ + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + if (fieldno >= fldct) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("missing data for column \"%s\"", + NameStr(att->attname)))); + string = field_strings[fieldno++]; + + if (cstate->convert_select_flags && + !cstate->convert_select_flags[m]) + { + /* ignore input field, leaving column as NULL */ + continue; + } + + if (cstate->opts.csv_mode) + { + if (string == NULL && + cstate->opts.force_notnull_flags[m]) + { + /* + * FORCE_NOT_NULL option is set and column is NULL - convert + * it to the NULL string. + */ + string = cstate->opts.null_print; + } + else if (string != NULL && cstate->opts.force_null_flags[m] + && strcmp(string, cstate->opts.null_print) == 0) + { + /* + * FORCE_NULL option is set and column matches the NULL + * string. It must have been quoted, or otherwise the string + * would already have been set to NULL. Convert it to NULL as + * specified. + */ + string = NULL; + } + } + + cstate->cur_attname = NameStr(att->attname); + cstate->cur_attval = string; + + if (string != NULL) + nulls[m] = false; + + if (cstate->defaults[m]) + { + /* + * The caller must supply econtext and have switched into the + * per-tuple memory context in it. 
+ */ + Assert(econtext != NULL); + Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); + + values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); + } + + /* + * If ON_ERROR is specified with IGNORE, skip rows with soft errors + */ + else if (!InputFunctionCallSafe(&in_functions[m], + string, + typioparams[m], + att->atttypmod, + (Node *) cstate->escontext, + &values[m])) + { + cstate->num_errors++; + return true; + } + + cstate->cur_attname = NULL; + cstate->cur_attval = NULL; + } + + Assert(fieldno == attr_count); + + return true; +} + +bool +CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +{ + TupleDesc tupDesc; + AttrNumber attr_count; + FmgrInfo *in_functions = cstate->in_functions; + Oid *typioparams = cstate->typioparams; + int16 fld_count; + ListCell *cur; + + tupDesc = RelationGetDescr(cstate->rel); + attr_count = list_length(cstate->attnumlist); + + cstate->cur_lineno++; + + if (!CopyGetInt16(cstate, &fld_count)) + { + /* EOF detected (end of file, or protocol-level EOF) */ + return false; + } + + if (fld_count == -1) + { + /* + * Received EOF marker. Wait for the protocol-level EOF, and complain + * if it doesn't come immediately. In COPY FROM STDIN, this ensures + * that we correctly handle CopyFail, if client chooses to send that + * now. When copying from file, we could ignore the rest of the file + * like in text mode, but we choose to be consistent with the COPY + * FROM STDIN case. 
+ */ + char dummy; + + if (CopyReadBinaryData(cstate, &dummy, 1) > 0) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("received copy data after EOF marker"))); + return false; + } + + if (fld_count != attr_count) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("row field count is %d, expected %d", + (int) fld_count, attr_count))); + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + cstate->cur_attname = NameStr(att->attname); + values[m] = CopyReadBinaryAttribute(cstate, + &in_functions[m], + typioparams[m], + att->atttypmod, + &nulls[m]); + cstate->cur_attname = NULL; + } + + return true; +} + /* * Read next tuple from file for COPY FROM. Return false if no more tuples. * @@ -857,181 +1036,22 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, { TupleDesc tupDesc; AttrNumber num_phys_attrs, - attr_count, num_defaults = cstate->num_defaults; - FmgrInfo *in_functions = cstate->in_functions; - Oid *typioparams = cstate->typioparams; int i; int *defmap = cstate->defmap; ExprState **defexprs = cstate->defexprs; tupDesc = RelationGetDescr(cstate->rel); num_phys_attrs = tupDesc->natts; - attr_count = list_length(cstate->attnumlist); /* Initialize all values for row to NULL */ MemSet(values, 0, num_phys_attrs * sizeof(Datum)); MemSet(nulls, true, num_phys_attrs * sizeof(bool)); MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool)); - if (!cstate->opts.binary) - { - char **field_strings; - ListCell *cur; - int fldct; - int fieldno; - char *string; - - /* read raw fields in the next line */ - if (!NextCopyFromRawFields(cstate, &field_strings, &fldct)) - return false; - - /* check for overflowing fields */ - if (attr_count > 0 && fldct > attr_count) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("extra data after last expected column"))); - - fieldno = 0; - - /* Loop to read the user attributes on 
the line. */ - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - int m = attnum - 1; - Form_pg_attribute att = TupleDescAttr(tupDesc, m); - - if (fieldno >= fldct) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("missing data for column \"%s\"", - NameStr(att->attname)))); - string = field_strings[fieldno++]; - - if (cstate->convert_select_flags && - !cstate->convert_select_flags[m]) - { - /* ignore input field, leaving column as NULL */ - continue; - } - - if (cstate->opts.csv_mode) - { - if (string == NULL && - cstate->opts.force_notnull_flags[m]) - { - /* - * FORCE_NOT_NULL option is set and column is NULL - - * convert it to the NULL string. - */ - string = cstate->opts.null_print; - } - else if (string != NULL && cstate->opts.force_null_flags[m] - && strcmp(string, cstate->opts.null_print) == 0) - { - /* - * FORCE_NULL option is set and column matches the NULL - * string. It must have been quoted, or otherwise the - * string would already have been set to NULL. Convert it - * to NULL as specified. - */ - string = NULL; - } - } - - cstate->cur_attname = NameStr(att->attname); - cstate->cur_attval = string; - - if (string != NULL) - nulls[m] = false; - - if (cstate->defaults[m]) - { - /* - * The caller must supply econtext and have switched into the - * per-tuple memory context in it. 
- */ - Assert(econtext != NULL); - Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); - - values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); - } - - /* - * If ON_ERROR is specified with IGNORE, skip rows with soft - * errors - */ - else if (!InputFunctionCallSafe(&in_functions[m], - string, - typioparams[m], - att->atttypmod, - (Node *) cstate->escontext, - &values[m])) - { - cstate->num_errors++; - return true; - } - - cstate->cur_attname = NULL; - cstate->cur_attval = NULL; - } - - Assert(fieldno == attr_count); - } - else - { - /* binary */ - int16 fld_count; - ListCell *cur; - - cstate->cur_lineno++; - - if (!CopyGetInt16(cstate, &fld_count)) - { - /* EOF detected (end of file, or protocol-level EOF) */ - return false; - } - - if (fld_count == -1) - { - /* - * Received EOF marker. Wait for the protocol-level EOF, and - * complain if it doesn't come immediately. In COPY FROM STDIN, - * this ensures that we correctly handle CopyFail, if client - * chooses to send that now. When copying from file, we could - * ignore the rest of the file like in text mode, but we choose to - * be consistent with the COPY FROM STDIN case. 
- */ - char dummy; - - if (CopyReadBinaryData(cstate, &dummy, 1) > 0) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("received copy data after EOF marker"))); - return false; - } - - if (fld_count != attr_count) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("row field count is %d, expected %d", - (int) fld_count, attr_count))); - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - int m = attnum - 1; - Form_pg_attribute att = TupleDescAttr(tupDesc, m); - - cstate->cur_attname = NameStr(att->attname); - values[m] = CopyReadBinaryAttribute(cstate, - &in_functions[m], - typioparams[m], - att->atttypmod, - &nulls[m]); - cstate->cur_attname = NULL; - } - } + if (!cstate->opts.from_routine->CopyFromOneRow(cstate, econtext, values, + nulls)) + return false; /* * Now compute and insert any defaults available for the columns not diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index b3f4682f95..df29d42555 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -20,9 +20,6 @@ #include "parser/parse_node.h" #include "tcop/dest.h" -/* This is private in commands/copyfrom.c */ -typedef struct CopyFromStateData *CopyFromState; - typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread); extern void DoCopy(ParseState *pstate, const CopyStmt *stmt, diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index ffad433a21..323e4705d2 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -18,6 +18,49 @@ #include "executor/tuptable.h" #include "nodes/parsenodes.h" +/* This is private in commands/copyfrom.c */ +typedef struct CopyFromStateData *CopyFromState; + +typedef bool (*CopyFromProcessOption_function) (CopyFromState cstate, DefElem *defel); +typedef int16 (*CopyFromGetFormat_function) (CopyFromState cstate); +typedef void (*CopyFromStart_function) (CopyFromState cstate, TupleDesc tupDesc); +typedef bool 
(*CopyFromOneRow_function) (CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); +typedef void (*CopyFromEnd_function) (CopyFromState cstate); + +/* Routines for a COPY FROM format implementation. */ +typedef struct CopyFromRoutine +{ + /* + * Called for processing one COPY FROM option. This will return false when + * the given option is invalid. + */ + CopyFromProcessOption_function CopyFromProcessOption; + + /* + * Called when COPY FROM is started. This will return a format as int16 + * value. It's used for the CopyInResponse message. + */ + CopyFromGetFormat_function CopyFromGetFormat; + + /* + * Called when COPY FROM is started. This will initialize something and + * receive a header. + */ + CopyFromStart_function CopyFromStart; + + /* Copy one row. It returns false if no more tuples. */ + CopyFromOneRow_function CopyFromOneRow; + + /* Called when COPY FROM is ended. This will finalize something. */ + CopyFromEnd_function CopyFromEnd; +} CopyFromRoutine; + +/* Built-in CopyFromRoutine for "text", "csv" and "binary". */ +extern CopyFromRoutine CopyFromRoutineText; +extern CopyFromRoutine CopyFromRoutineCSV; +extern CopyFromRoutine CopyFromRoutineBinary; + + typedef struct CopyToStateData *CopyToState; typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel); @@ -113,6 +156,7 @@ typedef struct CopyFormatOptions bool convert_selectively; /* do selective binary conversion? 
*/ CopyOnErrorChoice on_error; /* what to do when error happened */ List *convert_select; /* list of column names (can be NIL) */ + CopyFromRoutine *from_routine; /* callback routines for COPY FROM */ CopyToRoutine *to_routine; /* callback routines for COPY TO */ } CopyFormatOptions; diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index cad52fcc78..921c1513f7 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -183,4 +183,8 @@ typedef struct CopyFromStateData extern void ReceiveCopyBegin(CopyFromState cstate); extern void ReceiveCopyBinaryHeader(CopyFromState cstate); +extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); +extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); + + #endif /* COPYFROM_INTERNAL_H */ -- 2.43.0 From 8e75d2c93f6ce9d67664d53233d0b71c9f10613a Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Wed, 24 Jan 2024 11:07:14 +0900 Subject: [PATCH v6 6/8] Add support for adding custom COPY FROM format We use the same approach as we used for custom COPY TO format. 
 Now, custom COPY format handler can return COPY TO format routines or COPY FROM format routines based on the "is_from" argument: copy_handler(true) returns CopyFromRoutine copy_handler(false) returns CopyToRoutine --- src/backend/commands/copy.c | 53 +++++++++++++------ src/include/commands/copyapi.h | 2 + .../expected/test_copy_format.out | 12 +++++ .../test_copy_format/sql/test_copy_format.sql | 6 +++ .../test_copy_format/test_copy_format.c | 50 +++++++++++++++-- 5 files changed, 105 insertions(+), 18 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index ec6dfff8ab..479f36868c 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -472,12 +472,9 @@ ProcessCopyOptionCustomFormat(ParseState *pstate, } /* custom format */ - if (!is_from) - { - funcargtypes[0] = INTERNALOID; - handlerOid = LookupFuncName(list_make1(makeString(format)), 1, - funcargtypes, true); - } + funcargtypes[0] = INTERNALOID; + handlerOid = LookupFuncName(list_make1(makeString(format)), 1, + funcargtypes, true); if (!OidIsValid(handlerOid)) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), @@ -486,14 +483,36 @@ ProcessCopyOptionCustomFormat(ParseState *pstate, datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from)); routine = DatumGetPointer(datum); - if (routine == NULL || !IsA(routine, CopyToRoutine)) - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("COPY handler function %s(%u) did not return a CopyToRoutine struct", - format, handlerOid), - parser_errposition(pstate, defel->location))); - - opts_out->to_routine = routine; + if (is_from) + { + if (routine == NULL || !IsA(routine, CopyFromRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%s(%u) did not return a " + "CopyFromRoutine struct", + format, handlerOid), + parser_errposition( + pstate, defel->location))); + opts_out->from_routine = routine; + } + else + { + if (routine == NULL || 
!IsA(routine, CopyToRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%s(%u) did not return a " + "CopyToRoutine struct", + format, handlerOid), + parser_errposition( + pstate, defel->location))); + opts_out->to_routine = routine; + } } /* @@ -692,7 +711,11 @@ ProcessCopyOptions(ParseState *pstate, { bool processed = false; - if (!is_from) + if (is_from) + processed = + opts_out->from_routine->CopyFromProcessOption( + cstate, defel); + else processed = opts_out->to_routine->CopyToProcessOption(cstate, defel); if (!processed) ereport(ERROR, diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 323e4705d2..ef1bb201c2 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -30,6 +30,8 @@ typedef void (*CopyFromEnd_function) (CopyFromState cstate); /* Routines for a COPY FROM format implementation. */ typedef struct CopyFromRoutine { + NodeTag type; + /* * Called for processing one COPY FROM option. This will return false when * the given option is invalid. 
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out index 3a24ae7b97..6af69f0eb7 100644 --- a/src/test/modules/test_copy_format/expected/test_copy_format.out +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -1,6 +1,18 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a INT, b INT, c INT); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +COPY public.test FROM stdin WITH ( + option_before 'before', + format 'test_copy_format', + option_after 'after' +); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromProcessOption: "option_before"="before" +NOTICE: CopyFromProcessOption: "option_after"="after" +NOTICE: CopyFromGetFormat +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromEnd COPY public.test TO stdout WITH ( option_before 'before', format 'test_copy_format', diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql index 0eb7ed2e11..94d3c789a0 100644 --- a/src/test/modules/test_copy_format/sql/test_copy_format.sql +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -1,6 +1,12 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a INT, b INT, c INT); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +COPY public.test FROM stdin WITH ( + option_before 'before', + format 'test_copy_format', + option_after 'after' +); +\. 
COPY public.test TO stdout WITH ( option_before 'before', format 'test_copy_format', diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c index a2219afcde..5e1b40e881 100644 --- a/src/test/modules/test_copy_format/test_copy_format.c +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -18,6 +18,50 @@ PG_MODULE_MAGIC; +static bool +CopyFromProcessOption(CopyFromState cstate, DefElem *defel) +{ + ereport(NOTICE, + (errmsg("CopyFromProcessOption: \"%s\"=\"%s\"", + defel->defname, defGetString(defel)))); + return true; +} + +static int16 +CopyFromGetFormat(CopyFromState cstate) +{ + ereport(NOTICE, (errmsg("CopyFromGetFormat"))); + return 0; +} + +static void +CopyFromStart(CopyFromState cstate, TupleDesc tupDesc) +{ + ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts))); +} + +static bool +CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +{ + ereport(NOTICE, (errmsg("CopyFromOneRow"))); + return false; +} + +static void +CopyFromEnd(CopyFromState cstate) +{ + ereport(NOTICE, (errmsg("CopyFromEnd"))); +} + +static const CopyFromRoutine CopyFromRoutineTestCopyFormat = { + .type = T_CopyFromRoutine, + .CopyFromProcessOption = CopyFromProcessOption, + .CopyFromGetFormat = CopyFromGetFormat, + .CopyFromStart = CopyFromStart, + .CopyFromOneRow = CopyFromOneRow, + .CopyFromEnd = CopyFromEnd, +}; + static bool CopyToProcessOption(CopyToState cstate, DefElem *defel) { @@ -71,7 +115,7 @@ test_copy_format(PG_FUNCTION_ARGS) (errmsg("test_copy_format: is_from=%s", is_from ? 
"true" : "false"))); if (is_from) - elog(ERROR, "COPY FROM isn't supported yet"); - - PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat); + PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat); + else + PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat); } -- 2.43.0 From dc3c21e725849d1d0c163677d08d527a8bf3bc37 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Wed, 24 Jan 2024 14:16:29 +0900 Subject: [PATCH v6 7/8] Export CopyFromStateData It's for custom COPY FROM format handlers implemented as extension. This just moves codes. This doesn't change codes except CopySource enum values. CopySource enum values changes aren't required but I did like I did for CopyDest enum values. I changed COPY_ prefix to COPY_SOURCE_ prefix. For example, COPY_FILE to COPY_SOURCE_FILE. Note that this change isn't enough to implement a custom COPY FROM format handler as extension. We'll do the followings in a subsequent commit: 1. Add an opaque space for custom COPY FROM format handler 2. Export CopyReadBinaryData() to read the next data --- src/backend/commands/copyfrom.c | 4 +- src/backend/commands/copyfromparse.c | 10 +- src/include/commands/copy.h | 2 - src/include/commands/copyapi.h | 156 ++++++++++++++++++++++- src/include/commands/copyfrom_internal.h | 150 ---------------------- 5 files changed, 162 insertions(+), 160 deletions(-) diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index de85e4e9f1..b5f1771ac2 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1705,7 +1705,7 @@ BeginCopyFrom(ParseState *pstate, pg_encoding_to_char(GetDatabaseEncoding())))); } - cstate->copy_src = COPY_FILE; /* default */ + cstate->copy_src = COPY_SOURCE_FILE; /* default */ cstate->whereClause = whereClause; @@ -1824,7 +1824,7 @@ BeginCopyFrom(ParseState *pstate, if (data_source_cb) { progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK; - cstate->copy_src = COPY_CALLBACK; + cstate->copy_src = COPY_SOURCE_CALLBACK; 
cstate->data_source_cb = data_source_cb; } else if (pipe) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 49632f75e4..a78a790060 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -181,7 +181,7 @@ ReceiveCopyBegin(CopyFromState cstate) for (i = 0; i < natts; i++) pq_sendint16(&buf, format); /* per-column formats */ pq_endmessage(&buf); - cstate->copy_src = COPY_FRONTEND; + cstate->copy_src = COPY_SOURCE_FRONTEND; cstate->fe_msgbuf = makeStringInfo(); /* We *must* flush here to ensure FE knows it can send. */ pq_flush(); @@ -249,7 +249,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) switch (cstate->copy_src) { - case COPY_FILE: + case COPY_SOURCE_FILE: bytesread = fread(databuf, 1, maxread, cstate->copy_file); if (ferror(cstate->copy_file)) ereport(ERROR, @@ -258,7 +258,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) if (bytesread == 0) cstate->raw_reached_eof = true; break; - case COPY_FRONTEND: + case COPY_SOURCE_FRONTEND: while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof) { int avail; @@ -341,7 +341,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) bytesread += avail; } break; - case COPY_CALLBACK: + case COPY_SOURCE_CALLBACK: bytesread = cstate->data_source_cb(databuf, minread, maxread); break; } @@ -1099,7 +1099,7 @@ CopyReadLine(CopyFromState cstate) * after \. up to the protocol end of copy data. (XXX maybe better * not to treat \. as special?) 
*/ - if (cstate->copy_src == COPY_FRONTEND) + if (cstate->copy_src == COPY_SOURCE_FRONTEND) { int inbytes; diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index df29d42555..cd41d32074 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -20,8 +20,6 @@ #include "parser/parse_node.h" #include "tcop/dest.h" -typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread); - extern void DoCopy(ParseState *pstate, const CopyStmt *stmt, int stmt_location, int stmt_len, uint64 *processed); diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index ef1bb201c2..b7e8f627bf 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -14,11 +14,12 @@ #ifndef COPYAPI_H #define COPYAPI_H +#include "commands/trigger.h" #include "executor/execdesc.h" #include "executor/tuptable.h" +#include "nodes/miscnodes.h" #include "nodes/parsenodes.h" -/* This is private in commands/copyfrom.c */ typedef struct CopyFromStateData *CopyFromState; typedef bool (*CopyFromProcessOption_function) (CopyFromState cstate, DefElem *defel); @@ -162,6 +163,159 @@ typedef struct CopyFormatOptions CopyToRoutine *to_routine; /* callback routines for COPY TO */ } CopyFormatOptions; + +/* + * Represents the different source cases we need to worry about at + * the bottom level + */ +typedef enum CopySource +{ + COPY_SOURCE_FILE, /* from file (or a piped program) */ + COPY_SOURCE_FRONTEND, /* from frontend */ + COPY_SOURCE_CALLBACK, /* from callback function */ +} CopySource; + +/* + * Represents the end-of-line terminator type of the input + */ +typedef enum EolType +{ + EOL_UNKNOWN, + EOL_NL, + EOL_CR, + EOL_CRNL, +} EolType; + +typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread); + +/* + * This struct contains all the state variables used throughout a COPY FROM + * operation. 
+ */ +typedef struct CopyFromStateData +{ + /* low-level state data */ + CopySource copy_src; /* type of copy source */ + FILE *copy_file; /* used if copy_src == COPY_FILE */ + StringInfo fe_msgbuf; /* used if copy_src == COPY_FRONTEND */ + + EolType eol_type; /* EOL type of input */ + int file_encoding; /* file or remote side's character encoding */ + bool need_transcoding; /* file encoding diff from server? */ + Oid conversion_proc; /* encoding conversion function */ + + /* parameters from the COPY command */ + Relation rel; /* relation to copy from */ + List *attnumlist; /* integer list of attnums to copy */ + char *filename; /* filename, or NULL for STDIN */ + bool is_program; /* is 'filename' a program to popen? */ + copy_data_source_cb data_source_cb; /* function for reading data */ + + CopyFormatOptions opts; + bool *convert_select_flags; /* per-column CSV/TEXT CS flags */ + Node *whereClause; /* WHERE condition (or NULL) */ + + /* these are just for error messages, see CopyFromErrorCallback */ + const char *cur_relname; /* table name for error messages */ + uint64 cur_lineno; /* line number for error messages */ + const char *cur_attname; /* current att for error messages */ + const char *cur_attval; /* current att value for error messages */ + bool relname_only; /* don't output line number, att, etc. 
*/ + + /* + * Working state + */ + MemoryContext copycontext; /* per-copy execution context */ + + AttrNumber num_defaults; /* count of att that are missing and have + * default value */ + FmgrInfo *in_functions; /* array of input functions for each attrs */ + Oid *typioparams; /* array of element types for in_functions */ + ErrorSaveContext *escontext; /* soft error trapper during in_functions + * execution */ + uint64 num_errors; /* total number of rows which contained soft + * errors */ + int *defmap; /* array of default att numbers related to + * missing att */ + ExprState **defexprs; /* array of default att expressions for all + * att */ + bool *defaults; /* if DEFAULT marker was found for + * corresponding att */ + bool volatile_defexprs; /* is any of defexprs volatile? */ + List *range_table; /* single element list of RangeTblEntry */ + List *rteperminfos; /* single element list of RTEPermissionInfo */ + ExprState *qualexpr; + + TransitionCaptureState *transition_capture; + + /* + * These variables are used to reduce overhead in COPY FROM. + * + * attribute_buf holds the separated, de-escaped text for each field of + * the current line. The CopyReadAttributes functions return arrays of + * pointers into this buffer. We avoid palloc/pfree overhead by re-using + * the buffer on each cycle. + * + * In binary COPY FROM, attribute_buf holds the binary data for the + * current field, but the usage is otherwise similar. + */ + StringInfoData attribute_buf; + + /* field raw data pointers found by COPY FROM */ + + int max_fields; + char **raw_fields; + + /* + * Similarly, line_buf holds the whole input line being processed. The + * input cycle is first to read the whole line into line_buf, and then + * extract the individual attribute fields into attribute_buf. line_buf + * is preserved unmodified so that we can display it in error messages if + * appropriate. (In binary mode, line_buf is not used.) 
+ */ + StringInfoData line_buf; + bool line_buf_valid; /* contains the row being processed? */ + + /* + * input_buf holds input data, already converted to database encoding. + * + * In text mode, CopyReadLine parses this data sufficiently to locate line + * boundaries, then transfers the data to line_buf. We guarantee that + * there is a \0 at input_buf[input_buf_len] at all times. (In binary + * mode, input_buf is not used.) + * + * If encoding conversion is not required, input_buf is not a separate + * buffer but points directly to raw_buf. In that case, input_buf_len + * tracks the number of bytes that have been verified as valid in the + * database encoding, and raw_buf_len is the total number of bytes stored + * in the buffer. + */ +#define INPUT_BUF_SIZE 65536 /* we palloc INPUT_BUF_SIZE+1 bytes */ + char *input_buf; + int input_buf_index; /* next byte to process */ + int input_buf_len; /* total # of bytes stored */ + bool input_reached_eof; /* true if we reached EOF */ + bool input_reached_error; /* true if a conversion error happened */ + /* Shorthand for number of unconsumed bytes available in input_buf */ +#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index) + + /* + * raw_buf holds raw input data read from the data source (file or client + * connection), not yet converted to the database encoding. Like with + * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len]. 
+ */ +#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */ + char *raw_buf; + int raw_buf_index; /* next byte to process */ + int raw_buf_len; /* total # of bytes stored */ + bool raw_reached_eof; /* true if we reached EOF */ + + /* Shorthand for number of unconsumed bytes available in raw_buf */ +#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) + + uint64 bytes_processed; /* number of bytes processed so far */ +} CopyFromStateData; + /* * Represents the different dest cases we need to worry about at * the bottom level diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index 921c1513f7..f8f6120255 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -18,28 +18,6 @@ #include "commands/trigger.h" #include "nodes/miscnodes.h" -/* - * Represents the different source cases we need to worry about at - * the bottom level - */ -typedef enum CopySource -{ - COPY_FILE, /* from file (or a piped program) */ - COPY_FRONTEND, /* from frontend */ - COPY_CALLBACK, /* from callback function */ -} CopySource; - -/* - * Represents the end-of-line terminator type of the input - */ -typedef enum EolType -{ - EOL_UNKNOWN, - EOL_NL, - EOL_CR, - EOL_CRNL, -} EolType; - /* * Represents the insert method to be used during COPY FROM. */ @@ -52,134 +30,6 @@ typedef enum CopyInsertMethod * ExecForeignBatchInsert only if valid */ } CopyInsertMethod; -/* - * This struct contains all the state variables used throughout a COPY FROM - * operation. - */ -typedef struct CopyFromStateData -{ - /* low-level state data */ - CopySource copy_src; /* type of copy source */ - FILE *copy_file; /* used if copy_src == COPY_FILE */ - StringInfo fe_msgbuf; /* used if copy_src == COPY_FRONTEND */ - - EolType eol_type; /* EOL type of input */ - int file_encoding; /* file or remote side's character encoding */ - bool need_transcoding; /* file encoding diff from server? 
*/ - Oid conversion_proc; /* encoding conversion function */ - - /* parameters from the COPY command */ - Relation rel; /* relation to copy from */ - List *attnumlist; /* integer list of attnums to copy */ - char *filename; /* filename, or NULL for STDIN */ - bool is_program; /* is 'filename' a program to popen? */ - copy_data_source_cb data_source_cb; /* function for reading data */ - - CopyFormatOptions opts; - bool *convert_select_flags; /* per-column CSV/TEXT CS flags */ - Node *whereClause; /* WHERE condition (or NULL) */ - - /* these are just for error messages, see CopyFromErrorCallback */ - const char *cur_relname; /* table name for error messages */ - uint64 cur_lineno; /* line number for error messages */ - const char *cur_attname; /* current att for error messages */ - const char *cur_attval; /* current att value for error messages */ - bool relname_only; /* don't output line number, att, etc. */ - - /* - * Working state - */ - MemoryContext copycontext; /* per-copy execution context */ - - AttrNumber num_defaults; /* count of att that are missing and have - * default value */ - FmgrInfo *in_functions; /* array of input functions for each attrs */ - Oid *typioparams; /* array of element types for in_functions */ - ErrorSaveContext *escontext; /* soft error trapper during in_functions - * execution */ - uint64 num_errors; /* total number of rows which contained soft - * errors */ - int *defmap; /* array of default att numbers related to - * missing att */ - ExprState **defexprs; /* array of default att expressions for all - * att */ - bool *defaults; /* if DEFAULT marker was found for - * corresponding att */ - bool volatile_defexprs; /* is any of defexprs volatile? */ - List *range_table; /* single element list of RangeTblEntry */ - List *rteperminfos; /* single element list of RTEPermissionInfo */ - ExprState *qualexpr; - - TransitionCaptureState *transition_capture; - - /* - * These variables are used to reduce overhead in COPY FROM. 
- * - * attribute_buf holds the separated, de-escaped text for each field of - * the current line. The CopyReadAttributes functions return arrays of - * pointers into this buffer. We avoid palloc/pfree overhead by re-using - * the buffer on each cycle. - * - * In binary COPY FROM, attribute_buf holds the binary data for the - * current field, but the usage is otherwise similar. - */ - StringInfoData attribute_buf; - - /* field raw data pointers found by COPY FROM */ - - int max_fields; - char **raw_fields; - - /* - * Similarly, line_buf holds the whole input line being processed. The - * input cycle is first to read the whole line into line_buf, and then - * extract the individual attribute fields into attribute_buf. line_buf - * is preserved unmodified so that we can display it in error messages if - * appropriate. (In binary mode, line_buf is not used.) - */ - StringInfoData line_buf; - bool line_buf_valid; /* contains the row being processed? */ - - /* - * input_buf holds input data, already converted to database encoding. - * - * In text mode, CopyReadLine parses this data sufficiently to locate line - * boundaries, then transfers the data to line_buf. We guarantee that - * there is a \0 at input_buf[input_buf_len] at all times. (In binary - * mode, input_buf is not used.) - * - * If encoding conversion is not required, input_buf is not a separate - * buffer but points directly to raw_buf. In that case, input_buf_len - * tracks the number of bytes that have been verified as valid in the - * database encoding, and raw_buf_len is the total number of bytes stored - * in the buffer. 
- */ -#define INPUT_BUF_SIZE 65536 /* we palloc INPUT_BUF_SIZE+1 bytes */ - char *input_buf; - int input_buf_index; /* next byte to process */ - int input_buf_len; /* total # of bytes stored */ - bool input_reached_eof; /* true if we reached EOF */ - bool input_reached_error; /* true if a conversion error happened */ - /* Shorthand for number of unconsumed bytes available in input_buf */ -#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index) - - /* - * raw_buf holds raw input data read from the data source (file or client - * connection), not yet converted to the database encoding. Like with - * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len]. - */ -#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */ - char *raw_buf; - int raw_buf_index; /* next byte to process */ - int raw_buf_len; /* total # of bytes stored */ - bool raw_reached_eof; /* true if we reached EOF */ - - /* Shorthand for number of unconsumed bytes available in raw_buf */ -#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) - - uint64 bytes_processed; /* number of bytes processed so far */ -} CopyFromStateData; - extern void ReceiveCopyBegin(CopyFromState cstate); extern void ReceiveCopyBinaryHeader(CopyFromState cstate); -- 2.43.0 From f3cebe8b095f25c5c9bbeb5915c3a4233c45796c Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Wed, 24 Jan 2024 14:19:08 +0900 Subject: [PATCH v6 8/8] Add support for implementing custom COPY FROM format as extension * Add CopyFromStateData::opaque that can be used to keep data for custom COPY From format implementation * Export CopyReadBinaryData() to read the next data * Rename CopyReadBinaryData() to CopyFromStateRead() because it's a method for CopyFromState and "BinaryData" is redundant. 
--- src/backend/commands/copyfromparse.c | 21 ++++++++++----------- src/include/commands/copyapi.h | 5 +++++ 2 files changed, 15 insertions(+), 11 deletions(-) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index a78a790060..f8a194635d 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -165,7 +165,6 @@ static int CopyGetData(CopyFromState cstate, void *databuf, static inline bool CopyGetInt32(CopyFromState cstate, int32 *val); static inline bool CopyGetInt16(CopyFromState cstate, int16 *val); static void CopyLoadInputBuf(CopyFromState cstate); -static int CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes); void ReceiveCopyBegin(CopyFromState cstate) @@ -194,7 +193,7 @@ ReceiveCopyBinaryHeader(CopyFromState cstate) int32 tmp; /* Signature */ - if (CopyReadBinaryData(cstate, readSig, 11) != 11 || + if (CopyFromStateRead(cstate, readSig, 11) != 11 || memcmp(readSig, BinarySignature, 11) != 0) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), @@ -222,7 +221,7 @@ ReceiveCopyBinaryHeader(CopyFromState cstate) /* Skip extension header, if present */ while (tmp-- > 0) { - if (CopyReadBinaryData(cstate, readSig, 1) != 1) + if (CopyFromStateRead(cstate, readSig, 1) != 1) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), errmsg("invalid COPY file header (wrong length)"))); @@ -364,7 +363,7 @@ CopyGetInt32(CopyFromState cstate, int32 *val) { uint32 buf; - if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf)) + if (CopyFromStateRead(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf)) { *val = 0; /* suppress compiler warning */ return false; @@ -381,7 +380,7 @@ CopyGetInt16(CopyFromState cstate, int16 *val) { uint16 buf; - if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf)) + if (CopyFromStateRead(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf)) { *val = 0; /* suppress compiler warning */ return false; @@ -692,14 +691,14 @@ 
CopyLoadInputBuf(CopyFromState cstate) } /* - * CopyReadBinaryData + * CopyFromStateRead * * Reads up to 'nbytes' bytes from cstate->copy_file via cstate->raw_buf * and writes them to 'dest'. Returns the number of bytes read (which * would be less than 'nbytes' only if we reach EOF). */ -static int -CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes) +int +CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes) { int copied_bytes = 0; @@ -988,7 +987,7 @@ CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, */ char dummy; - if (CopyReadBinaryData(cstate, &dummy, 1) > 0) + if (CopyFromStateRead(cstate, &dummy, 1) > 0) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), errmsg("received copy data after EOF marker"))); @@ -1997,8 +1996,8 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo, resetStringInfo(&cstate->attribute_buf); enlargeStringInfo(&cstate->attribute_buf, fld_size); - if (CopyReadBinaryData(cstate, cstate->attribute_buf.data, - fld_size) != fld_size) + if (CopyFromStateRead(cstate, cstate->attribute_buf.data, + fld_size) != fld_size) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), errmsg("unexpected EOF in COPY data"))); diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index b7e8f627bf..22accc83ab 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -314,8 +314,13 @@ typedef struct CopyFromStateData #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) uint64 bytes_processed; /* number of bytes processed so far */ + + /* For custom format implementation */ + void *opaque; /* private space */ } CopyFromStateData; +extern int CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes); + /* * Represents the different dest cases we need to worry about at * the bottom level -- 2.43.0
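The handler dispatch in patch 6/8 (one handler function, a tagged routine struct per direction, an IsA()-style check before use) can be sketched in isolation. This is only a rough illustration under stated assumptions, not the patch's code: every Demo*/demo_* name below is hypothetical, and the tag enum merely stands in for NodeTag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the NodeTag values checked by IsA(). */
typedef enum DemoTag
{
	DEMO_T_CopyFromRoutine,
	DEMO_T_CopyToRoutine
} DemoTag;

/* Hypothetical, reduced counterpart of CopyFromRoutine: tag + one callback. */
typedef struct DemoCopyFromRoutine
{
	DemoTag		type;
	bool		(*one_row) (int *rows_left);	/* false when no more rows */
} DemoCopyFromRoutine;

/* Hypothetical, reduced counterpart of CopyToRoutine. */
typedef struct DemoCopyToRoutine
{
	DemoTag		type;
	void		(*one_row) (int *rows_sent);
} DemoCopyToRoutine;

/* Consume one pending row; report false once the input is exhausted. */
static bool
demo_from_one_row(int *rows_left)
{
	if (*rows_left == 0)
		return false;
	(*rows_left)--;
	return true;
}

/* Count one emitted row. */
static void
demo_to_one_row(int *rows_sent)
{
	(*rows_sent)++;
}

static const DemoCopyFromRoutine demo_from_routine = {
	.type = DEMO_T_CopyFromRoutine,
	.one_row = demo_from_one_row,
};

static const DemoCopyToRoutine demo_to_routine = {
	.type = DEMO_T_CopyToRoutine,
	.one_row = demo_to_one_row,
};

/* One handler for both directions, selected by is_from. */
static const void *
demo_copy_handler(bool is_from)
{
	if (is_from)
		return &demo_from_routine;
	return &demo_to_routine;
}

/* Both structs start with the tag, so it can be read through the pointer. */
static bool
demo_is_a(const void *routine, DemoTag expected)
{
	return routine != NULL && *(const DemoTag *) routine == expected;
}

/* Drive a COPY FROM-like loop; returns rows read, or -1 on a wrong struct. */
static int
demo_copy_from(int rows_available)
{
	const void *routine = demo_copy_handler(true);
	const DemoCopyFromRoutine *from;
	int			rows_read = 0;

	if (!demo_is_a(routine, DEMO_T_CopyFromRoutine))
		return -1;
	from = routine;
	while (from->one_row(&rows_available))
		rows_read++;
	return rows_read;
}
```

The per-row call goes through a function pointer here as well, which is the kind of indirection the performance question in the following replies is about.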
On Wed, Jan 24, 2024 at 02:49:36PM +0900, Sutou Kouhei wrote:
> For COPY TO:
>
> 0001: This adds CopyToRoutine and use it for text/csv/binary
> formats. No implementation change. This just move codes.
10M without this change:
    format,elapsed time (ms)
    text,1090.763
    csv,1136.103
    binary,1137.141
10M with this change:
    format,elapsed time (ms)
    text,1082.654
    csv,1196.991
    binary,1069.697
These numbers point out that binary is faster by 6%, csv is slower by
5%, while text stays around what looks like noise range.  That's not
negligible.  Are these numbers reproducible?  If they are, that could
be a problem for anybody doing bulk-loading of large data sets.  I am
not sure to understand where the improvement for binary comes from by
reading the patch, but perhaps perf would tell more for each format?
The loss with csv could be blamed on the extra manipulations of the
function pointers, likely.
--
Michael
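The percentages quoted above follow from the 10M timings as a plain relative change. A minimal sketch of that arithmetic, with a helper name (percent_change) that is made up here for illustration and is not part of any patch in this thread:

```c
#include <assert.h>

/*
 * Relative change of 'after' versus 'before', in percent.
 * A negative result means the elapsed time went down (faster).
 * Hypothetical helper, only for checking the numbers in this thread.
 */
static double
percent_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

With the timings above this gives roughly -0.7% for text, +5.4% for csv and -5.9% for binary, which matches the "6% faster, 5% slower, noise" reading.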
			
On 2024-01-24 We 03:11, Michael Paquier wrote:
> On Wed, Jan 24, 2024 at 02:49:36PM +0900, Sutou Kouhei wrote:
>> For COPY TO:
>>
>> 0001: This adds CopyToRoutine and use it for text/csv/binary
>> formats. No implementation change. This just move codes.
> 10M without this change:
>
>     format,elapsed time (ms)
>     text,1090.763
>     csv,1136.103
>     binary,1137.141
>
> 10M with this change:
>
>     format,elapsed time (ms)
>     text,1082.654
>     csv,1196.991
>     binary,1069.697
>
> These numbers point out that binary is faster by 6%, csv is slower by
> 5%, while text stays around what looks like noise range.  That's not
> negligible.  Are these numbers reproducible?  If they are, that could
> be a problem for anybody doing bulk-loading of large data sets.  I am
> not sure to understand where the improvement for binary comes from by
> reading the patch, but perhaps perf would tell more for each format?
> The loss with csv could be blamed on the extra manipulations of the
> function pointers, likely.
I don't think that's at all acceptable.
We've spent quite a lot of blood sweat and tears over the years to make
COPY fast, and we should not sacrifice any of that lightly.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Hi,
In <10025bac-158c-ffe7-fbec-32b42629121f@dunslane.net>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 24 Jan 2024 07:15:55 -0500,
  Andrew Dunstan <andrew@dunslane.net> wrote:
>
> On 2024-01-24 We 03:11, Michael Paquier wrote:
>> On Wed, Jan 24, 2024 at 02:49:36PM +0900, Sutou Kouhei wrote:
>>> For COPY TO:
>>>
>>> 0001: This adds CopyToRoutine and use it for text/csv/binary
>>> formats. No implementation change. This just move codes.
>> 10M without this change:
>>
>>     format,elapsed time (ms)
>>     text,1090.763
>>     csv,1136.103
>>     binary,1137.141
>>
>> 10M with this change:
>>
>>     format,elapsed time (ms)
>>     text,1082.654
>>     csv,1196.991
>>     binary,1069.697
>>
>> These numbers point out that binary is faster by 6%, csv is slower by
>> 5%, while text stays around what looks like noise range.  That's not
>> negligible.  Are these numbers reproducible?  If they are, that could
>> be a problem for anybody doing bulk-loading of large data sets.  I am
>> not sure to understand where the improvement for binary comes from by
>> reading the patch, but perhaps perf would tell more for each format?
>> The loss with csv could be blamed on the extra manipulations of the
>> function pointers, likely.
>
>
> I don't think that's at all acceptable.
>
> We've spent quite a lot of blood sweat and tears over the years to make COPY
> fast, and we should not sacrifice any of that lightly.
These numbers aren't reproducible, because the benchmarks were
executed on my everyday machine, not on a machine dedicated to
benchmarking. The machine also runs other processes such as an
editor and a Web browser.
For example, here are some results with master
(94edfe250c6a200d2067b0debfe00b4122e9b11e):
    Format,N records,Elapsed time (ms)
    csv,10000000,1073.715
    csv,10000000,1022.830
    csv,10000000,1073.584
    csv,10000000,1090.651
    csv,10000000,1052.259
Here are some results with master + the 0001 patch:
    Format,N records,Elapsed time (ms)
    csv,10000000,1025.356
    csv,10000000,1067.202
    csv,10000000,1014.563
    csv,10000000,1032.088
    csv,10000000,1058.110
I uploaded my benchmark script so that you can run the same
benchmark on your machine:
https://gist.github.com/kou/be02e02e5072c91969469dbf137b5de5
Could anyone try the benchmark with master and master+0001?
Thanks,
-- 
kou
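One way to read the two sets of five csv runs above is to compare their means and ranges. The following is only a sketch; the RunStats type and summarize_runs() are names invented for this note, not something from the benchmark script:

```c
#include <assert.h>
#include <stddef.h>

/* Minimum, maximum and arithmetic mean of a set of elapsed times (ms). */
typedef struct RunStats
{
	double		min;
	double		max;
	double		mean;
} RunStats;

/* Hypothetical helper: summarize 'n' timings stored in 'ms'. */
static RunStats
summarize_runs(const double *ms, size_t n)
{
	RunStats	stats;
	double		sum = 0.0;
	size_t		i;

	assert(ms != NULL && n > 0);
	stats.min = stats.max = ms[0];
	for (i = 0; i < n; i++)
	{
		if (ms[i] < stats.min)
			stats.min = ms[i];
		if (ms[i] > stats.max)
			stats.max = ms[i];
		sum += ms[i];
	}
	stats.mean = sum / (double) n;
	return stats;
}
```

For the runs above, master spans 1022.830 to 1090.651 ms (mean about 1062.6) and master + 0001 spans 1014.563 to 1067.202 ms (mean about 1039.5). The two ranges overlap, which is consistent with the difference being noise on a non-dedicated machine.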
Hi,
In <20240124.144936.67229716500876806.kou@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 24 Jan 2024 14:49:36 +0900 (JST),
  Sutou Kouhei <kou@clear-code.com> wrote:
> I've implemented custom COPY format feature based on the
> current design discussion. See the attached patches for
> details.
I forgot to mention one note. Documentation isn't included
in these patches. I'll write it after all (or some) patches
are merged. Is it OK?
Thanks,
-- 
kou
On Wed, Jan 24, 2024 at 10:17 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> I uploaded my benchmark script so that you can run the same
> benchmark on your machine:
>
> https://gist.github.com/kou/be02e02e5072c91969469dbf137b5de5
>
> Could anyone try the benchmark with master and master+0001?
>
Sorry, I made a mistake: I applied all of the v6 patches, 0001 to 0008.
My tests:
    CREATE unlogged TABLE data (a bigint);
    SELECT setseed(0.29);
    INSERT INTO data SELECT random() * 10000 FROM generate_series(1, 1e7);
My setup:
    meson setup --reconfigure ${BUILD} \
        -Dprefix=${PG_PREFIX} \
        -Dpgport=5462 \
        -Dbuildtype=release \
        -Ddocs_html_style=website \
        -Ddocs_pdf=disabled \
        -Dllvm=disabled \
        -Dextra_version=_release_build
gcc version: PostgreSQL 17devel_release_build on x86_64-linux,
compiled by gcc-11.4.0, 64-bit
With your patches applied:
    COPY data TO '/dev/null' WITH (FORMAT csv) \watch count=5
    Time: 668.996 ms
    Time: 596.254 ms
    Time: 592.723 ms
    Time: 591.663 ms
    Time: 590.803 ms
Without your patches, at git 729439607ad210dbb446e31754e8627d7e3f7dda:
    COPY data TO '/dev/null' WITH (FORMAT csv) \watch count=5
    Time: 644.246 ms
    Time: 583.075 ms
    Time: 568.670 ms
    Time: 569.463 ms
    Time: 569.201 ms
I forgot to test other formats.
On Wed, Jan 24, 2024 at 11:17:26PM +0900, Sutou Kouhei wrote:
> In <10025bac-158c-ffe7-fbec-32b42629121f@dunslane.net>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 24 Jan 2024 07:15:55 -0500,
>   Andrew Dunstan <andrew@dunslane.net> wrote:
>> We've spent quite a lot of blood sweat and tears over the years to make COPY
>> fast, and we should not sacrifice any of that lightly.
Clearly.
> I uploaded my benchmark script so that you can run the same
> benchmark on your machine:
>
> https://gist.github.com/kou/be02e02e5072c91969469dbf137b5de5
Thanks, that saves time.  I am attaching it to this email as well, for
the sake of the archives if this link is removed in the future.
> Could anyone try the benchmark with master and master+0001?
Yep.  It is one point we need to settle before deciding what to do
with this patch set, and I've done so to reach my own conclusion.
I have a rather good machine at my disposal in the cloud, so I did a
few runs with HEAD and HEAD+0001, with PGDATA mounted on a tmpfs.
Here are some results for the 10M row case, as these should be the
least prone to noise, 5 runs each:
master
    text 10M  1732.570 1684.542 1693.430 1687.696 1714.845
    csv 10M   1729.113 1724.926 1727.414 1726.237 1728.865
    bin 10M   1679.097 1677.887 1676.764 1677.554 1678.120
master+0001
    text 10M  1702.207 1654.818 1647.069 1690.568 1654.446
    csv 10M   1764.939 1714.313 1712.444 1712.323 1716.952
    bin 10M   1703.061 1702.719 1702.234 1703.346 1704.137
Hmm.  The point of contention in the patch is the change to use the
CopyToOneRow callback in CopyOneRowTo(), as we go through it for each
row and we should have a worst-case scenario with a relation that has
a small attribute size.  The more rows, the more effect it would have.
The memory context switches and the StringInfo manipulations are
equally important, and there are a bunch of the latter, actually, with
optimizations around fe_msgbuf.
I've repeated a few runs across these two builds, and there is some
variance and noise, but I am going to agree with your point that the
effect of 0001 cannot be seen.  Even HEAD is showing some noise.  So I
am discarding the concerns I had after seeing the numbers you posted
upthread.
+typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
+typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
+typedef void (*CopyToStart_function) (CopyToState cstate, TupleDesc tupDesc);
+typedef void (*CopyToOneRow_function) (CopyToState cstate, TupleTableSlot *slot);
+typedef void (*CopyToEnd_function) (CopyToState cstate);
We don't really need a set of typedefs here, let's put the definitions
in the CopyToRoutine struct instead.
+extern CopyToRoutine CopyToRoutineText;
+extern CopyToRoutine CopyToRoutineCSV;
+extern CopyToRoutine CopyToRoutineBinary;
All that should IMO remain in copyto.c and copyfrom.c in the initial
patch doing the refactoring.  Why not using a fetch function instead
that uses a string in input?  Then you can call that once after
parsing the List of options in ProcessCopyOptions().
Introducing copyapi.h in the initial patch makes sense here for the TO
and FROM routines.
+/* All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
+ * move the code to here later. */
Some areas, like this comment, are written in an incorrect format.
+    if (cstate->opts.csv_mode)
+        CopyAttributeOutCSV(cstate, colname, false,
+                            list_length(cstate->attnumlist) == 1);
+    else
+        CopyAttributeOutText(cstate, colname);
You are right that this is not worth the trouble of creating a
different set of callbacks for CSV.  This makes the result cleaner.
+    getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+    fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
Actually, this split is interesting.  It is possible for a custom
format to plug in a custom set of out functions.  Did you make use of
something custom for your own stuff?  Actually, could it make sense to
split the assignment of cstate->out_functions into its own callback?
Sure, that's part of the start phase, but at least it would make clear
that a custom method *has* to assign these OIDs to work.  The patch
implies that as a rule, without a comment that CopyToStart *must* set
up these OIDs.
I think that 0001 and 0005 should be handled first, as pieces
independent of the rest.  Then we could move on with 0002~0004 and
0006~0008.
--
Michael
On Thu, Jan 25, 2024 at 10:53:58AM +0800, jian he wrote:
> apply your patch:
> COPY data TO '/dev/null' WITH (FORMAT csv) \watch count=5
> Time: 668.996 ms
> Time: 596.254 ms
> Time: 592.723 ms
> Time: 591.663 ms
> Time: 590.803 ms
>
> not apply your patch, at git 729439607ad210dbb446e31754e8627d7e3f7dda
> COPY data TO '/dev/null' WITH (FORMAT csv) \watch count=5
> Time: 644.246 ms
> Time: 583.075 ms
> Time: 568.670 ms
> Time: 569.463 ms
> Time: 569.201 ms
>
> I forgot to test other formats.
There can be some variance in the tests, so you'd better run many more
tests so that you get a better idea of the mean.  Discarding the N
highest and lowest values also slightly reduces the effects of the
noise you would get across single runs.
--
Michael
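The suggestion to discard the N highest and lowest values amounts to a trimmed mean. A minimal sketch, assuming the timings are already collected in an array; trimmed_mean() and cmp_double() are hypothetical names used only here:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for doubles, ascending. */
static int
cmp_double(const void *a, const void *b)
{
	double		x = *(const double *) a;
	double		y = *(const double *) b;

	return (x > y) - (x < y);
}

/*
 * Mean of 'samples' after dropping the 'trim' smallest and the 'trim'
 * largest values.  Works on a sorted copy, so the input is not modified.
 */
static double
trimmed_mean(const double *samples, size_t n, size_t trim)
{
	double	   *sorted;
	double		sum = 0.0;
	size_t		i;

	assert(samples != NULL && n > 2 * trim);
	sorted = malloc(n * sizeof(double));
	if (sorted == NULL)
		abort();
	memcpy(sorted, samples, n * sizeof(double));
	qsort(sorted, n, sizeof(double), cmp_double);
	for (i = trim; i < n - trim; i++)
		sum += sorted[i];
	free(sorted);
	return sum / (double) (n - 2 * trim);
}
```

Applied to the five csv timings quoted above with one value dropped at each end, this gives about 593.5 ms with the patches and about 573.9 ms without. Five runs are still a small sample, so this only dampens outliers such as the slow first run; it does not replace running more iterations.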
On Wed, Jan 24, 2024 at 11:17 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <10025bac-158c-ffe7-fbec-32b42629121f@dunslane.net>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 24 Jan 2024 07:15:55 -0500,
>   Andrew Dunstan <andrew@dunslane.net> wrote:
>
> >
> > On 2024-01-24 We 03:11, Michael Paquier wrote:
> >> On Wed, Jan 24, 2024 at 02:49:36PM +0900, Sutou Kouhei wrote:
> >>> For COPY TO:
> >>>
> >>> 0001: This adds CopyToRoutine and use it for text/csv/binary
> >>> formats. No implementation change. This just move codes.
> >> 10M without this change:
> >>
> >>     format,elapsed time (ms)
> >>     text,1090.763
> >>     csv,1136.103
> >>     binary,1137.141
> >>
> >> 10M with this change:
> >>
> >>     format,elapsed time (ms)
> >>     text,1082.654
> >>     csv,1196.991
> >>     binary,1069.697
> >>
> >> These numbers point out that binary is faster by 6%, csv is slower by
> >> 5%, while text stays around what looks like noise range.  That's not
> >> negligible.  Are these numbers reproducible?  If they are, that could
> >> be a problem for anybody doing bulk-loading of large data sets.  I am
> >> not sure to understand where the improvement for binary comes from by
> >> reading the patch, but perhaps perf would tell more for each format?
> >> The loss with csv could be blamed on the extra manipulations of the
> >> function pointers, likely.
> >
> >
> > I don't think that's at all acceptable.
> >
> > We've spent quite a lot of blood sweat and tears over the years to make COPY
> > fast, and we should not sacrifice any of that lightly.
>
> These numbers aren't reproducible. Because these benchmarks
> executed on my normal machine not a machine only for
> benchmarking. The machine runs another processes such as
> editor and Web browser.
>
> For example, here are some results with master
> (94edfe250c6a200d2067b0debfe00b4122e9b11e):
>
> Format,N records,Elapsed time (ms)
> csv,10000000,1073.715
> csv,10000000,1022.830
> csv,10000000,1073.584
> csv,10000000,1090.651
> csv,10000000,1052.259
>
> Here are some results with master + the 0001 patch:
>
> Format,N records,Elapsed time (ms)
> csv,10000000,1025.356
> csv,10000000,1067.202
> csv,10000000,1014.563
> csv,10000000,1032.088
> csv,10000000,1058.110
>
>
> I uploaded my benchmark script so that you can run the same
> benchmark on your machine:
>
> https://gist.github.com/kou/be02e02e5072c91969469dbf137b5de5
>
> Could anyone try the benchmark with master and master+0001?
>
I've run a similar scenario:
    create unlogged table test (a int);
    insert into test select c from generate_series(1, 25000000) c;
    copy test to '/tmp/result.csv' with (format csv); -- generates 230MB file
I've run it on HEAD and HEAD+0001 patch and here are the medians of 10
executions for each format:
HEAD:
    binary 2930.353 ms
    text   2754.852 ms
    csv    2890.012 ms
HEAD w/ 0001 patch:
    binary 2814.838 ms
    text   2900.845 ms
    csv    3015.210 ms
Hmm I can see a similar trend that Suto-san had; the binary format got
slightly faster whereas both text and csv format has small regression
(4%~5%). I think that the improvement for binary came from the fact
that we removed "if (cstate->opts.binary)" branches from the original
CopyOneRowTo(). I've experimented with a similar optimization for csv
and text format; have different callbacks for text and csv format and
remove "if (cstate->opts.csv_mode)" branches. I've attached a patch
for that. Here are results:
HEAD w/ 0001 patch + remove branches:
    binary 2824.502 ms
    text   2715.264 ms
    csv    2803.381 ms
The numbers look better now. I'm not sure these are within a noise
range but it might be worth considering having different callbacks for
text and csv formats.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Thu, Jan 25, 2024 at 01:36:03PM +0900, Masahiko Sawada wrote:
> Hmm I can see a similar trend that Suto-san had; the binary format got
> slightly faster whereas both text and csv format has small regression
> (4%~5%). I think that the improvement for binary came from the fact
> that we removed "if (cstate->opts.binary)" branches from the original
> CopyOneRowTo(). I've experimented with a similar optimization for csv
> and text format; have different callbacks for text and csv format and
> remove "if (cstate->opts.csv_mode)" branches. I've attached a patch
> for that. Here are results:
>
> HEAD w/ 0001 patch + remove branches:
> binary 2824.502 ms
> text 2715.264 ms
> csv 2803.381 ms
>
> The numbers look better now. I'm not sure these are within a noise
> range but it might be worth considering having different callbacks for
> text and csv formats.
Interesting.
Your numbers imply a 0.3% speedup for text, 0.7% speedup for csv and
0.9% speedup for binary, which may be around the noise range assuming
a ~1% range.  While this does not imply a regression, that seems worth
the duplication IMO.  The patch had better document the reason why the
split is done, as well.
CopyFromTextOneRow() has also specific branches for binary and
non-binary removed in 0005, so assuming that I/O is not a bottleneck,
the operation would be faster because we would not evaluate this "if"
condition for each row.  Wouldn't we also see improvements for COPY
FROM with short row values, say when mounting PGDATA into a
tmpfs/ramfs?
--
Michael
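The idea discussed above (pick the format once, then call a format-specific per-row callback with no per-row "if") can be sketched outside PostgreSQL as follows. Everything here is a toy under stated assumptions: the Toy* names are invented, the function pointer is declared directly inside the routine struct rather than through a typedef, and the csv variant quotes every value purely so that the two outputs differ. It is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct ToyCopyState ToyCopyState;

/* Per-format routine; the callback is declared inline, without a typedef. */
typedef struct ToyCopyToRoutine
{
	void		(*one_row) (ToyCopyState *state, int value);
} ToyCopyToRoutine;

struct ToyCopyState
{
	const ToyCopyToRoutine *routine;	/* chosen once, before the row loop */
	char		buf[64];				/* toy output buffer */
	size_t		len;
};

/* Append a string to the toy buffer; silently drops data that does not fit. */
static void
toy_append(ToyCopyState *state, const char *s)
{
	size_t		n = strlen(s);

	if (state->len + n >= sizeof(state->buf))
		return;
	memcpy(state->buf + state->len, s, n + 1);
	state->len += n;
}

/* "text" callback: no csv checks anywhere in this path. */
static void
toy_text_one_row(ToyCopyState *state, int value)
{
	char		tmp[32];

	snprintf(tmp, sizeof(tmp), "%d\n", value);
	toy_append(state, tmp);
}

/* "csv" callback: quotes every value (illustrative only). */
static void
toy_csv_one_row(ToyCopyState *state, int value)
{
	char		tmp[32];

	snprintf(tmp, sizeof(tmp), "\"%d\"\n", value);
	toy_append(state, tmp);
}

static const ToyCopyToRoutine toy_text_routine = {toy_text_one_row};
static const ToyCopyToRoutine toy_csv_routine = {toy_csv_one_row};

/* The row loop only dispatches; it never tests the format per row. */
static void
toy_copy_rows(ToyCopyState *state, const int *values, size_t n)
{
	size_t		i;

	assert(state != NULL && state->routine != NULL);
	for (i = 0; i < n; i++)
		state->routine->one_row(state, values[i]);
}
```

Whether removing the branch is measurable in the real code is exactly what the benchmarks in this thread are trying to establish; the sketch only shows the shape of the change.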
On Thu, Jan 25, 2024 at 1:53 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jan 25, 2024 at 01:36:03PM +0900, Masahiko Sawada wrote:
> > Hmm I can see a similar trend that Suto-san had; the binary format got
> > slightly faster whereas both text and csv format has small regression
> > (4%~5%). I think that the improvement for binary came from the fact
> > that we removed "if (cstate->opts.binary)" branches from the original
> > CopyOneRowTo(). I've experimented with a similar optimization for csv
> > and text format; have different callbacks for text and csv format and
> > remove "if (cstate->opts.csv_mode)" branches. I've attached a patch
> > for that. Here are results:
> >
> > HEAD w/ 0001 patch + remove branches:
> > binary 2824.502 ms
> > text 2715.264 ms
> > csv 2803.381 ms
> >
> > The numbers look better now. I'm not sure these are within a noise
> > range but it might be worth considering having different callbacks for
> > text and csv formats.
>
> Interesting.
>
> Your numbers imply a 0.3% speedup for text, 0.7% speedup for csv and
> 0.9% speedup for binary, which may be around the noise range assuming
> a ~1% range.  While this does not imply a regression, that seems worth
> the duplication IMO.
Agreed. In addition to that, now that each format routine has its own
callbacks, there would be chances that we can do other optimizations
dedicated to the format type in the future if available.
> The patch had better document the reason why the
> split is done, as well.
+1
>
> CopyFromTextOneRow() has also specific branches for binary and
> non-binary removed in 0005, so assuming that I/O is not a bottleneck,
> the operation would be faster because we would not evaluate this "if"
> condition for each row.  Wouldn't we also see improvements for COPY
> FROM with short row values, say when mounting PGDATA into a
> tmpfs/ramfs?
Probably. Seems worth evaluating.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi,
Thanks for trying these patches!
In <CACJufxF9NS3xQ2d79jN0V1CGvF7cR16uJo-C3nrY7vZrwvxF7w@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 25 Jan 2024 10:53:58 +0800,
  jian he <jian.universality@gmail.com> wrote:
> COPY data TO '/dev/null' WITH (FORMAT csv) \watch count=5
Wow! I didn't know the "\watch count="!
I'll use it.
> Time: 668.996 ms
> Time: 596.254 ms
> Time: 592.723 ms
> Time: 591.663 ms
> Time: 590.803 ms
It seems that 5 times isn't enough for this case as Michael
said. But thanks for trying!
Thanks,
-- 
kou
Hi,
In <ZbHS439y-Bs6HIAR@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 25 Jan 2024 12:17:55 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:
> +typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
> +typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
> +typedef void (*CopyToStart_function) (CopyToState cstate, TupleDesc tupDesc);
> +typedef void (*CopyToOneRow_function) (CopyToState cstate, TupleTableSlot *slot);
> +typedef void (*CopyToEnd_function) (CopyToState cstate);
> 
> We don't really need a set of typedefs here, let's put the definitions
> in the CopyToRoutine struct instead.
OK. I'll do it.
> +extern CopyToRoutine CopyToRoutineText;
> +extern CopyToRoutine CopyToRoutineCSV;
> +extern CopyToRoutine CopyToRoutineBinary;
> 
> All that should IMO remain in copyto.c and copyfrom.c in the initial
> patch doing the refactoring.  Why not using a fetch function instead
> that uses a string in input?  Then you can call that once after
> parsing the List of options in ProcessCopyOptions().
OK. How about the following for the fetch function
signature?
extern CopyToRoutine *GetBuiltinCopyToRoutine(const char *format);
We may introduce an enum and use it:
typedef enum CopyBuiltinFormat
{
    COPY_BUILTIN_FORMAT_TEXT = 0,
    COPY_BUILTIN_FORMAT_CSV,
    COPY_BUILTIN_FORMAT_BINARY,
} CopyBuiltinFormat;
extern CopyToRoutine *GetBuiltinCopyToRoutine(CopyBuiltinFormat format);
> +/* All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
> + * move the code to here later. */
> Some areas, like this comment, are written in an incorrect format.
Oh, sorry. I assumed that the comment style was adjusted by
pgindent.
I'll use the following style:
/*
 * ...
 */
> +    getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
> +    fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
> 
> Actually, this split is interesting.  It is possible for a custom
> format to plug in a custom set of out functions.  Did you make use of
> something custom for your own stuff?
I didn't. My PoC custom COPY format handler for Apache Arrow
just handles integer and text for now. It doesn't use
cstate->out_functions because cstate->out_functions may not
return a valid binary format value for Apache Arrow. So it
formats each value by itself.
I'll choose one of them for a custom type (that isn't
supported by Apache Arrow, e.g. PostGIS types):
1. Report an unsupported error
2. Call output function for Apache Arrow provided by the
   custom type
>                                       Actually, could it make sense to
> split the assignment of cstate->out_functions into its own callback?
Yes. Because we need to use getTypeBinaryOutputInfo() for
"binary" and use getTypeOutputInfo() for "text" and "csv".
> Sure, that's part of the start phase, but at least it would make clear
> that a custom method *has* to assign these OIDs to work.  The patch
> implies that as a rule, without a comment that CopyToStart *must* set
> up these OIDs.
CopyToStart doesn't need to set up them if the handler
doesn't use cstate->out_functions.
> I think that 0001 and 0005 should be handled first, as pieces
> independent of the rest.  Then we could move on with 0002~0004 and
> 0006~0008.
OK. I'll focus on 0001 and 0005 for now. I'll restart
0002-0004/0006-0008 after 0001 and 0005 are accepted.
Thanks,
-- 
kou
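The "fetch function that takes a string" idea discussed in the message above can be sketched as a simple name lookup. This is a standalone toy, not the proposed GetBuiltinCopyToRoutine(): the ToyRoutine type, the table and toy_get_builtin_routine() are assumptions made for illustration, and a real routine struct would carry callbacks rather than just a name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a per-format routine; only the name matters in this sketch. */
typedef struct ToyRoutine
{
	const char *name;
} ToyRoutine;

/* The built-in formats named in this thread. */
static const ToyRoutine toy_routines[] = {
	{"text"},
	{"csv"},
	{"binary"},
};

/*
 * Look up a built-in routine by format name.  Returns NULL for an
 * unknown name so that the caller can report the error itself.
 */
static const ToyRoutine *
toy_get_builtin_routine(const char *format)
{
	size_t		i;

	assert(format != NULL);
	for (i = 0; i < sizeof(toy_routines) / sizeof(toy_routines[0]); i++)
	{
		if (strcmp(toy_routines[i].name, format) == 0)
			return &toy_routines[i];
	}
	return NULL;
}
```

Because the option value already arrives as a string, a lookup like this can be called once after option parsing, and the routine structs themselves can stay private to the file that defines them.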
			
Hi,
In <CAD21AoALxEZz33NpcSk99ad_DT3A2oFNMa2KNjGBCMVFeCiUaA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 25 Jan 2024 13:36:03 +0900,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I've experimented with a similar optimization for csv
> and text format; have different callbacks for text and csv format and
> remove "if (cstate->opts.csv_mode)" branches. I've attached a patch
> for that. Here are results:
>
> HEAD w/ 0001 patch + remove branches:
> binary 2824.502 ms
> text 2715.264 ms
> csv 2803.381 ms
>
> The numbers look better now. I'm not sure these are within a noise
> range but it might be worth considering having different callbacks for
> text and csv formats.
Wow! Interesting. I tried the approach before but I didn't
see any difference by the approach. But it may depend on my
environment.
I'll import the approach to the next patch set so that
others can try the approach easily.
Thanks,
-- 
kou
On Thu, Jan 25, 2024 at 05:45:43PM +0900, Sutou Kouhei wrote:
> In <ZbHS439y-Bs6HIAR@paquier.xyz>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 25 Jan 2024 12:17:55 +0900,
>   Michael Paquier <michael@paquier.xyz> wrote:
>> +extern CopyToRoutine CopyToRoutineText;
>> +extern CopyToRoutine CopyToRoutineCSV;
>> +extern CopyToRoutine CopyToRoutineBinary;
>>
>> All that should IMO remain in copyto.c and copyfrom.c in the initial
>> patch doing the refactoring.  Why not using a fetch function instead
>> that uses a string in input?  Then you can call that once after
>> parsing the List of options in ProcessCopyOptions().
>
> OK. How about the following for the fetch function
> signature?
>
> extern CopyToRoutine *GetBuiltinCopyToRoutine(const char *format);
Or CopyToRoutineGet()?  I am not wedded to my suggestion, got a bad
history with naming things around here.
> We may introduce an enum and use it:
>
> typedef enum CopyBuiltinFormat
> {
>     COPY_BUILTIN_FORMAT_TEXT = 0,
>     COPY_BUILTIN_FORMAT_CSV,
>     COPY_BUILTIN_FORMAT_BINARY,
> } CopyBuiltinFormat;
>
> extern CopyToRoutine *GetBuiltinCopyToRoutine(CopyBuiltinFormat format);
I am not sure that this is necessary as the option value is a string.
> Oh, sorry. I assumed that the comment style was adjusted by
> pgindent.
No worries, that's just something we get used to.  I tend to fix a lot
of these things by myself when editing patches.
>> +    getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
>> +    fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
>>
>> Actually, this split is interesting.  It is possible for a custom
>> format to plug in a custom set of out functions.  Did you make use of
>> something custom for your own stuff?
>
> I didn't. My PoC custom COPY format handler for Apache Arrow
> just handles integer and text for now. It doesn't use
> cstate->out_functions because cstate->out_functions may not
> return a valid binary format value for Apache Arrow. So it
> formats each value by itself.
I mean, if you use a custom output function, you could tweak things
even more with byteas or such..  If a callback is expected to do
something, like setting the output function OIDs in the start
callback, we'd better document it rather than letting that be implied.
>>                                       Actually, could it make sense to
>> split the assignment of cstate->out_functions into its own callback?
>
> Yes. Because we need to use getTypeBinaryOutputInfo() for
> "binary" and use getTypeOutputInfo() for "text" and "csv".
Okay.  After sleeping on it, a split makes sense here, because it also
reduces the presence of TupleDesc in the start callback.
>> Sure, that's part of the start phase, but at least it would make clear
>> that a custom method *has* to assign these OIDs to work.  The patch
>> implies that as a rule, without a comment that CopyToStart *must* set
>> up these OIDs.
>
> CopyToStart doesn't need to set up them if the handler
> doesn't use cstate->out_functions.
Noted.
>> I think that 0001 and 0005 should be handled first, as pieces
>> independent of the rest.  Then we could move on with 0002~0004 and
>> 0006~0008.
>
> OK. I'll focus on 0001 and 0005 for now. I'll restart
> 0002-0004/0006-0008 after 0001 and 0005 are accepted.
Once you get these, I'd be interested in re-doing an evaluation of
COPY TO and more tests with COPY FROM while running Postgres on
scissors.  One thing I was thinking to use here is my blackhole_am for
COPY FROM:
https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am
As per its name, it does nothing on INSERT, so you could create a
table using it as access method, and stress the COPY FROM execution
paths without having to mount Postgres on a tmpfs because the data is
sent to the void.  Perhaps it does not matter, but that moves the
tests to the bottlenecks we want to stress (aka the per-row callback
for large data sets).
I've switched the patch as waiting on author for now.  Thanks for your
perseverance here.  I understand that's not easy to follow up with
patches and reviews (^_^;)
--
Michael
			
On Thu, Jan 25, 2024 at 4:52 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoALxEZz33NpcSk99ad_DT3A2oFNMa2KNjGBCMVFeCiUaA@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 25 Jan 2024 13:36:03 +0900,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > I've experimented with a similar optimization for csv
> > and text format; have different callbacks for text and csv format and
> > remove "if (cstate->opts.csv_mode)" branches. I've attached a patch
> > for that. Here are results:
> >
> > HEAD w/ 0001 patch + remove branches:
> > binary 2824.502 ms
> > text 2715.264 ms
> > csv 2803.381 ms
> >
> > The numbers look better now. I'm not sure these are within a noise
> > range but it might be worth considering having different callbacks for
> > text and csv formats.
>
> Wow! Interesting. I tried the approach before but I didn't
> see any difference by the approach. But it may depend on my
> environment.
>
> I'll import the approach to the next patch set so that
> others can try the approach easily.
>
>
> Thanks,
> --
> kou
Hi Kou-san,
In the current implementation, there is no way that one can check
incompatibility
options in ProcessCopyOptions, we can postpone the check in CopyFromStart
or CopyToStart, but I think it is a little bit late. Do you think
adding an extra
check for incompatible options hook is acceptable (PFA)?
--
Regards
Junwang Zhao
Hi,
In <CAEG8a3+-oG63GeG6v0L8EWi_8Fhuj9vJBhOteLxuBZwtun3GVA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 26 Jan 2024 16:18:14 +0800,
  Junwang Zhao <zhjwpku@gmail.com> wrote:
> In the current implementation, there is no way that one can check
> incompatibility
> options in ProcessCopyOptions, we can postpone the check in CopyFromStart
> or CopyToStart, but I think it is a little bit late. Do you think
> adding an extra
> check for incompatible options hook is acceptable (PFA)?
Thanks for the suggestion! But I think that a custom handler
can do it in
CopyToProcessOption()/CopyFromProcessOption(). What do you
think about this? Or could you share a sample COPY TO/FROM
WITH() SQL you think?
Thanks,
-- 
kou
On Fri, Jan 26, 2024 at 4:32 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAEG8a3+-oG63GeG6v0L8EWi_8Fhuj9vJBhOteLxuBZwtun3GVA@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 26 Jan 2024 16:18:14 +0800,
>   Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> > In the current implementation, there is no way that one can check
> > incompatibility
> > options in ProcessCopyOptions, we can postpone the check in CopyFromStart
> > or CopyToStart, but I think it is a little bit late. Do you think
> > adding an extra
> > check for incompatible options hook is acceptable (PFA)?
>
> Thanks for the suggestion! But I think that a custom handler
> can do it in
> CopyToProcessOption()/CopyFromProcessOption(). What do you
> think about this? Or could you share a sample COPY TO/FROM
> WITH() SQL you think?
CopyToProcessOption()/CopyFromProcessOption() can only handle a single
option at a time and store it in the opaque field; they cannot check
the relation between two options. For example, for a json format, the
`header` option cannot be handled by these two functions. I want to
find a way for the custom handler to error out when the user specifies
the header option.
>
>
> Thanks,
> --
> kou
--
Regards
Junwang Zhao
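The gap described above is that a per-option callback sees one option at a time, while a check such as "header is not allowed with this format" needs the fully parsed option set. A minimal sketch of a hook that runs once after parsing follows. It is not the attached patch, which is not shown in this thread excerpt: ToyCopyOptions, toy_json_check_options() and the error text are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the options that core code has already parsed. */
typedef struct ToyCopyOptions
{
	bool		header_line;	/* user specified HEADER */
} ToyCopyOptions;

/*
 * Hypothetical "check incompatible options" hook for a json-like custom
 * format.  It is meant to run once, after all options are parsed, so it
 * can look at options the handler never received individually.
 * Returns NULL when the combination is fine, otherwise an error message.
 */
static const char *
toy_json_check_options(const ToyCopyOptions *opts)
{
	assert(opts != NULL);
	if (opts->header_line)
		return "HEADER is not supported by this format";
	return NULL;
}
```

A caller would invoke the hook at the end of option processing and raise an error if it returns a message, which reports the problem before any start callback runs.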
Hi, In <ZbLwNyOKbddno0Ue@paquier.xyz> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 26 Jan 2024 08:35:19 +0900, Michael Paquier <michael@paquier.xyz> wrote: >> OK. How about the following for the fetch function >> signature? >> >> extern CopyToRoutine *GetBuiltinCopyToRoutine(const char *format); > > Or CopyToRoutineGet()? I am not wedded to my suggestion, got a bad > history with naming things around here. Thanks for the suggestion. I rethink about this and use the following: +extern void ProcessCopyOptionFormatTo(ParseState *pstate, CopyFormatOptions *opts_out, DefElem *defel); It's not a fetch function. It sets CopyToRoutine opts_out instead. But it hides CopyToRoutine* to copyto.c. Is it acceptable? >> OK. I'll focus on 0001 and 0005 for now. I'll restart >> 0002-0004/0006-0008 after 0001 and 0005 are accepted. > > Once you get these, I'd be interested in re-doing an evaluation of > COPY TO and more tests with COPY FROM while running Postgres on > scissors. One thing I was thinking to use here is my blackhole_am for > COPY FROM: > https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am Thanks! Could you evaluate the attached patch set with COPY FROM? I attach v7 patch set. It includes only the 0001 and 0005 parts in v6 patch set because we focus on them for now. 0001: This is based on 0001 in v6. Changes since v6: * Fix comment style * Hide CopyToRoutine{Text,CSV,Binary} * Add more comments * Eliminate "if (cstate->opts.csv_mode)" branches from "text" and "csv" callbacks * Remove CopyTo*_function typedefs * Update benchmark results in commit message but the results are measured on my environment that isn't suitable for accurate benchmark 0002: This is based on 0005 in v6. 

Changes since v6:

* Fix comment style
* Hide CopyFromRoutine{Text,CSV,Binary}
* Add more comments
* Eliminate a "if (cstate->opts.csv_mode)" branch from "text"
  and "csv" callbacks
  * NOTE: We can eliminate more "if (cstate->opts.csv_mode)"
    branches such as one in NextCopyFromRawFields(). Should we
    do it in this feature improvement (make COPY format
    extendable)? Can we defer this as a separated improvement?
* Remove CopyFrom*_function typedefs

Thanks,
--
kou

From 3e75129c2e9d9d34eebb6ef31b4fe6579f9eb02d Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Fri, 26 Jan 2024 16:46:51 +0900
Subject: [PATCH v7 1/2] Extract COPY TO format implementations

This is a part of making COPY format extendable. See also these past
discussions:
* New Copy Formats - avro/orc/parquet:
  https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
* Make COPY extendable in order to support Parquet and other formats:
  https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com

This doesn't change the current behavior. This just introduces
CopyToRoutine, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.

Note that CopyToRoutine can't be used from extensions yet because
CopySend*() aren't exported yet. Extensions can't send formatted data
to a destination without CopySend*(). They will be exported by
subsequent patches.

Here is a benchmark result with/without this change because there was
a discussion that we should care about performance regression:

https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us

> I think that step 1 ought to be to convert the existing formats into
> plug-ins, and demonstrate that there's no significant loss of
> performance.

You can see that there is no significant loss of performance:

Data: Random 32 bit integers:

    CREATE TABLE data (int32 integer);
    SELECT setseed(0.29);
    INSERT INTO data
      SELECT random() * 10000
        FROM generate_series(1, ${n_records});

The number of records: 100K, 1M and 10M

100K without this change:

    format,elapsed time (ms)
    text,10.561
    csv,10.868
    binary,10.287

100K with this change:

    format,elapsed time (ms)
    text,9.962
    csv,10.453
    binary,9.473

1M without this change:

    format,elapsed time (ms)
    text,103.265
    csv,109.789
    binary,104.078

1M with this change:

    format,elapsed time (ms)
    text,98.612
    csv,101.908
    binary,94.456

10M without this change:

    format,elapsed time (ms)
    text,1060.614
    csv,1065.272
    binary,1025.875

10M with this change:

    format,elapsed time (ms)
    text,1020.050
    csv,1031.279
    binary,954.792
---
 contrib/file_fdw/file_fdw.c     |   2 +-
 src/backend/commands/copy.c     |  71 ++--
 src/backend/commands/copyfrom.c |   2 +-
 src/backend/commands/copyto.c   | 551 +++++++++++++++++++++++---------
 src/include/commands/copy.h     |   8 +-
 src/include/commands/copyapi.h  |  48 +++
 6 files changed, 505 insertions(+), 177 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 249d82d3a0..9e4e819858 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -329,7 +329,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 	/*
 	 * Now apply the core COPY code's validation logic for more checks.
 	 */
-	ProcessCopyOptions(NULL, NULL, true, other_options);
+	ProcessCopyOptions(NULL, NULL, true, NULL, other_options);

 	/*
 	 * Either filename or program option is required for file_fdw foreign
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6f4..3676d1206d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -442,6 +442,9 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
  * a list of options. In that usage, 'opts_out' can be passed as NULL and
  * the collected data is just leaked until CurrentMemoryContext is reset.
  *
+ * 'cstate' is CopyToState* for !is_from, CopyFromState* for is_from. 'cstate'
+ * may be NULL. For example, file_fdw uses NULL.
+ *
  * Note that additional checking, such as whether column names listed in FORCE
  * QUOTE actually exist, has to be applied later. This just checks for
  * self-consistency of the options list.
@@ -450,6 +453,7 @@ void
 ProcessCopyOptions(ParseState *pstate,
 				   CopyFormatOptions *opts_out,
 				   bool is_from,
+				   void *cstate,
 				   List *options)
 {
 	bool format_specified = false;
@@ -464,30 +468,54 @@ ProcessCopyOptions(ParseState *pstate,

 	opts_out->file_encoding = -1;

-	/* Extract options from the statement node tree */
+	/*
+	 * Extract only the "format" option to detect target routine as the first
+	 * step
+	 */
 	foreach(option, options)
 	{
 		DefElem *defel = lfirst_node(DefElem, option);

 		if (strcmp(defel->defname, "format") == 0)
 		{
-			char *fmt = defGetString(defel);
-
 			if (format_specified)
 				errorConflictingDefElem(defel, pstate);
 			format_specified = true;
-			if (strcmp(fmt, "text") == 0)
-				/* default format */ ;
-			else if (strcmp(fmt, "csv") == 0)
-				opts_out->csv_mode = true;
-			else if (strcmp(fmt, "binary") == 0)
-				opts_out->binary = true;
+
+			if (is_from)
+			{
+				char *fmt = defGetString(defel);
+
+				if (strcmp(fmt, "text") == 0)
+					/* default format */ ;
+				else if (strcmp(fmt, "csv") == 0)
+				{
+					opts_out->csv_mode = true;
+				}
+				else if (strcmp(fmt, "binary") == 0)
+				{
+					opts_out->binary = true;
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+							 errmsg("COPY format \"%s\" not recognized", fmt),
+							 parser_errposition(pstate, defel->location)));
+			}
 			else
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						 errmsg("COPY format \"%s\" not recognized", fmt),
-						 parser_errposition(pstate, defel->location)));
+				ProcessCopyOptionFormatTo(pstate, opts_out, defel);
 		}
+	}
+	if (!format_specified)
+		/* Set the default format. */
+		ProcessCopyOptionFormatTo(pstate, opts_out, NULL);
+	/* Extract options except "format" from the statement node tree */
+	foreach(option, options)
+	{
+		DefElem *defel = lfirst_node(DefElem, option);
+
+		if (strcmp(defel->defname, "format") == 0)
+			continue;
 		else if (strcmp(defel->defname, "freeze") == 0)
 		{
 			if (freeze_specified)
@@ -616,11 +644,18 @@ ProcessCopyOptions(ParseState *pstate,
 			opts_out->on_error = defGetCopyOnErrorChoice(defel, pstate, is_from);
 		}
 		else
-			ereport(ERROR,
-					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("option \"%s\" not recognized",
-							defel->defname),
-					 parser_errposition(pstate, defel->location)));
+		{
+			bool processed = false;
+
+			if (!is_from)
+				processed = opts_out->to_routine->CopyToProcessOption(cstate, defel);
+			if (!processed)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("option \"%s\" not recognized",
+								defel->defname),
+						 parser_errposition(pstate, defel->location)));
+		}
 	}

 	/*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 173a736ad5..05b3d13236 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1411,7 +1411,7 @@ BeginCopyFrom(ParseState *pstate,
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);

 	/* Extract options from the statement node tree */
-	ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
+	ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , cstate, options);

 	/* Process the target relation */
 	cstate->rel = rel;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d3dc3fc854..52572585fa 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -24,6 +24,7 @@
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "commands/copy.h"
+#include "commands/defrem.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -131,6 +132,397 @@ static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);

+/*
+ * CopyToRoutine implementations.
+ */
+
+/*
+ * CopyToRoutine implementation for "text" and "csv". CopyToTextBased*() are
+ * shared by both of "text" and "csv". CopyToText*() are only for "text" and
+ * CopyToCSV*() are only for "csv".
+ *
+ * We can use the same functions for all callbacks by referring
+ * cstate->opts.csv_mode but splitting callbacks to eliminate "if
+ * (cstate->opts.csv_mode)" branches from all callbacks has performance
+ * merit when many tuples are copied. So we use separated callbacks for "text"
+ * and "csv".
+ */
+
+/*
+ * All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
+ * move the code to here later.
+ */
+static bool
+CopyToTextBasedProcessOption(CopyToState cstate, DefElem *defel)
+{
+	return false;
+}
+
+static int16
+CopyToTextBasedGetFormat(CopyToState cstate)
+{
+	return 0;
+}
+
+static void
+CopyToTextBasedSendEndOfRow(CopyToState cstate)
+{
+	switch (cstate->copy_dest)
+	{
+		case COPY_FILE:
+			/* Default line termination depends on platform */
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			break;
+		case COPY_FRONTEND:
+			/* The FE/BE protocol uses \n as newline for all platforms */
+			CopySendChar(cstate, '\n');
+			break;
+		default:
+			break;
+	}
+	CopySendEndOfRow(cstate);
+}
+
+typedef void (*CopyAttributeOutHeaderFunction) (CopyToState cstate, char *string);
+
+/*
+ * We can use CopyAttributeOutText() directly but define this for consistency
+ * with CopyAttributeOutCSVHeader(). "static inline" will prevent performance
+ * penalty by this wrapping.
+ */
+static inline void
+CopyAttributeOutTextHeader(CopyToState cstate, char *string)
+{
+	CopyAttributeOutText(cstate, string);
+}
+
+static inline void
+CopyAttributeOutCSVHeader(CopyToState cstate, char *string)
+{
+	CopyAttributeOutCSV(cstate, string, false,
+						list_length(cstate->attnumlist) == 1);
+}
+
+/*
+ * We don't use this function as a callback directly. We define
+ * CopyToTextStart() and CopyToCSVStart() and use them instead. It's for
+ * eliminating a "if (cstate->opts.csv_mode)" branch. This callback is called
+ * only once per COPY TO. So this optimization may be meaningless but done for
+ * consistency with CopyToTextBasedOneRow().
+ *
+ * This must initialize cstate->out_functions for CopyToTextBasedOneRow().
+ */
+static inline void
+CopyToTextBasedStart(CopyToState cstate, TupleDesc tupDesc, CopyAttributeOutHeaderFunction out)
+{
+	int num_phys_attrs;
+	ListCell *cur;
+
+	num_phys_attrs = tupDesc->natts;
+	/* Get info about the columns we need to process. */
+	cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+	foreach(cur, cstate->attnumlist)
+	{
+		int attnum = lfirst_int(cur);
+		Oid out_func_oid;
+		bool isvarlena;
+		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+		getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+		fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+	}
+
+	/*
+	 * For non-binary copy, we need to convert null_print to file encoding,
+	 * because it will be sent directly with CopySendString.
+	 */
+	if (cstate->need_transcoding)
+		cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+														  cstate->opts.null_print_len,
+														  cstate->file_encoding);
+
+	/* if a header has been requested send the line */
+	if (cstate->opts.header_line)
+	{
+		bool hdr_delim = false;
+
+		foreach(cur, cstate->attnumlist)
+		{
+			int attnum = lfirst_int(cur);
+			char *colname;
+
+			if (hdr_delim)
+				CopySendChar(cstate, cstate->opts.delim[0]);
+			hdr_delim = true;
+
+			colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+			out(cstate, colname);
+		}
+
+		CopyToTextBasedSendEndOfRow(cstate);
+	}
+}
+
+static void
+CopyToTextStart(CopyToState cstate, TupleDesc tupDesc)
+{
+	CopyToTextBasedStart(cstate, tupDesc, CopyAttributeOutTextHeader);
+}
+
+static void
+CopyToCSVStart(CopyToState cstate, TupleDesc tupDesc)
+{
+	CopyToTextBasedStart(cstate, tupDesc, CopyAttributeOutCSVHeader);
+}
+
+typedef void (*CopyAttributeOutValueFunction) (CopyToState cstate, char *string, int attnum);
+
+static inline void
+CopyAttributeOutTextValue(CopyToState cstate, char *string, int attnum)
+{
+	CopyAttributeOutText(cstate, string);
+}
+
+static inline void
+CopyAttributeOutCSVValue(CopyToState cstate, char *string, int attnum)
+{
+	CopyAttributeOutCSV(cstate, string,
+						cstate->opts.force_quote_flags[attnum - 1],
+						list_length(cstate->attnumlist) == 1);
+}
+
+/*
+ * We don't use this function as a callback directly. We define
+ * CopyToTextOneRow() and CopyToCSVOneRow() and use them instead. It's for
+ * eliminating a "if (cstate->opts.csv_mode)" branch. This callback is called
+ * per tuple. So this optimization will be valuable when many tuples are
+ * copied.
+ *
+ * cstate->out_functions must be initialized in CopyToTextBasedStart().
+ */
+static void
+CopyToTextBasedOneRow(CopyToState cstate, TupleTableSlot *slot, CopyAttributeOutValueFunction out)
+{
+	bool need_delim = false;
+	FmgrInfo *out_functions = cstate->out_functions;
+	ListCell *cur;
+
+	foreach(cur, cstate->attnumlist)
+	{
+		int attnum = lfirst_int(cur);
+		Datum value = slot->tts_values[attnum - 1];
+		bool isnull = slot->tts_isnull[attnum - 1];
+
+		if (need_delim)
+			CopySendChar(cstate, cstate->opts.delim[0]);
+		need_delim = true;
+
+		if (isnull)
+		{
+			CopySendString(cstate, cstate->opts.null_print_client);
+		}
+		else
+		{
+			char *string;
+
+			string = OutputFunctionCall(&out_functions[attnum - 1], value);
+			out(cstate, string, attnum);
+		}
+	}
+
+	CopyToTextBasedSendEndOfRow(cstate);
+}
+
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	CopyToTextBasedOneRow(cstate, slot, CopyAttributeOutTextValue);
+}
+
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	CopyToTextBasedOneRow(cstate, slot, CopyAttributeOutCSVValue);
+}
+
+static void
+CopyToTextBasedEnd(CopyToState cstate)
+{
+}
+
+/*
+ * CopyToRoutine implementation for "binary".
+ */
+
+/*
+ * All "binary" options are parsed in ProcessCopyOptions(). We may move the
+ * code to here later.
+ */
+static bool
+CopyToBinaryProcessOption(CopyToState cstate, DefElem *defel)
+{
+	return false;
+}
+
+static int16
+CopyToBinaryGetFormat(CopyToState cstate)
+{
+	return 1;
+}
+
+/*
+ * This must initialize cstate->out_functions for CopyToBinaryOneRow().
+ */
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+	int num_phys_attrs;
+	ListCell *cur;
+
+	num_phys_attrs = tupDesc->natts;
+	/* Get info about the columns we need to process. */
+	cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+	foreach(cur, cstate->attnumlist)
+	{
+		int attnum = lfirst_int(cur);
+		Oid out_func_oid;
+		bool isvarlena;
+		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+		getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+		fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+	}
+
+	{
+		/* Generate header for a binary copy */
+		int32 tmp;
+
+		/* Signature */
+		CopySendData(cstate, BinarySignature, 11);
+		/* Flags field */
+		tmp = 0;
+		CopySendInt32(cstate, tmp);
+		/* No header extension */
+		tmp = 0;
+		CopySendInt32(cstate, tmp);
+	}
+}
+
+/*
+ * cstate->out_functions must be initialized in CopyToBinaryStart().
+ */
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	FmgrInfo *out_functions = cstate->out_functions;
+	ListCell *cur;
+
+	/* Binary per-tuple header */
+	CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+	foreach(cur, cstate->attnumlist)
+	{
+		int attnum = lfirst_int(cur);
+		Datum value = slot->tts_values[attnum - 1];
+		bool isnull = slot->tts_isnull[attnum - 1];
+
+		if (isnull)
+		{
+			CopySendInt32(cstate, -1);
+		}
+		else
+		{
+			bytea *outputbytes;
+
+			outputbytes = SendFunctionCall(&out_functions[attnum - 1], value);
+			CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+			CopySendData(cstate, VARDATA(outputbytes),
+						 VARSIZE(outputbytes) - VARHDRSZ);
+		}
+	}
+
+	CopySendEndOfRow(cstate);
+}
+
+static void
+CopyToBinaryEnd(CopyToState cstate)
+{
+	/* Generate trailer for a binary copy */
+	CopySendInt16(cstate, -1);
+	/* Need to flush out the trailer */
+	CopySendEndOfRow(cstate);
+}
+
+/*
+ * CopyToTextBased*() are shared with "csv". CopyToText*() are only for "text".
+ */
+static const CopyToRoutine CopyToRoutineText = {
+	.CopyToProcessOption = CopyToTextBasedProcessOption,
+	.CopyToGetFormat = CopyToTextBasedGetFormat,
+	.CopyToStart = CopyToTextStart,
+	.CopyToOneRow = CopyToTextOneRow,
+	.CopyToEnd = CopyToTextBasedEnd,
+};
+
+/*
+ * CopyToTextBased*() are shared with "text". CopyToCSV*() are only for "csv".
+ */
+static const CopyToRoutine CopyToRoutineCSV = {
+	.CopyToProcessOption = CopyToTextBasedProcessOption,
+	.CopyToGetFormat = CopyToTextBasedGetFormat,
+	.CopyToStart = CopyToCSVStart,
+	.CopyToOneRow = CopyToCSVOneRow,
+	.CopyToEnd = CopyToTextBasedEnd,
+};
+
+static const CopyToRoutine CopyToRoutineBinary = {
+	.CopyToProcessOption = CopyToBinaryProcessOption,
+	.CopyToGetFormat = CopyToBinaryGetFormat,
+	.CopyToStart = CopyToBinaryStart,
+	.CopyToOneRow = CopyToBinaryOneRow,
+	.CopyToEnd = CopyToBinaryEnd,
+};
+
+/*
+ * Process the "format" option for COPY TO.
+ *
+ * If defel is NULL, the default format "text" is used.
+ */
+void
+ProcessCopyOptionFormatTo(ParseState *pstate,
+						  CopyFormatOptions *opts_out,
+						  DefElem *defel)
+{
+	char *format;
+
+	if (defel)
+		format = defGetString(defel);
+	else
+		format = "text";
+
+	if (strcmp(format, "text") == 0)
+		opts_out->to_routine = &CopyToRoutineText;
+	else if (strcmp(format, "csv") == 0)
+	{
+		opts_out->csv_mode = true;
+		opts_out->to_routine = &CopyToRoutineCSV;
+	}
+	else if (strcmp(format, "binary") == 0)
+	{
+		opts_out->binary = true;
+		opts_out->to_routine = &CopyToRoutineBinary;
+	}
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY format \"%s\" not recognized", format),
+				 parser_errposition(pstate, defel->location)));
+}

 /*
  * Send copy start/stop messages for frontend copies. These have changed
@@ -141,7 +533,7 @@ SendCopyBegin(CopyToState cstate)
 {
 	StringInfoData buf;
 	int natts = list_length(cstate->attnumlist);
-	int16 format = (cstate->opts.binary ? 1 : 0);
+	int16 format = cstate->opts.to_routine->CopyToGetFormat(cstate);
 	int i;

 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
@@ -198,16 +590,6 @@ CopySendEndOfRow(CopyToState cstate)
 	switch (cstate->copy_dest)
 	{
 		case COPY_FILE:
-			if (!cstate->opts.binary)
-			{
-				/* Default line termination depends on platform */
-#ifndef WIN32
-				CopySendChar(cstate, '\n');
-#else
-				CopySendString(cstate, "\r\n");
-#endif
-			}
-
 			if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
 					   cstate->copy_file) != 1 ||
 				ferror(cstate->copy_file))
@@ -242,10 +624,6 @@ CopySendEndOfRow(CopyToState cstate)
 			}
 			break;
 		case COPY_FRONTEND:
-			/* The FE/BE protocol uses \n as newline for all platforms */
-			if (!cstate->opts.binary)
-				CopySendChar(cstate, '\n');
-
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
 			break;
@@ -431,7 +809,7 @@ BeginCopyTo(ParseState *pstate,
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);

 	/* Extract options from the statement node tree */
-	ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
+	ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , cstate, options);

 	/* Process the source/target relation or query */
 	if (rel)
@@ -748,8 +1126,6 @@ DoCopyTo(CopyToState cstate)
 	bool pipe = (cstate->filename == NULL && cstate->data_dest_cb == NULL);
 	bool fe_copy = (pipe && whereToSendOutput == DestRemote);
 	TupleDesc tupDesc;
-	int num_phys_attrs;
-	ListCell *cur;
 	uint64 processed;

 	if (fe_copy)
@@ -759,32 +1135,11 @@ DoCopyTo(CopyToState cstate)
 		tupDesc = RelationGetDescr(cstate->rel);
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
-	num_phys_attrs = tupDesc->natts;
 	cstate->opts.null_print_client = cstate->opts.null_print; /* default */

 	/* We use fe_msgbuf as a per-row buffer regardless of copy_dest */
 	cstate->fe_msgbuf = makeStringInfo();

-	/* Get info about the columns we need to process. */
-	cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
-	foreach(cur, cstate->attnumlist)
-	{
-		int attnum = lfirst_int(cur);
-		Oid out_func_oid;
-		bool isvarlena;
-		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
-
-		if (cstate->opts.binary)
-			getTypeBinaryOutputInfo(attr->atttypid,
-									&out_func_oid,
-									&isvarlena);
-		else
-			getTypeOutputInfo(attr->atttypid,
-							  &out_func_oid,
-							  &isvarlena);
-		fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
-	}
-
 	/*
 	 * Create a temporary memory context that we can reset once per row to
 	 * recover palloc'd memory. This avoids any problems with leaks inside
@@ -795,57 +1150,7 @@ DoCopyTo(CopyToState cstate)
 											   "COPY TO",
 											   ALLOCSET_DEFAULT_SIZES);

-	if (cstate->opts.binary)
-	{
-		/* Generate header for a binary copy */
-		int32 tmp;
-
-		/* Signature */
-		CopySendData(cstate, BinarySignature, 11);
-		/* Flags field */
-		tmp = 0;
-		CopySendInt32(cstate, tmp);
-		/* No header extension */
-		tmp = 0;
-		CopySendInt32(cstate, tmp);
-	}
-	else
-	{
-		/*
-		 * For non-binary copy, we need to convert null_print to file
-		 * encoding, because it will be sent directly with CopySendString.
-		 */
-		if (cstate->need_transcoding)
-			cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
-															  cstate->opts.null_print_len,
-															  cstate->file_encoding);
-
-		/* if a header has been requested send the line */
-		if (cstate->opts.header_line)
-		{
-			bool hdr_delim = false;
-
-			foreach(cur, cstate->attnumlist)
-			{
-				int attnum = lfirst_int(cur);
-				char *colname;
-
-				if (hdr_delim)
-					CopySendChar(cstate, cstate->opts.delim[0]);
-				hdr_delim = true;
-
-				colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
-				if (cstate->opts.csv_mode)
-					CopyAttributeOutCSV(cstate, colname, false,
-										list_length(cstate->attnumlist) == 1);
-				else
-					CopyAttributeOutText(cstate, colname);
-			}
-
-			CopySendEndOfRow(cstate);
-		}
-	}
+	cstate->opts.to_routine->CopyToStart(cstate, tupDesc);

 	if (cstate->rel)
 	{
@@ -884,13 +1189,7 @@ DoCopyTo(CopyToState cstate)
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}

-	if (cstate->opts.binary)
-	{
-		/* Generate trailer for a binary copy */
-		CopySendInt16(cstate, -1);
-		/* Need to flush out the trailer */
-		CopySendEndOfRow(cstate);
-	}
+	cstate->opts.to_routine->CopyToEnd(cstate);

 	MemoryContextDelete(cstate->rowcontext);

@@ -906,71 +1205,15 @@ DoCopyTo(CopyToState cstate)
 static void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-	bool need_delim = false;
-	FmgrInfo *out_functions = cstate->out_functions;
 	MemoryContext oldcontext;
-	ListCell *cur;
-	char *string;

 	MemoryContextReset(cstate->rowcontext);
 	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);

-	if (cstate->opts.binary)
-	{
-		/* Binary per-tuple header */
-		CopySendInt16(cstate, list_length(cstate->attnumlist));
-	}
-
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);

-	foreach(cur, cstate->attnumlist)
-	{
-		int attnum = lfirst_int(cur);
-		Datum value = slot->tts_values[attnum - 1];
-		bool isnull = slot->tts_isnull[attnum - 1];
-
-		if (!cstate->opts.binary)
-		{
-			if (need_delim)
-				CopySendChar(cstate, cstate->opts.delim[0]);
-			need_delim = true;
-		}
-
-		if (isnull)
-		{
-			if (!cstate->opts.binary)
-				CopySendString(cstate, cstate->opts.null_print_client);
-			else
-				CopySendInt32(cstate, -1);
-		}
-		else
-		{
-			if (!cstate->opts.binary)
-			{
-				string = OutputFunctionCall(&out_functions[attnum - 1],
-											value);
-				if (cstate->opts.csv_mode)
-					CopyAttributeOutCSV(cstate, string,
-										cstate->opts.force_quote_flags[attnum - 1],
-										list_length(cstate->attnumlist) == 1);
-				else
-					CopyAttributeOutText(cstate, string);
-			}
-			else
-			{
-				bytea *outputbytes;
-
-				outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-											   value);
-				CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-				CopySendData(cstate, VARDATA(outputbytes),
-							 VARSIZE(outputbytes) - VARHDRSZ);
-			}
-		}
-	}
-
-	CopySendEndOfRow(cstate);
+	cstate->opts.to_routine->CopyToOneRow(cstate, slot);

 	MemoryContextSwitchTo(oldcontext);
 }
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index b3da3cb0be..9abd7fe538 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
 #ifndef COPY_H
 #define COPY_H

+#include "commands/copyapi.h"
 #include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
@@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
 	bool convert_selectively; /* do selective binary conversion? */
 	CopyOnErrorChoice on_error; /* what to do when error happened */
 	List *convert_select; /* list of column names (can be NIL) */
+	const CopyToRoutine *to_routine; /* callback routines for COPY TO */
 } CopyFormatOptions;

-/* These are private in commands/copy[from|to].c */
+/* This is private in commands/copyfrom.c */
 typedef struct CopyFromStateData *CopyFromState;
-typedef struct CopyToStateData *CopyToState;

 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
 typedef void (*copy_data_dest_cb) (void *data, int len);
@@ -87,7 +88,8 @@ extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
 				   int stmt_location, int stmt_len,
 				   uint64 *processed);

-extern void ProcessCopyOptions(ParseState *pstate, CopyFormatOptions *opts_out, bool is_from, List *options);
+extern void ProcessCopyOptions(ParseState *pstate, CopyFormatOptions *opts_out, bool is_from, void *cstate, List *options);
+extern void ProcessCopyOptionFormatTo(ParseState *pstate, CopyFormatOptions *opts_out, DefElem *defel);
 extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *whereClause,
 								   const char *filename,
 								   bool is_program, copy_data_source_cb data_source_cb, List *attnamelist, List *options);
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 0000000000..ed52ce5f49
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,48 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *	  API for COPY TO/FROM handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+#include "nodes/parsenodes.h"
+
+/* This is private in commands/copyto.c */
+typedef struct CopyToStateData *CopyToState;
+
+/* Routines for a COPY TO format implementation. */
+typedef struct CopyToRoutine
+{
+	/*
+	 * Called for processing one COPY TO option. This will return false when
+	 * the given option is invalid.
+	 */
+	bool (*CopyToProcessOption) (CopyToState cstate, DefElem *defel);
+
+	/*
+	 * Called when COPY TO is started. This will return a format as int16
+	 * value. It's used for the CopyOutResponse message.
+	 */
+	int16 (*CopyToGetFormat) (CopyToState cstate);
+
+	/* Called when COPY TO is started. This will send a header. */
+	void (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
+
+	/* Copy one row for COPY TO. */
+	void (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
+
+	/* Called when COPY TO is ended. This will send a trailer. */
+	void (*CopyToEnd) (CopyToState cstate);
+} CopyToRoutine;
+
+#endif /* COPYAPI_H */
--
2.43.0

From c956816bed5c6ac4366ad4ae839d8e38f5ae4d7e Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Fri, 26 Jan 2024 17:21:53 +0900
Subject: [PATCH v7 2/2] Extract COPY FROM format implementations

This doesn't change the current behavior. This just introduces
CopyFromRoutine, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.

Note that CopyFromRoutine can't be used from extensions yet because
CopyRead*() aren't exported yet. Extensions can't read data from a
source without CopyRead*(). They will be exported by subsequent
patches.
---
 src/backend/commands/copy.c              |  33 +-
 src/backend/commands/copyfrom.c          | 270 +++++++++++++---
 src/backend/commands/copyfromparse.c     | 382 +++++++++++++----------
 src/include/commands/copy.h              |   6 +-
 src/include/commands/copyapi.h           |  32 ++
 src/include/commands/copyfrom_internal.h |   4 +
 6 files changed, 490 insertions(+), 237 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3676d1206d..489de4ab8d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -483,32 +483,19 @@ ProcessCopyOptions(ParseState *pstate,
 			format_specified = true;

 			if (is_from)
-			{
-				char *fmt = defGetString(defel);
-
-				if (strcmp(fmt, "text") == 0)
-					/* default format */ ;
-				else if (strcmp(fmt, "csv") == 0)
-				{
-					opts_out->csv_mode = true;
-				}
-				else if (strcmp(fmt, "binary") == 0)
-				{
-					opts_out->binary = true;
-				}
-				else
-					ereport(ERROR,
-							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-							 errmsg("COPY format \"%s\" not recognized", fmt),
-							 parser_errposition(pstate, defel->location)));
-			}
+				ProcessCopyOptionFormatFrom(pstate, opts_out, defel);
 			else
 				ProcessCopyOptionFormatTo(pstate, opts_out, defel);
 		}
 	}
 	if (!format_specified)
+	{
 		/* Set the default format. */
-		ProcessCopyOptionFormatTo(pstate, opts_out, NULL);
+		if (is_from)
+			ProcessCopyOptionFormatFrom(pstate, opts_out, NULL);
+		else
+			ProcessCopyOptionFormatTo(pstate, opts_out, NULL);
+	}
 	/* Extract options except "format" from the statement node tree */
 	foreach(option, options)
 	{
@@ -645,9 +632,11 @@ ProcessCopyOptions(ParseState *pstate,
 		}
 		else
 		{
-			bool processed = false;
+			bool processed;

-			if (!is_from)
+			if (is_from)
+				processed = opts_out->from_routine->CopyFromProcessOption(cstate, defel);
+			else
 				processed = opts_out->to_routine->CopyToProcessOption(cstate, defel);
 			if (!processed)
 				ereport(ERROR,
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 05b3d13236..498d7bc5ad 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -32,6 +32,7 @@
 #include "catalog/namespace.h"
 #include "commands/copy.h"
 #include "commands/copyfrom_internal.h"
+#include "commands/defrem.h"
 #include "commands/progress.h"
 #include "commands/trigger.h"
 #include "executor/execPartition.h"
@@ -108,6 +109,223 @@ static char *limit_printout_length(const char *str);

 static void ClosePipeFromProgram(CopyFromState cstate);

+
+/*
+ * CopyFromRoutine implementations.
+ */
+
+/*
+ * CopyFromRoutine implementation for "text" and "csv". CopyFromTextBased*()
+ * are shared by both of "text" and "csv". CopyFromText*() are only for "text"
+ * and CopyFromCSV*() are only for "csv".
+ *
+ * We can use the same functions for all callbacks by referring
+ * cstate->opts.csv_mode but splitting callbacks to eliminate "if
+ * (cstate->opts.csv_mode)" branches from all callbacks has performance merit
+ * when many tuples are copied. So we use separated callbacks for "text" and
+ * "csv".
+ */
+
+/*
+ * All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
+ * move the code to here later.
+ */
+static bool
+CopyFromTextBasedProcessOption(CopyFromState cstate, DefElem *defel)
+{
+	return false;
+}
+
+static int16
+CopyFromTextBasedGetFormat(CopyFromState cstate)
+{
+	return 0;
+}
+
+/*
+ * This must initialize cstate->in_functions for CopyFromTextBasedOneRow().
+ */
+static void
+CopyFromTextBasedStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+	AttrNumber num_phys_attrs = tupDesc->natts;
+	AttrNumber attr_count;
+
+	/*
+	 * If encoding conversion is needed, we need another buffer to hold the
+	 * converted input data. Otherwise, we can just point input_buf to the
+	 * same buffer as raw_buf.
+	 */
+	if (cstate->need_transcoding)
+	{
+		cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
+		cstate->input_buf_index = cstate->input_buf_len = 0;
+	}
+	else
+		cstate->input_buf = cstate->raw_buf;
+	cstate->input_reached_eof = false;
+
+	initStringInfo(&cstate->line_buf);
+
+	/*
+	 * Pick up the required catalog information for each attribute in the
+	 * relation, including the input function, the element type (to pass to
+	 * the input function).
+	 */
+	cstate->in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+	cstate->typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
+	for (int attnum = 1; attnum <= num_phys_attrs; attnum++)
+	{
+		Form_pg_attribute att = TupleDescAttr(tupDesc, attnum - 1);
+		Oid in_func_oid;
+
+		/* We don't need info for dropped attributes */
+		if (att->attisdropped)
+			continue;
+
+		/* Fetch the input function and typioparam info */
+		getTypeInputInfo(att->atttypid,
+						 &in_func_oid, &cstate->typioparams[attnum - 1]);
+		fmgr_info(in_func_oid, &cstate->in_functions[attnum - 1]);
+	}
+
+	/* create workspace for CopyReadAttributes results */
+	attr_count = list_length(cstate->attnumlist);
+	cstate->max_fields = attr_count;
+	cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+}
+
+static void
+CopyFromTextBasedEnd(CopyFromState cstate)
+{
+}
+
+/*
+ * CopyFromRoutine implementation for "binary".
+ */ + +/* + * All "binary" options are parsed in ProcessCopyOptions(). We may move the + * code to here later. + */ +static bool +CopyFromBinaryProcessOption(CopyFromState cstate, DefElem *defel) +{ + return false; +} + +static int16 +CopyFromBinaryGetFormat(CopyFromState cstate) +{ + return 1; +} + +/* + * This must initialize cstate->in_functions for CopyFromBinaryOneRow(). + */ +static void +CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc) +{ + AttrNumber num_phys_attrs = tupDesc->natts; + + /* + * Pick up the required catalog information for each attribute in the + * relation, including the input function, the element type (to pass to + * the input function). + */ + cstate->in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo)); + cstate->typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid)); + for (int attnum = 1; attnum <= num_phys_attrs; attnum++) + { + Form_pg_attribute att = TupleDescAttr(tupDesc, attnum - 1); + Oid in_func_oid; + + /* We don't need info for dropped attributes */ + if (att->attisdropped) + continue; + + /* Fetch the input function and typioparam info */ + getTypeBinaryInputInfo(att->atttypid, + &in_func_oid, &cstate->typioparams[attnum - 1]); + fmgr_info(in_func_oid, &cstate->in_functions[attnum - 1]); + } + + /* Read and verify binary header */ + ReceiveCopyBinaryHeader(cstate); +} + +static void +CopyFromBinaryEnd(CopyFromState cstate) +{ +} + +/* + * CopyFromTextBased*() are shared with "csv". CopyFromText*() are only for "text". + */ +static const CopyFromRoutine CopyFromRoutineText = { + .CopyFromProcessOption = CopyFromTextBasedProcessOption, + .CopyFromGetFormat = CopyFromTextBasedGetFormat, + .CopyFromStart = CopyFromTextBasedStart, + .CopyFromOneRow = CopyFromTextOneRow, + .CopyFromEnd = CopyFromTextBasedEnd, +}; + +/* + * CopyFromTextBased*() are shared with "text". CopyFromCSV*() are only for "csv". 
+ */ +static const CopyFromRoutine CopyFromRoutineCSV = { + .CopyFromProcessOption = CopyFromTextBasedProcessOption, + .CopyFromGetFormat = CopyFromTextBasedGetFormat, + .CopyFromStart = CopyFromTextBasedStart, + .CopyFromOneRow = CopyFromCSVOneRow, + .CopyFromEnd = CopyFromTextBasedEnd, +}; + +static const CopyFromRoutine CopyFromRoutineBinary = { + .CopyFromProcessOption = CopyFromBinaryProcessOption, + .CopyFromGetFormat = CopyFromBinaryGetFormat, + .CopyFromStart = CopyFromBinaryStart, + .CopyFromOneRow = CopyFromBinaryOneRow, + .CopyFromEnd = CopyFromBinaryEnd, +}; + +/* + * Process the "format" option for COPY FROM. + * + * If defel is NULL, the default format "text" is used. + */ +void +ProcessCopyOptionFormatFrom(ParseState *pstate, + CopyFormatOptions *opts_out, + DefElem *defel) +{ + char *format; + + if (defel) + format = defGetString(defel); + else + format = "text"; + + if (strcmp(format, "text") == 0) + opts_out->from_routine = &CopyFromRoutineText; + else if (strcmp(format, "csv") == 0) + { + opts_out->csv_mode = true; + opts_out->from_routine = &CopyFromRoutineCSV; + } + else if (strcmp(format, "binary") == 0) + { + opts_out->binary = true; + opts_out->from_routine = &CopyFromRoutineBinary; + } + else + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY format \"%s\" not recognized", format), + parser_errposition(pstate, defel->location))); +} + + /* * error context callback for COPY FROM * @@ -1379,9 +1597,6 @@ BeginCopyFrom(ParseState *pstate, TupleDesc tupDesc; AttrNumber num_phys_attrs, num_defaults; - FmgrInfo *in_functions; - Oid *typioparams; - Oid in_func_oid; int *defmap; ExprState **defexprs; MemoryContext oldcontext; @@ -1566,25 +1781,6 @@ BeginCopyFrom(ParseState *pstate, cstate->raw_buf_index = cstate->raw_buf_len = 0; cstate->raw_reached_eof = false; - if (!cstate->opts.binary) - { - /* - * If encoding conversion is needed, we need another buffer to hold - * the converted input data. 
Otherwise, we can just point input_buf - * to the same buffer as raw_buf. - */ - if (cstate->need_transcoding) - { - cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); - cstate->input_buf_index = cstate->input_buf_len = 0; - } - else - cstate->input_buf = cstate->raw_buf; - cstate->input_reached_eof = false; - - initStringInfo(&cstate->line_buf); - } - initStringInfo(&cstate->attribute_buf); /* Assign range table and rteperminfos, we'll need them in CopyFrom. */ @@ -1603,8 +1799,6 @@ BeginCopyFrom(ParseState *pstate, * the input function), and info about defaults and constraints. (Which * input function we use depends on text/binary format choice.) */ - in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo)); - typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid)); defmap = (int *) palloc(num_phys_attrs * sizeof(int)); defexprs = (ExprState **) palloc(num_phys_attrs * sizeof(ExprState *)); @@ -1616,15 +1810,6 @@ BeginCopyFrom(ParseState *pstate, if (att->attisdropped) continue; - /* Fetch the input function and typioparam info */ - if (cstate->opts.binary) - getTypeBinaryInputInfo(att->atttypid, - &in_func_oid, &typioparams[attnum - 1]); - else - getTypeInputInfo(att->atttypid, - &in_func_oid, &typioparams[attnum - 1]); - fmgr_info(in_func_oid, &in_functions[attnum - 1]); - /* Get default info if available */ defexprs[attnum - 1] = NULL; @@ -1684,8 +1869,6 @@ BeginCopyFrom(ParseState *pstate, cstate->bytes_processed = 0; /* We keep those variables in cstate. 
*/ - cstate->in_functions = in_functions; - cstate->typioparams = typioparams; cstate->defmap = defmap; cstate->defexprs = defexprs; cstate->volatile_defexprs = volatile_defexprs; @@ -1758,20 +1941,7 @@ BeginCopyFrom(ParseState *pstate, pgstat_progress_update_multi_param(3, progress_cols, progress_vals); - if (cstate->opts.binary) - { - /* Read and verify binary header */ - ReceiveCopyBinaryHeader(cstate); - } - - /* create workspace for CopyReadAttributes results */ - if (!cstate->opts.binary) - { - AttrNumber attr_count = list_length(cstate->attnumlist); - - cstate->max_fields = attr_count; - cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *)); - } + cstate->opts.from_routine->CopyFromStart(cstate, tupDesc); MemoryContextSwitchTo(oldcontext); @@ -1784,6 +1954,8 @@ BeginCopyFrom(ParseState *pstate, void EndCopyFrom(CopyFromState cstate) { + cstate->opts.from_routine->CopyFromEnd(cstate); + /* No COPY FROM related resources except memory. */ if (cstate->is_program) { diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 7cacd0b752..856ba261e1 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -172,7 +172,7 @@ ReceiveCopyBegin(CopyFromState cstate) { StringInfoData buf; int natts = list_length(cstate->attnumlist); - int16 format = (cstate->opts.binary ? 
1 : 0); + int16 format = cstate->opts.from_routine->CopyFromGetFormat(cstate); int i; pq_beginmessage(&buf, PqMsg_CopyInResponse); @@ -840,6 +840,221 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) return true; } +typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, int m); + +static inline char * +PostpareColumnValueText(CopyFromState cstate, char *string, int m) +{ + /* do nothing */ + return string; +} + +static inline char * +PostpareColumnValueCSV(CopyFromState cstate, char *string, int m) +{ + if (string == NULL && + cstate->opts.force_notnull_flags[m]) + { + /* + * FORCE_NOT_NULL option is set and column is NULL - convert it to the + * NULL string. + */ + string = cstate->opts.null_print; + } + else if (string != NULL && cstate->opts.force_null_flags[m] + && strcmp(string, cstate->opts.null_print) == 0) + { + /* + * FORCE_NULL option is set and column matches the NULL string. It + * must have been quoted, or otherwise the string would already have + * been set to NULL. Convert it to NULL as specified. + */ + string = NULL; + } + return string; +} + +/* + * We don't use this function as a callback directly. We define + * CopyFromTextOneRow() and CopyFromCSVOneRow() and use them instead. It's for + * eliminating a "if (cstate->opts.csv_mode)" branch. This callback is called + * per tuple. So this optimization will be valuable when many tuples are + * copied. + * + * cstate->in_functions must be initialized in CopyFromTextBasedStart(). 
+ */ +static inline bool +CopyFromTextBasedOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls, PostpareColumnValue postpare_column_value) +{ + TupleDesc tupDesc; + AttrNumber attr_count; + FmgrInfo *in_functions = cstate->in_functions; + Oid *typioparams = cstate->typioparams; + ExprState **defexprs = cstate->defexprs; + char **field_strings; + ListCell *cur; + int fldct; + int fieldno; + char *string; + + tupDesc = RelationGetDescr(cstate->rel); + attr_count = list_length(cstate->attnumlist); + + /* read raw fields in the next line */ + if (!NextCopyFromRawFields(cstate, &field_strings, &fldct)) + return false; + + /* check for overflowing fields */ + if (attr_count > 0 && fldct > attr_count) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("extra data after last expected column"))); + + fieldno = 0; + + /* Loop to read the user attributes on the line. */ + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + if (fieldno >= fldct) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("missing data for column \"%s\"", + NameStr(att->attname)))); + string = field_strings[fieldno++]; + + if (cstate->convert_select_flags && + !cstate->convert_select_flags[m]) + { + /* ignore input field, leaving column as NULL */ + continue; + } + + cstate->cur_attname = NameStr(att->attname); + cstate->cur_attval = string; + + string = postpare_column_value(cstate, string, m); + + if (string != NULL) + nulls[m] = false; + + if (cstate->defaults[m]) + { + /* + * The caller must supply econtext and have switched into the + * per-tuple memory context in it. 
+ */ + Assert(econtext != NULL); + Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); + + values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); + } + + /* + * If ON_ERROR is specified with IGNORE, skip rows with soft errors + */ + else if (!InputFunctionCallSafe(&in_functions[m], + string, + typioparams[m], + att->atttypmod, + (Node *) cstate->escontext, + &values[m])) + { + cstate->num_errors++; + return true; + } + + cstate->cur_attname = NULL; + cstate->cur_attval = NULL; + } + + Assert(fieldno == attr_count); + + return true; +} + +bool +CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +{ + return CopyFromTextBasedOneRow(cstate, econtext, values, nulls, PostpareColumnValueText); +} + +bool +CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +{ + return CopyFromTextBasedOneRow(cstate, econtext, values, nulls, PostpareColumnValueCSV); +} + +/* + * cstate->in_functions must be initialized in CopyFromBinaryStart(). + */ +bool +CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +{ + TupleDesc tupDesc; + AttrNumber attr_count; + FmgrInfo *in_functions = cstate->in_functions; + Oid *typioparams = cstate->typioparams; + int16 fld_count; + ListCell *cur; + + tupDesc = RelationGetDescr(cstate->rel); + attr_count = list_length(cstate->attnumlist); + + cstate->cur_lineno++; + + if (!CopyGetInt16(cstate, &fld_count)) + { + /* EOF detected (end of file, or protocol-level EOF) */ + return false; + } + + if (fld_count == -1) + { + /* + * Received EOF marker. Wait for the protocol-level EOF, and complain + * if it doesn't come immediately. In COPY FROM STDIN, this ensures + * that we correctly handle CopyFail, if client chooses to send that + * now. When copying from file, we could ignore the rest of the file + * like in text mode, but we choose to be consistent with the COPY + * FROM STDIN case. 
+ */ + char dummy; + + if (CopyReadBinaryData(cstate, &dummy, 1) > 0) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("received copy data after EOF marker"))); + return false; + } + + if (fld_count != attr_count) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("row field count is %d, expected %d", + (int) fld_count, attr_count))); + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + cstate->cur_attname = NameStr(att->attname); + values[m] = CopyReadBinaryAttribute(cstate, + &in_functions[m], + typioparams[m], + att->atttypmod, + &nulls[m]); + cstate->cur_attname = NULL; + } + + return true; +} + /* * Read next tuple from file for COPY FROM. Return false if no more tuples. * @@ -857,181 +1072,22 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, { TupleDesc tupDesc; AttrNumber num_phys_attrs, - attr_count, num_defaults = cstate->num_defaults; - FmgrInfo *in_functions = cstate->in_functions; - Oid *typioparams = cstate->typioparams; int i; int *defmap = cstate->defmap; ExprState **defexprs = cstate->defexprs; tupDesc = RelationGetDescr(cstate->rel); num_phys_attrs = tupDesc->natts; - attr_count = list_length(cstate->attnumlist); /* Initialize all values for row to NULL */ MemSet(values, 0, num_phys_attrs * sizeof(Datum)); MemSet(nulls, true, num_phys_attrs * sizeof(bool)); MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool)); - if (!cstate->opts.binary) - { - char **field_strings; - ListCell *cur; - int fldct; - int fieldno; - char *string; - - /* read raw fields in the next line */ - if (!NextCopyFromRawFields(cstate, &field_strings, &fldct)) - return false; - - /* check for overflowing fields */ - if (attr_count > 0 && fldct > attr_count) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("extra data after last expected column"))); - - fieldno = 0; - - /* Loop to read the user attributes on 
the line. */ - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - int m = attnum - 1; - Form_pg_attribute att = TupleDescAttr(tupDesc, m); - - if (fieldno >= fldct) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("missing data for column \"%s\"", - NameStr(att->attname)))); - string = field_strings[fieldno++]; - - if (cstate->convert_select_flags && - !cstate->convert_select_flags[m]) - { - /* ignore input field, leaving column as NULL */ - continue; - } - - if (cstate->opts.csv_mode) - { - if (string == NULL && - cstate->opts.force_notnull_flags[m]) - { - /* - * FORCE_NOT_NULL option is set and column is NULL - - * convert it to the NULL string. - */ - string = cstate->opts.null_print; - } - else if (string != NULL && cstate->opts.force_null_flags[m] - && strcmp(string, cstate->opts.null_print) == 0) - { - /* - * FORCE_NULL option is set and column matches the NULL - * string. It must have been quoted, or otherwise the - * string would already have been set to NULL. Convert it - * to NULL as specified. - */ - string = NULL; - } - } - - cstate->cur_attname = NameStr(att->attname); - cstate->cur_attval = string; - - if (string != NULL) - nulls[m] = false; - - if (cstate->defaults[m]) - { - /* - * The caller must supply econtext and have switched into the - * per-tuple memory context in it. 
- */ - Assert(econtext != NULL); - Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); - - values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); - } - - /* - * If ON_ERROR is specified with IGNORE, skip rows with soft - * errors - */ - else if (!InputFunctionCallSafe(&in_functions[m], - string, - typioparams[m], - att->atttypmod, - (Node *) cstate->escontext, - &values[m])) - { - cstate->num_errors++; - return true; - } - - cstate->cur_attname = NULL; - cstate->cur_attval = NULL; - } - - Assert(fieldno == attr_count); - } - else - { - /* binary */ - int16 fld_count; - ListCell *cur; - - cstate->cur_lineno++; - - if (!CopyGetInt16(cstate, &fld_count)) - { - /* EOF detected (end of file, or protocol-level EOF) */ - return false; - } - - if (fld_count == -1) - { - /* - * Received EOF marker. Wait for the protocol-level EOF, and - * complain if it doesn't come immediately. In COPY FROM STDIN, - * this ensures that we correctly handle CopyFail, if client - * chooses to send that now. When copying from file, we could - * ignore the rest of the file like in text mode, but we choose to - * be consistent with the COPY FROM STDIN case. 
- */ - char dummy; - - if (CopyReadBinaryData(cstate, &dummy, 1) > 0) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("received copy data after EOF marker"))); - return false; - } - - if (fld_count != attr_count) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("row field count is %d, expected %d", - (int) fld_count, attr_count))); - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - int m = attnum - 1; - Form_pg_attribute att = TupleDescAttr(tupDesc, m); - - cstate->cur_attname = NameStr(att->attname); - values[m] = CopyReadBinaryAttribute(cstate, - &in_functions[m], - typioparams[m], - att->atttypmod, - &nulls[m]); - cstate->cur_attname = NULL; - } - } + if (!cstate->opts.from_routine->CopyFromOneRow(cstate, econtext, values, + nulls)) + return false; /* * Now compute and insert any defaults available for the columns not diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 9abd7fe538..107642ef7a 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -75,12 +75,11 @@ typedef struct CopyFormatOptions bool convert_selectively; /* do selective binary conversion? 
*/ CopyOnErrorChoice on_error; /* what to do when error happened */ List *convert_select; /* list of column names (can be NIL) */ + const CopyFromRoutine *from_routine; /* callback routines for COPY + * FROM */ const CopyToRoutine *to_routine; /* callback routines for COPY TO */ } CopyFormatOptions; -/* This is private in commands/copyfrom.c */ -typedef struct CopyFromStateData *CopyFromState; - typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread); typedef void (*copy_data_dest_cb) (void *data, int len); @@ -89,6 +88,7 @@ extern void DoCopy(ParseState *pstate, const CopyStmt *stmt, uint64 *processed); extern void ProcessCopyOptions(ParseState *pstate, CopyFormatOptions *opts_out, bool is_from, void *cstate, List *options); +extern void ProcessCopyOptionFormatFrom(ParseState *pstate, CopyFormatOptions *opts_out, DefElem *defel); extern void ProcessCopyOptionFormatTo(ParseState *pstate, CopyFormatOptions *opts_out, DefElem *defel); extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *whereClause, const char *filename, diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index ed52ce5f49..9f82cc0876 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -15,8 +15,40 @@ #define COPYAPI_H #include "executor/tuptable.h" +#include "nodes/execnodes.h" #include "nodes/parsenodes.h" +/* This is private in commands/copyfrom.c */ +typedef struct CopyFromStateData *CopyFromState; + +/* Routines for a COPY FROM format implementation. */ +typedef struct CopyFromRoutine +{ + /* + * Called for processing one COPY FROM option. This will return false when + * the given option is invalid. + */ + bool (*CopyFromProcessOption) (CopyFromState cstate, DefElem *defel); + + /* + * Called when COPY FROM is started. This will return a format as int16 + * value. It's used for the CopyInResponse message. 
+ */ + int16 (*CopyFromGetFormat) (CopyFromState cstate); + + /* + * Called when COPY FROM is started. This will initialize something and + * receive a header. + */ + void (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc); + + /* Copy one row. It returns false if no more tuples. */ + bool (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); + + /* Called when COPY FROM is ended. This will finalize something. */ + void (*CopyFromEnd) (CopyFromState cstate); +} CopyFromRoutine; + /* This is private in commands/copyto.c */ typedef struct CopyToStateData *CopyToState; diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index cad52fcc78..096b55011e 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -183,4 +183,8 @@ typedef struct CopyFromStateData extern void ReceiveCopyBegin(CopyFromState cstate); extern void ReceiveCopyBinaryHeader(CopyFromState cstate); +extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); +extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); +extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); + #endif /* COPYFROM_INTERNAL_H */ -- 2.43.0
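As a reading aid, the routine-table dispatch that the patch above introduces can be sketched in isolation. This is a minimal standalone model, not PostgreSQL code: `ToyCopyFromRoutine`, `toy_lookup_from_routine()` and the other `toy_*` names are hypothetical, only the format-code callback is modeled, and an unknown format returns NULL where the patch calls ereport(ERROR, ...).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for CopyFromRoutine; only the format-code callback is kept. */
typedef struct ToyCopyFromRoutine
{
    int16_t     (*get_format) (void);
} ToyCopyFromRoutine;

/* "text" and "csv" share one callback, as CopyFromTextBased*() do in the patch. */
static int16_t
toy_text_based_get_format(void)
{
    return 0;
}

static int16_t
toy_binary_get_format(void)
{
    return 1;
}

static const ToyCopyFromRoutine toy_routine_text = {
    .get_format = toy_text_based_get_format,
};

static const ToyCopyFromRoutine toy_routine_csv = {
    .get_format = toy_text_based_get_format,
};

static const ToyCopyFromRoutine toy_routine_binary = {
    .get_format = toy_binary_get_format,
};

/*
 * Pick a routine table by format name. NULL means "no format option given",
 * which falls back to "text". Unknown names return NULL in this sketch.
 */
const ToyCopyFromRoutine *
toy_lookup_from_routine(const char *format)
{
    if (format == NULL)
        format = "text";

    if (strcmp(format, "text") == 0)
        return &toy_routine_text;
    if (strcmp(format, "csv") == 0)
        return &toy_routine_csv;
    if (strcmp(format, "binary") == 0)
        return &toy_routine_binary;
    return NULL;
}
```

Callers then go through the table instead of branching on a binary flag, which is the same shape as the `cstate->opts.from_routine->CopyFromGetFormat(cstate)` call in the patch.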
Hi, In <CAEG8a3KhS6s1XQgDSvc8vFTb4GkhBmS8TxOoVSDPFX+MPExxxQ@mail.gmail.com> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 26 Jan 2024 16:41:50 +0800, Junwang Zhao <zhjwpku@gmail.com> wrote: > CopyToProcessOption()/CopyFromProcessOption() can only handle > single option, and store the options in the opaque field, but it can not > check the relation of two options, for example, considering json format, > the `header` option can not be handled by these two functions. > > I want to find a way when the user specifies the header option, customer > handler can error out. Ah, you want to use a built-in option (such as "header") value from a custom handler, right? Hmm, it may be better that we call CopyToProcessOption()/CopyFromProcessOption() for all options including built-in options. Thanks, -- kou
On Fri, Jan 26, 2024 at 4:55 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAEG8a3KhS6s1XQgDSvc8vFTb4GkhBmS8TxOoVSDPFX+MPExxxQ@mail.gmail.com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 26 Jan 2024 16:41:50 +0800,
> Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> > CopyToProcessOption()/CopyFromProcessOption() can only handle
> > single option, and store the options in the opaque field, but it can not
> > check the relation of two options, for example, considering json format,
> > the `header` option can not be handled by these two functions.
> >
> > I want to find a way when the user specifies the header option, customer
> > handler can error out.
>
> Ah, you want to use a built-in option (such as "header")
> value from a custom handler, right? Hmm, it may be better
> that we call CopyToProcessOption()/CopyFromProcessOption()
> for all options including built-in options.
>

Hmm, I still don't think it can handle all cases. Since we don't know
the order of the options, all the options need to be parsed before we
check their compatibility; otherwise custom handlers will need
complicated logic to resolve that, which might lead to ugly code :(

>
> Thanks,
> --
> kou

--
Regards
Junwang Zhao
Hi Kou-san,

On Fri, Jan 26, 2024 at 5:02 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> On Fri, Jan 26, 2024 at 4:55 PM Sutou Kouhei <kou@clear-code.com> wrote:
> >
> > Hi,
> >
> > In <CAEG8a3KhS6s1XQgDSvc8vFTb4GkhBmS8TxOoVSDPFX+MPExxxQ@mail.gmail.com>
> > "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 26 Jan 2024 16:41:50 +0800,
> > Junwang Zhao <zhjwpku@gmail.com> wrote:
> >
> > > CopyToProcessOption()/CopyFromProcessOption() can only handle
> > > single option, and store the options in the opaque field, but it can not
> > > check the relation of two options, for example, considering json format,
> > > the `header` option can not be handled by these two functions.
> > >
> > > I want to find a way when the user specifies the header option, customer
> > > handler can error out.
> >
> > Ah, you want to use a built-in option (such as "header")
> > value from a custom handler, right? Hmm, it may be better
> > that we call CopyToProcessOption()/CopyFromProcessOption()
> > for all options including built-in options.
> >
> Hmm, still I don't think it can handle all cases, since we don't know
> the sequence of the options, we need all the options been parsed
> before we check the compatibility of the options, or customer
> handlers will need complicated logic to resolve that, which might
> lead to ugly code :(
>

I have been working on a *COPY TO JSON* extension since yesterday,
based on your V6 patch set. I'd like to give you more input so that
you can make better decisions about the implementation (with only
pg-copy-arrow you might not get everything considered).

V8 is based on V6, so anybody involved in the performance issue
should still review the V7 patch set.
0001-0008 are your original V6 implementations.

0009 contains some changes made by me. I changed CopyToGetFormat to
CopyToSendCopyBegin because pg_copy_json needs to send different bytes
in SendCopyBegin; getting the format code alone is not enough. I once
thought that maybe we should merge SendCopyBegin/SendCopyEnd into
CopyToStart/CopyToEnd, but I don't do that in this patch. I have also
exported more APIs for extension usage.

0010 is the pg_copy_json extension. I think this is a good case that
can utilize the *extendable copy format* feature. Maybe we should
delete copy_test_format if we have this extension as an example?

> > > > Thanks, > > -- > > kou > > > > -- > Regards > Junwang Zhao

--
Regards
Junwang Zhao
Attachments
- v8-0004-Add-support-for-implementing-custom-COPY-TO-forma.patch
- v8-0001-Extract-COPY-TO-format-implementations.patch
- v8-0002-Add-support-for-adding-custom-COPY-TO-format.patch
- v8-0003-Export-CopyToStateData.patch
- v8-0005-Extract-COPY-FROM-format-implementations.patch
- v8-0009-change-CopyToGetFormat-to-CopyToSendCopyBegin-and.patch
- v8-0006-Add-support-for-adding-custom-COPY-FROM-format.patch
- v8-0010-introduce-contrib-pg_copy_json.patch
- v8-0008-Add-support-for-implementing-custom-COPY-FROM-for.patch
- v8-0007-Export-CopyFromStateData.patch
On Fri, Jan 26, 2024 at 6:02 PM Junwang Zhao <zhjwpku@gmail.com> wrote: > > On Fri, Jan 26, 2024 at 4:55 PM Sutou Kouhei <kou@clear-code.com> wrote: > > > > Hi, > > > > In <CAEG8a3KhS6s1XQgDSvc8vFTb4GkhBmS8TxOoVSDPFX+MPExxxQ@mail.gmail.com> > > "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 26 Jan 2024 16:41:50 +0800, > > Junwang Zhao <zhjwpku@gmail.com> wrote: > > > > > CopyToProcessOption()/CopyFromProcessOption() can only handle > > > single option, and store the options in the opaque field, but it can not > > > check the relation of two options, for example, considering json format, > > > the `header` option can not be handled by these two functions. > > > > > > I want to find a way when the user specifies the header option, customer > > > handler can error out. > > > > Ah, you want to use a built-in option (such as "header") > > value from a custom handler, right? Hmm, it may be better > > that we call CopyToProcessOption()/CopyFromProcessOption() > > for all options including built-in options. > > > Hmm, still I don't think it can handle all cases, since we don't know > the sequence of the options, we need all the options been parsed > before we check the compatibility of the options, or customer > handlers will need complicated logic to resolve that, which might > lead to ugly code :( > Does it make sense to pass only non-builtin options to the custom format callback after parsing and evaluating the builtin options? That is, we parse and evaluate only the builtin options and populate opts_out first, then pass each rest option to the custom format handler callback. The callback can refer to the builtin option values. The callback is expected to return false if the passed option is not supported. If one of the builtin formats is specified and the rest options list has at least one option, we raise "option %s not recognized" error. 
IOW, it's the core's responsibility to raise the "option %s not
recognized" error, so that the error message is consistent. Also, I
think the core should check for redundant options, including both
builtin and custom options.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Mon, Jan 29, 2024 at 10:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Jan 26, 2024 at 6:02 PM Junwang Zhao <zhjwpku@gmail.com> wrote: > > > > On Fri, Jan 26, 2024 at 4:55 PM Sutou Kouhei <kou@clear-code.com> wrote: > > > > > > Hi, > > > > > > In <CAEG8a3KhS6s1XQgDSvc8vFTb4GkhBmS8TxOoVSDPFX+MPExxxQ@mail.gmail.com> > > > "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 26 Jan 2024 16:41:50 +0800, > > > Junwang Zhao <zhjwpku@gmail.com> wrote: > > > > > > > CopyToProcessOption()/CopyFromProcessOption() can only handle > > > > single option, and store the options in the opaque field, but it can not > > > > check the relation of two options, for example, considering json format, > > > > the `header` option can not be handled by these two functions. > > > > > > > > I want to find a way when the user specifies the header option, customer > > > > handler can error out. > > > > > > Ah, you want to use a built-in option (such as "header") > > > value from a custom handler, right? Hmm, it may be better > > > that we call CopyToProcessOption()/CopyFromProcessOption() > > > for all options including built-in options. > > > > > Hmm, still I don't think it can handle all cases, since we don't know > > the sequence of the options, we need all the options been parsed > > before we check the compatibility of the options, or customer > > handlers will need complicated logic to resolve that, which might > > lead to ugly code :( > > > > Does it make sense to pass only non-builtin options to the custom > format callback after parsing and evaluating the builtin options? That > is, we parse and evaluate only the builtin options and populate > opts_out first, then pass each rest option to the custom format > handler callback. The callback can refer to the builtin option values. Yeah, I think this makes sense. > The callback is expected to return false if the passed option is not > supported. 
If one of the builtin formats is specified and the rest
> options list has at least one option, we raise "option %s not
> recognized" error. IOW it's the core's responsibility to raise the
> "option %s not recognized" error, which is in order to raise a
> consistent error message. Also, I think the core should check the
> redundant options including builtin and custom options.

It would be good if core could check all the redundant options, but
where should core do the bookkeeping of all the options? I have no
idea about this. In my implementation of the pg_copy_json extension,
I handle redundant options by adding a xxx_specified field for each
option xxx.

>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com

--
Regards
Junwang Zhao
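The xxx_specified bookkeeping mentioned in this message can be sketched as follows. This is a minimal standalone illustration, not code from pg_copy_json: the `indent` option, the `ToyJsonOptions` struct and the result codes are all hypothetical, and a real extension would report the problem with ereport(ERROR, ...) rather than a return value.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option state of a JSON format extension. */
typedef struct ToyJsonOptions
{
    int         indent;             /* hypothetical "indent" option value */
    bool        indent_specified;   /* has "indent" already been given? */
} ToyJsonOptions;

/* Hypothetical result codes standing in for error reports. */
typedef enum ToyJsonOptionResult
{
    TOY_JSON_OPTION_OK,
    TOY_JSON_OPTION_REDUNDANT,
    TOY_JSON_OPTION_UNKNOWN
} ToyJsonOptionResult;

/*
 * Process one option. The *_specified flag is what lets the handler notice
 * that the same option was passed twice, since it only sees one option per
 * call.
 */
ToyJsonOptionResult
toy_json_process_option(ToyJsonOptions *opts, const char *name,
                        const char *value)
{
    if (strcmp(name, "indent") == 0)
    {
        if (opts->indent_specified)
            return TOY_JSON_OPTION_REDUNDANT;
        opts->indent_specified = true;
        opts->indent = atoi(value);
        return TOY_JSON_OPTION_OK;
    }
    return TOY_JSON_OPTION_UNKNOWN;
}
```

The cost of this pattern is one extra flag per option in every extension, which is the bookkeeping the message asks whether core could take over.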
On Mon, Jan 29, 2024 at 12:10 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> On Mon, Jan 29, 2024 at 10:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Jan 26, 2024 at 6:02 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> > >
> > > On Fri, Jan 26, 2024 at 4:55 PM Sutou Kouhei <kou@clear-code.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > In <CAEG8a3KhS6s1XQgDSvc8vFTb4GkhBmS8TxOoVSDPFX+MPExxxQ@mail.gmail.com>
> > > >   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 26 Jan 2024 16:41:50 +0800,
> > > >   Junwang Zhao <zhjwpku@gmail.com> wrote:
> > > >
> > > > > CopyToProcessOption()/CopyFromProcessOption() can only handle
> > > > > single option, and store the options in the opaque field, but it can not
> > > > > check the relation of two options, for example, considering json format,
> > > > > the `header` option can not be handled by these two functions.
> > > > >
> > > > > I want to find a way when the user specifies the header option, custom
> > > > > handler can error out.
> > > >
> > > > Ah, you want to use a built-in option (such as "header")
> > > > value from a custom handler, right? Hmm, it may be better
> > > > that we call CopyToProcessOption()/CopyFromProcessOption()
> > > > for all options including built-in options.
> > > >
> > > Hmm, still I don't think it can handle all cases, since we don't know
> > > the sequence of the options, we need all the options been parsed
> > > before we check the compatibility of the options, or custom
> > > handlers will need complicated logic to resolve that, which might
> > > lead to ugly code :(
> > >
> >
> > Does it make sense to pass only non-builtin options to the custom
> > format callback after parsing and evaluating the builtin options? That
> > is, we parse and evaluate only the builtin options and populate
> > opts_out first, then pass each rest option to the custom format
> > handler callback. The callback can refer to the builtin option values.
>
> Yeah, I think this makes sense.
>
> > The callback is expected to return false if the passed option is not
> > supported. If one of the builtin formats is specified and the rest
> > options list has at least one option, we raise "option %s not
> > recognized" error. IOW it's the core's responsibility to raise the
> > "option %s not recognized" error, which is in order to raise a
> > consistent error message. Also, I think the core should check the
> > redundant options including builtin and custom options.
>
> It would be good that core could check all the redundant options,
> but where should core do the book-keeping of all the options? I have
> no idea about this, in my implementation of pg_copy_json extension,
> I handle redundant options by adding a xxx_specified field for each
> xxx.

What I imagined is that while parsing all the specified options, we
evaluate builtin options and we add non-builtin options to another
list. Then when parsing a non-builtin option, we check if this option
already exists in the list. If it does, we raise the "option %s not
recognized" error. Once we complete checking all options, we pass
each option in the list to the callback.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
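
The two-step option handling described in the message above can be sketched in standalone C. This is only an illustration under stated assumptions, not code from the patch: `Option`, `process_options()`, `is_builtin_option()`, the `OptionStatus` values and the `force_array` option name are all hypothetical stand-ins (PostgreSQL itself would use `DefElem`, `List` and `ereport()`), and reporting duplicates with a separate status is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parsed option (PostgreSQL uses DefElem). */
typedef struct Option
{
	const char *name;
	const char *value;
} Option;

/* Hypothetical handler callback: returns false if the option is unsupported. */
typedef bool (*ProcessOptionCallback) (const Option *opt);

typedef enum OptionStatus
{
	OPTIONS_OK,
	OPTIONS_REDUNDANT,			/* the same non-builtin option appeared twice */
	OPTIONS_NOT_RECOGNIZED		/* the handler rejected an option */
} OptionStatus;

#define MAX_DEFERRED 16

/* The builtin options known to the "core" in this toy model. */
static bool
is_builtin_option(const char *name)
{
	return strcmp(name, "format") == 0 ||
		strcmp(name, "header") == 0 ||
		strcmp(name, "delimiter") == 0;
}

static OptionStatus
process_options(const Option *opts, size_t nopts,
				ProcessOptionCallback callback, const char **bad_option)
{
	const Option *deferred[MAX_DEFERRED];
	size_t		ndeferred = 0;

	/* Pass 1: builtin options are evaluated, the rest are set aside. */
	for (size_t i = 0; i < nopts; i++)
	{
		if (is_builtin_option(opts[i].name))
			continue;			/* the core would evaluate it here */

		/* The core does the book-keeping for redundant custom options. */
		for (size_t j = 0; j < ndeferred; j++)
		{
			if (strcmp(deferred[j]->name, opts[i].name) == 0)
			{
				*bad_option = opts[i].name;
				return OPTIONS_REDUNDANT;
			}
		}
		assert(ndeferred < MAX_DEFERRED);
		deferred[ndeferred++] = &opts[i];
	}

	/* Pass 2: all builtin values are known, so ask the format handler. */
	for (size_t j = 0; j < ndeferred; j++)
	{
		if (callback == NULL || !callback(deferred[j]))
		{
			*bad_option = deferred[j]->name;
			return OPTIONS_NOT_RECOGNIZED;
		}
	}
	return OPTIONS_OK;
}

/* Hypothetical JSON handler that accepts a single custom option. */
static bool
json_process_option(const Option *opt)
{
	return strcmp(opt->name, "force_array") == 0;
}
```

Because the handler runs only in the second pass, it can look at every builtin value regardless of the order in which the user wrote the options, which is the concern raised earlier in the thread.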
On Mon, Jan 29, 2024 at 11:22 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Jan 29, 2024 at 12:10 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> >
> > On Mon, Jan 29, 2024 at 10:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Fri, Jan 26, 2024 at 6:02 PM Junwang Zhao <zhjwpku@gmail.com> wrote:
> > > >
> > > > On Fri, Jan 26, 2024 at 4:55 PM Sutou Kouhei <kou@clear-code.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > In <CAEG8a3KhS6s1XQgDSvc8vFTb4GkhBmS8TxOoVSDPFX+MPExxxQ@mail.gmail.com>
> > > > >   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 26 Jan 2024 16:41:50 +0800,
> > > > >   Junwang Zhao <zhjwpku@gmail.com> wrote:
> > > > >
> > > > > > CopyToProcessOption()/CopyFromProcessOption() can only handle
> > > > > > single option, and store the options in the opaque field, but it can not
> > > > > > check the relation of two options, for example, considering json format,
> > > > > > the `header` option can not be handled by these two functions.
> > > > > >
> > > > > > I want to find a way when the user specifies the header option, custom
> > > > > > handler can error out.
> > > > >
> > > > > Ah, you want to use a built-in option (such as "header")
> > > > > value from a custom handler, right? Hmm, it may be better
> > > > > that we call CopyToProcessOption()/CopyFromProcessOption()
> > > > > for all options including built-in options.
> > > > >
> > > > Hmm, still I don't think it can handle all cases, since we don't know
> > > > the sequence of the options, we need all the options been parsed
> > > > before we check the compatibility of the options, or custom
> > > > handlers will need complicated logic to resolve that, which might
> > > > lead to ugly code :(
> > > >
> > >
> > > Does it make sense to pass only non-builtin options to the custom
> > > format callback after parsing and evaluating the builtin options? That
> > > is, we parse and evaluate only the builtin options and populate
> > > opts_out first, then pass each rest option to the custom format
> > > handler callback. The callback can refer to the builtin option values.
> >
> > Yeah, I think this makes sense.
> >
> > > The callback is expected to return false if the passed option is not
> > > supported. If one of the builtin formats is specified and the rest
> > > options list has at least one option, we raise "option %s not
> > > recognized" error. IOW it's the core's responsibility to raise the
> > > "option %s not recognized" error, which is in order to raise a
> > > consistent error message. Also, I think the core should check the
> > > redundant options including builtin and custom options.
> >
> > It would be good that core could check all the redundant options,
> > but where should core do the book-keeping of all the options? I have
> > no idea about this, in my implementation of pg_copy_json extension,
> > I handle redundant options by adding a xxx_specified field for each
> > xxx.
>
> What I imagined is that while parsing all the specified options, we
> evaluate builtin options and we add non-builtin options to another
> list. Then when parsing a non-builtin option, we check if this option
> already exists in the list. If it does, we raise the "option %s not
> recognized" error. Once we complete checking all options, we pass
> each option in the list to the callback.

LGTM.

>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com

--
Regards
Junwang Zhao
Hi,
In <CAEG8a3JDPks7XU5-NvzjzuKQYQqR8pDfS7CDGZonQTXfdWtnnw@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Sat, 27 Jan 2024 14:15:02 +0800,
  Junwang Zhao <zhjwpku@gmail.com> wrote:
> I have been working on a *COPY TO JSON* extension since yesterday,
> which is based on your V6 patch set, I'd like to give you more input
> so you can make better decisions about the implementation(with only
> pg-copy-arrow you might not get everything considered).
Thanks!
> 0009 is some changes made by me, I changed CopyToGetFormat to
> CopyToSendCopyBegin because pg_copy_json needs to send different bytes
> in SendCopyBegin, getting the format code alone is not enough
Oh, I hadn't considered that case.
How about the following API instead?
static void
SendCopyBegin(CopyToState cstate)
{
    StringInfoData buf;
    pq_beginmessage(&buf, PqMsg_CopyOutResponse);
    cstate->opts.to_routine->CopyToFillCopyOutResponse(cstate, &buf);
    pq_endmessage(&buf);
    cstate->copy_dest = COPY_FRONTEND;
}
static void
CopyToJsonFillCopyOutResponse(CopyToState cstate, StringInfoData *buf)
{
    int16        format = 0;
    pq_sendbyte(buf, format);      /* overall format */
    /*
     * JSON mode is always one non-binary column
     */
    pq_sendint16(buf, 1);
    pq_sendint16(buf, format);
}
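
For reference, the bytes such a callback has to produce can be sketched in standalone C. This is an illustration only: `put_int16()`, `fill_copy_out_response()` and `fill_copy_out_response_json()` are hypothetical helpers that write into a plain byte array instead of a `StringInfo`. It assumes the CopyOutResponse body layout of the frontend/backend protocol (one format byte, a 16-bit column count, then one 16-bit format code per column, all in network byte order); the message type byte and length are added by pq_beginmessage()/pq_endmessage() and are not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Append a 16-bit integer in network byte order (big-endian). */
static size_t
put_int16(uint8_t *buf, size_t pos, int16_t value)
{
	buf[pos] = (uint8_t) (((uint16_t) value) >> 8);
	buf[pos + 1] = (uint8_t) (((uint16_t) value) & 0xFF);
	return pos + 2;
}

/*
 * Fill the body of a CopyOutResponse message: overall format, number of
 * columns, then one format code per column. Returns the bytes written.
 */
static size_t
fill_copy_out_response(uint8_t *buf, size_t bufsize,
					   int8_t format, int16_t natts)
{
	size_t		pos = 0;

	assert(natts >= 0);
	assert(bufsize >= 3 + 2 * (size_t) natts);

	buf[pos++] = (uint8_t) format;	/* overall format */
	pos = put_int16(buf, pos, natts);
	for (int16_t i = 0; i < natts; i++)
		pos = put_int16(buf, pos, format);	/* per-column formats */
	return pos;
}

/* JSON variant from the proposal: always one non-binary column. */
static size_t
fill_copy_out_response_json(uint8_t *buf, size_t bufsize)
{
	return fill_copy_out_response(buf, bufsize, 0, 1);
}
```

The JSON variant ignores the real column count, which is why returning only a format code from the handler is not enough and the handler has to fill the message body itself.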
> 00010 is the pg_copy_json extension, I think this should be a good
> case which can utilize the *extendable copy format* feature
It seems convenient to have one more callback for
initializing CopyToState::opaque. It would be called only
once, when Copy{To,From}Routine is chosen:
typedef struct CopyToRoutine
{
    void        (*CopyToInit) (CopyToState cstate);
...
};
void
ProcessCopyOptions(ParseState *pstate,
                   CopyFormatOptions *opts_out,
                   bool is_from,
                   void *cstate,
                   List *options)
{
...
    foreach(option, options)
    {
        DefElem    *defel = lfirst_node(DefElem, option);
        if (strcmp(defel->defname, "format") == 0)
        {
            ...
            opts_out->to_routine = &CopyToRoutineXXX;
            opts_out->to_routine->CopyToInit(cstate);
            ...
        }
    }
...
}
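
How an init callback could own the format-private state can be sketched in standalone C. Everything here is hypothetical and only mirrors the shape of the proposal: `ToyCopyToState`, `ToyCopyToRoutine`, `ToyJsonState` and `toy_select_format()` are illustrative names, and `calloc()`/`free()` stand in for PostgreSQL's memory-context allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct ToyCopyToState ToyCopyToState;

/* A table of callbacks, in the spirit of the proposed CopyToRoutine. */
typedef struct ToyCopyToRoutine
{
	void		(*init) (ToyCopyToState *cstate);
	void		(*end) (ToyCopyToState *cstate);
} ToyCopyToRoutine;

struct ToyCopyToState
{
	const ToyCopyToRoutine *routine;
	void	   *opaque;			/* owned by the chosen format handler */
};

/* Private state of the hypothetical JSON handler. */
typedef struct ToyJsonState
{
	bool		force_array;
	int			rows_sent;
} ToyJsonState;

static void
toy_json_init(ToyCopyToState *cstate)
{
	ToyJsonState *js = calloc(1, sizeof(ToyJsonState));

	assert(js != NULL);
	cstate->opaque = js;
}

static void
toy_json_end(ToyCopyToState *cstate)
{
	free(cstate->opaque);
	cstate->opaque = NULL;
}

/* The text handler needs no private state. */
static void
toy_text_init(ToyCopyToState *cstate)
{
	(void) cstate;
}

static void
toy_text_end(ToyCopyToState *cstate)
{
	(void) cstate;
}

static const ToyCopyToRoutine toy_routine_text = {
	.init = toy_text_init,
	.end = toy_text_end,
};

static const ToyCopyToRoutine toy_routine_json = {
	.init = toy_json_init,
	.end = toy_json_end,
};

/* Choose a routine by name and run its init callback exactly once. */
static bool
toy_select_format(ToyCopyToState *cstate, const char *format)
{
	if (strcmp(format, "text") == 0)
		cstate->routine = &toy_routine_text;
	else if (strcmp(format, "json") == 0)
		cstate->routine = &toy_routine_json;
	else
		return false;			/* the core would raise an error here */

	cstate->routine->init(cstate);
	return true;
}
```

With the private state allocated in init, a later per-option callback can store parsed values in `opaque` without first checking whether the state exists.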
>                                                              maybe we
> should delete copy_test_format if we have this extension as an
> example?
I haven't read the COPY TO format json thread[1] carefully
(sorry), but we may add the JSON format as a built-in
format. If we do it, copy_test_format is useful to test the
extension API.
[1] https://www.postgresql.org/message-id/flat/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
Thanks,
-- 
kou
			
On Mon, Jan 29, 2024 at 2:03 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAEG8a3JDPks7XU5-NvzjzuKQYQqR8pDfS7CDGZonQTXfdWtnnw@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Sat, 27 Jan 2024 14:15:02 +0800,
>   Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> > I have been working on a *COPY TO JSON* extension since yesterday,
> > which is based on your V6 patch set, I'd like to give you more input
> > so you can make better decisions about the implementation(with only
> > pg-copy-arrow you might not get everything considered).
>
> Thanks!
>
> > 0009 is some changes made by me, I changed CopyToGetFormat to
> > CopyToSendCopyBegin because pg_copy_json needs to send different bytes
> > in SendCopyBegin, getting the format code alone is not enough
>
> Oh, I haven't cared about the case.
> How about the following API instead?
>
> static void
> SendCopyBegin(CopyToState cstate)
> {
>         StringInfoData buf;
>
>         pq_beginmessage(&buf, PqMsg_CopyOutResponse);
>         cstate->opts.to_routine->CopyToFillCopyOutResponse(cstate, &buf);
>         pq_endmessage(&buf);
>         cstate->copy_dest = COPY_FRONTEND;
> }
>
> static void
> CopyToJsonFillCopyOutResponse(CopyToState cstate, StringInfoData *buf)
> {
>         int16           format = 0;
>
>         pq_sendbyte(buf, format);      /* overall format */
>         /*
>          * JSON mode is always one non-binary column
>          */
>         pq_sendint16(buf, 1);
>         pq_sendint16(buf, format);
> }
Makes sense to me.
>
> > 00010 is the pg_copy_json extension, I think this should be a good
> > case which can utilize the *extendable copy format* feature
>
> It seems that it's convenient that we have one more callback
> for initializing CopyToState::opaque. It's called only once
> when Copy{To,From}Routine is chosen:
>
> typedef struct CopyToRoutine
> {
>         void            (*CopyToInit) (CopyToState cstate);
> ...
> };
I like this; we can allocate private data in this hook.
>
> void
> ProcessCopyOptions(ParseState *pstate,
>                                    CopyFormatOptions *opts_out,
>                                    bool is_from,
>                                    void *cstate,
>                                    List *options)
> {
> ...
>         foreach(option, options)
>         {
>                 DefElem    *defel = lfirst_node(DefElem, option);
>
>                 if (strcmp(defel->defname, "format") == 0)
>                 {
>                         ...
>                         opts_out->to_routine = &CopyToRoutineXXX;
>                         opts_out->to_routine->CopyToInit(cstate);
>                         ...
>                 }
>         }
> ...
> }
>
>
> >                                                              maybe we
> > should delete copy_test_format if we have this extension as an
> > example?
>
> I haven't read the COPY TO format json thread[1] carefully
> (sorry), but we may add the JSON format as a built-in
> format. If we do it, copy_test_format is useful to test the
> extension API.
Yeah, maybe. I have no strong opinion here; pg_copy_json is
just a toy extension for discussion.
>
> [1] https://www.postgresql.org/message-id/flat/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU%3Dkcg%40mail.gmail.com
>
>
> Thanks,
> --
> kou
--
Regards
Junwang Zhao
			
Hi,

In <CAEG8a3Jnmbjw82OiSvRK3v9XN2zSshsB8ew1mZCQDAkKq6r9YQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 29 Jan 2024 11:37:07 +0800,
  Junwang Zhao <zhjwpku@gmail.com> wrote:

>> > > Does it make sense to pass only non-builtin options to the custom
>> > > format callback after parsing and evaluating the builtin options? That
>> > > is, we parse and evaluate only the builtin options and populate
>> > > opts_out first, then pass each rest option to the custom format
>> > > handler callback. The callback can refer to the builtin option values.
>>
>> What I imagined is that while parsing all the specified options, we
>> evaluate builtin options and we add non-builtin options to another
>> list. Then when parsing a non-builtin option, we check if this option
>> already exists in the list. If it does, we raise the "option %s not
>> recognized" error. Once we complete checking all options, we pass
>> each option in the list to the callback.

I implemented this idea and the following ideas:

1. Add init callback for initialization
2. Change GetFormat() to FillCopyXXXResponse()
   because JSON format always uses 1 column
3. FROM only: Eliminate more cstate->opts.csv_mode branches
   (This is for performance.)

See the attached v9 patch set for details.

Changes since v7:

0001:

* Move CopyToProcessOption() calls to the end of
  ProcessCopyOptions() to make option validation easier
* Add CopyToState::CopyToInit() and call it in
  ProcessCopyOptionFormatTo()
* Change CopyToState::CopyToGetFormat() to
  CopyToState::CopyToFillCopyOutResponse() and use it in
  SendCopyBegin()

0002:

* Move CopyFromProcessOption() calls to the end of
  ProcessCopyOptions() to make option validation easier
* Add CopyFromState::CopyFromInit() and call it in
  ProcessCopyOptionFormatFrom()
* Change CopyFromState::CopyFromGetFormat() to
  CopyFromState::CopyFromFillCopyOutResponse() and use it in
  ReceiveCopyBegin()
* Rename NextCopyFromRawFields() to
  NextCopyFromRawFieldsInternal() and pass the read
  attributes callback explicitly to eliminate more
  cstate->opts.csv_mode branches

Thanks,
-- 
kou

From c136833f4a385574474b246a381014abeb631377 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Fri, 26 Jan 2024 16:46:51 +0900
Subject: [PATCH v9 1/2] Extract COPY TO format implementations

This is a part of making COPY format extendable. See also these past
discussions:
* New Copy Formats - avro/orc/parquet:
  https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
* Make COPY extendable in order to support Parquet and other formats:
  https://www.postgresql.org/message-id/flat/CAJ7c6TM6Bz1c3F04Cy6%2BSzuWfKmr0kU8c_3Stnvh_8BR0D6k8Q%40mail.gmail.com

This doesn't change the current behavior. This just introduces
CopyToRoutine, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.

Note that CopyToRoutine can't be used from extensions yet because
CopySend*() aren't exported yet. Extensions can't send formatted data
to a destination without CopySend*(). They will be exported by
subsequent patches.

Here is a benchmark result with/without this change because there was
a discussion that we should care about performance regression:

https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us

> I think that step 1 ought to be to convert the existing formats into
> plug-ins, and demonstrate that there's no significant loss of
> performance.

You can see that there is no significant loss of performance:

Data: Random 32 bit integers:

    CREATE TABLE data (int32 integer);
    SELECT setseed(0.29);
    INSERT INTO data
      SELECT random() * 10000
        FROM generate_series(1, ${n_records});

The number of records: 100K, 1M and 10M

100K without this change:

    format,elapsed time (ms)
    text,10.561
    csv,10.868
    binary,10.287

100K with this change:

    format,elapsed time (ms)
    text,9.962
    csv,10.453
    binary,9.473

1M without this change:

    format,elapsed time (ms)
    text,103.265
    csv,109.789
    binary,104.078

1M with this change:

    format,elapsed time (ms)
    text,98.612
    csv,101.908
    binary,94.456

10M without this change:

    format,elapsed time (ms)
    text,1060.614
    csv,1065.272
    binary,1025.875

10M with this change:

    format,elapsed time (ms)
    text,1020.050
    csv,1031.279
    binary,954.792
---
 contrib/file_fdw/file_fdw.c     |   2 +-
 src/backend/commands/copy.c     |  82 ++++-
 src/backend/commands/copyfrom.c |   2 +-
 src/backend/commands/copyto.c   | 587 +++++++++++++++++++++++---------
 src/include/commands/copy.h     |   8 +-
 src/include/commands/copyapi.h  |  62 ++++
 6 files changed, 560 insertions(+), 183 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 249d82d3a0..9e4e819858 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -329,7 +329,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 	/*
 	 * Now apply the core COPY code's validation logic for more checks.
 	 */
-	ProcessCopyOptions(NULL, NULL, true, other_options);
+	ProcessCopyOptions(NULL, NULL, true, NULL, other_options);

 	/*
 	 * Either filename or program option is required for file_fdw foreign
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6f4..dd0fe7f0bb 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -442,6 +442,9 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
  * a list of options. In that usage, 'opts_out' can be passed as NULL and
  * the collected data is just leaked until CurrentMemoryContext is reset.
  *
+ * 'cstate' is CopyToState* for !is_from, CopyFromState* for is_from. 'cstate'
+ * may be NULL. For example, file_fdw uses NULL.
+ *
  * Note that additional checking, such as whether column names listed in FORCE
  * QUOTE actually exist, has to be applied later. This just checks for
  * self-consistency of the options list.
@@ -450,6 +453,7 @@ void
 ProcessCopyOptions(ParseState *pstate,
 				   CopyFormatOptions *opts_out,
 				   bool is_from,
+				   void *cstate,
 				   List *options)
 {
 	bool format_specified = false;
@@ -457,6 +461,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool header_specified = false;
 	bool on_error_specified = false;
 	ListCell *option;
+	List *unknown_options = NIL;

 	/* Support external use for option sanity checking */
 	if (opts_out == NULL)
@@ -464,30 +469,58 @@ ProcessCopyOptions(ParseState *pstate,

 	opts_out->file_encoding = -1;

-	/* Extract options from the statement node tree */
+	/*
+	 * Extract only the "format" option to detect target routine as the first
+	 * step
+	 */
 	foreach(option, options)
 	{
 		DefElem *defel = lfirst_node(DefElem, option);

 		if (strcmp(defel->defname, "format") == 0)
 		{
-			char *fmt = defGetString(defel);
-
 			if (format_specified)
 				errorConflictingDefElem(defel, pstate);
 			format_specified = true;
-			if (strcmp(fmt, "text") == 0)
-				/* default format */ ;
-			else if (strcmp(fmt, "csv") == 0)
-				opts_out->csv_mode = true;
-			else if (strcmp(fmt, "binary") == 0)
-				opts_out->binary = true;
+
+			if (is_from)
+			{
+				char *fmt = defGetString(defel);
+
+				if (strcmp(fmt, "text") == 0)
+					/* default format */ ;
+				else if (strcmp(fmt, "csv") == 0)
+				{
+					opts_out->csv_mode = true;
+				}
+				else if (strcmp(fmt, "binary") == 0)
+				{
+					opts_out->binary = true;
+				}
+				else
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+							 errmsg("COPY format \"%s\" not recognized", fmt),
+							 parser_errposition(pstate, defel->location)));
+			}
 			else
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						 errmsg("COPY format \"%s\" not recognized", fmt),
-						 parser_errposition(pstate, defel->location)));
+				ProcessCopyOptionFormatTo(pstate, opts_out, cstate, defel);
 		}
+	}
+	if (!format_specified)
+		/* Set the default format. */
+		ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
+
+	/*
+	 * Extract options except "format" from the statement node tree. Unknown
+	 * options are processed later.
+	 */
+	foreach(option, options)
+	{
+		DefElem *defel = lfirst_node(DefElem, option);
+
+		if (strcmp(defel->defname, "format") == 0)
+			continue;
 		else if (strcmp(defel->defname, "freeze") == 0)
 		{
 			if (freeze_specified)
@@ -616,11 +649,7 @@ ProcessCopyOptions(ParseState *pstate,
 			opts_out->on_error = defGetCopyOnErrorChoice(defel, pstate, is_from);
 		}
 		else
-			ereport(ERROR,
-					(errcode(ERRCODE_SYNTAX_ERROR),
-					 errmsg("option \"%s\" not recognized",
-							defel->defname),
-					 parser_errposition(pstate, defel->location)));
+			unknown_options = lappend(unknown_options, defel);
 	}

 	/*
@@ -821,6 +850,23 @@ ProcessCopyOptions(ParseState *pstate,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("NULL specification and DEFAULT specification cannot be the same")));
 	}
+
+	/* Process not built-in options. */
+	foreach(option, unknown_options)
+	{
+		DefElem *defel = lfirst_node(DefElem, option);
+		bool processed = false;
+
+		if (!is_from)
+			processed = opts_out->to_routine->CopyToProcessOption(cstate, defel);
+		if (!processed)
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("option \"%s\" not recognized",
+							defel->defname),
+					 parser_errposition(pstate, defel->location)));
+	}
+	list_free(unknown_options);
 }

 /*
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1fe70b9133..fb3d4d9296 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1416,7 +1416,7 @@ BeginCopyFrom(ParseState *pstate,
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);

 	/* Extract options from the statement node tree */
-	ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
+	ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , cstate, options);

 	/* Process the target relation */
 	cstate->rel = rel;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d3dc3fc854..4fb41f04fc 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -24,6 +24,7 @@
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "commands/copy.h"
+#include "commands/defrem.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -131,6 +132,427 @@ static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);

+/*
+ * CopyToRoutine implementations.
+ */
+
+/*
+ * CopyToRoutine implementation for "text" and "csv". CopyToTextBased*() are
+ * shared by both of "text" and "csv". CopyToText*() are only for "text" and
+ * CopyToCSV*() are only for "csv".
+ *
+ * We can use the same functions for all callbacks by referring
+ * cstate->opts.csv_mode but splitting callbacks to eliminate "if
+ * (cstate->opts.csv_mode)" branches from all callbacks has performance
+ * merit when many tuples are copied. So we use separated callbacks for "text"
+ * and "csv".
+ */
+
+static void
+CopyToTextBasedInit(CopyToState cstate)
+{
+}
+
+/*
+ * All "text" and "csv" options are parsed in ProcessCopyOptions(). We may
+ * move the code to here later.
+ */
+static bool
+CopyToTextBasedProcessOption(CopyToState cstate, DefElem *defel)
+{
+	return false;
+}
+
+static void
+CopyToTextBasedFillCopyOutResponse(CopyToState cstate, StringInfoData *buf)
+{
+	int16 format = 0;
+	int natts = list_length(cstate->attnumlist);
+	int i;
+
+	pq_sendbyte(buf, format); /* overall format */
+	pq_sendint16(buf, natts);
+	for (i = 0; i < natts; i++)
+		pq_sendint16(buf, format); /* per-column formats */
+}
+
+static void
+CopyToTextBasedSendEndOfRow(CopyToState cstate)
+{
+	switch (cstate->copy_dest)
+	{
+		case COPY_FILE:
+			/* Default line termination depends on platform */
+#ifndef WIN32
+			CopySendChar(cstate, '\n');
+#else
+			CopySendString(cstate, "\r\n");
+#endif
+			break;
+		case COPY_FRONTEND:
+			/* The FE/BE protocol uses \n as newline for all platforms */
+			CopySendChar(cstate, '\n');
+			break;
+		default:
+			break;
+	}
+	CopySendEndOfRow(cstate);
+}
+
+typedef void (*CopyAttributeOutHeaderFunction) (CopyToState cstate, char *string);
+
+/*
+ * We can use CopyAttributeOutText() directly but define this for consistency
+ * with CopyAttributeOutCSVHeader(). "static inline" will prevent performance
+ * penalty by this wrapping.
+ */
+static inline void
+CopyAttributeOutTextHeader(CopyToState cstate, char *string)
+{
+	CopyAttributeOutText(cstate, string);
+}
+
+static inline void
+CopyAttributeOutCSVHeader(CopyToState cstate, char *string)
+{
+	CopyAttributeOutCSV(cstate, string, false,
+						list_length(cstate->attnumlist) == 1);
+}
+
+/*
+ * We don't use this function as a callback directly. We define
+ * CopyToTextStart() and CopyToCSVStart() and use them instead. It's for
+ * eliminating a "if (cstate->opts.csv_mode)" branch. This callback is called
+ * only once per COPY TO. So this optimization may be meaningless but done for
+ * consistency with CopyToTextBasedOneRow().
+ *
+ * This must initialize cstate->out_functions for CopyToTextBasedOneRow().
+ */
+static inline void
+CopyToTextBasedStart(CopyToState cstate, TupleDesc tupDesc, CopyAttributeOutHeaderFunction out)
+{
+	int num_phys_attrs;
+	ListCell *cur;
+
+	num_phys_attrs = tupDesc->natts;
+	/* Get info about the columns we need to process. */
+	cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+	foreach(cur, cstate->attnumlist)
+	{
+		int attnum = lfirst_int(cur);
+		Oid out_func_oid;
+		bool isvarlena;
+		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+		getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+		fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+	}
+
+	/*
+	 * For non-binary copy, we need to convert null_print to file encoding,
+	 * because it will be sent directly with CopySendString.
+	 */
+	if (cstate->need_transcoding)
+		cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+														  cstate->opts.null_print_len,
+														  cstate->file_encoding);
+
+	/* if a header has been requested send the line */
+	if (cstate->opts.header_line)
+	{
+		bool hdr_delim = false;
+
+		foreach(cur, cstate->attnumlist)
+		{
+			int attnum = lfirst_int(cur);
+			char *colname;
+
+			if (hdr_delim)
+				CopySendChar(cstate, cstate->opts.delim[0]);
+			hdr_delim = true;
+
+			colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+			out(cstate, colname);
+		}
+
+		CopyToTextBasedSendEndOfRow(cstate);
+	}
+}
+
+static void
+CopyToTextStart(CopyToState cstate, TupleDesc tupDesc)
+{
+	CopyToTextBasedStart(cstate, tupDesc, CopyAttributeOutTextHeader);
+}
+
+static void
+CopyToCSVStart(CopyToState cstate, TupleDesc tupDesc)
+{
+	CopyToTextBasedStart(cstate, tupDesc, CopyAttributeOutCSVHeader);
+}
+
+typedef void (*CopyAttributeOutValueFunction) (CopyToState cstate, char *string, int attnum);
+
+static inline void
+CopyAttributeOutTextValue(CopyToState cstate, char *string, int attnum)
+{
+	CopyAttributeOutText(cstate, string);
+}
+
+static inline void
+CopyAttributeOutCSVValue(CopyToState cstate, char *string, int attnum)
+{
+	CopyAttributeOutCSV(cstate, string,
+						cstate->opts.force_quote_flags[attnum - 1],
+						list_length(cstate->attnumlist) == 1);
+}
+
+/*
+ * We don't use this function as a callback directly. We define
+ * CopyToTextOneRow() and CopyToCSVOneRow() and use them instead. It's for
+ * eliminating a "if (cstate->opts.csv_mode)" branch. This callback is called
+ * per tuple. So this optimization will be valuable when many tuples are
+ * copied.
+ *
+ * cstate->out_functions must be initialized in CopyToTextBasedStart().
+ */
+static void
+CopyToTextBasedOneRow(CopyToState cstate, TupleTableSlot *slot, CopyAttributeOutValueFunction out)
+{
+	bool need_delim = false;
+	FmgrInfo *out_functions = cstate->out_functions;
+	ListCell *cur;
+
+	foreach(cur, cstate->attnumlist)
+	{
+		int attnum = lfirst_int(cur);
+		Datum value = slot->tts_values[attnum - 1];
+		bool isnull = slot->tts_isnull[attnum - 1];
+
+		if (need_delim)
+			CopySendChar(cstate, cstate->opts.delim[0]);
+		need_delim = true;
+
+		if (isnull)
+		{
+			CopySendString(cstate, cstate->opts.null_print_client);
+		}
+		else
+		{
+			char *string;
+
+			string = OutputFunctionCall(&out_functions[attnum - 1], value);
+			out(cstate, string, attnum);
+		}
+	}
+
+	CopyToTextBasedSendEndOfRow(cstate);
+}
+
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	CopyToTextBasedOneRow(cstate, slot, CopyAttributeOutTextValue);
+}
+
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	CopyToTextBasedOneRow(cstate, slot, CopyAttributeOutCSVValue);
+}
+
+static void
+CopyToTextBasedEnd(CopyToState cstate)
+{
+}
+
+/*
+ * CopyToRoutine implementation for "binary".
+ */
+
+static void
+CopyToBinaryInit(CopyToState cstate)
+{
+}
+
+/*
+ * All "binary" options are parsed in ProcessCopyOptions(). We may move the
+ * code to here later.
+ */
+static bool
+CopyToBinaryProcessOption(CopyToState cstate, DefElem *defel)
+{
+	return false;
+}
+
+static void
+CopyToBinaryFillCopyOutResponse(CopyToState cstate, StringInfoData *buf)
+{
+	int16 format = 1;
+	int natts = list_length(cstate->attnumlist);
+	int i;
+
+	pq_sendbyte(buf, format); /* overall format */
+	pq_sendint16(buf, natts);
+	for (i = 0; i < natts; i++)
+		pq_sendint16(buf, format); /* per-column formats */
+}
+
+/*
+ * This must initialize cstate->out_functions for CopyToBinaryOneRow().
+ */
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+	int num_phys_attrs;
+	ListCell *cur;
+
+	num_phys_attrs = tupDesc->natts;
+	/* Get info about the columns we need to process. */
+	cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+	foreach(cur, cstate->attnumlist)
+	{
+		int attnum = lfirst_int(cur);
+		Oid out_func_oid;
+		bool isvarlena;
+		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+		getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+		fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+	}
+
+	{
+		/* Generate header for a binary copy */
+		int32 tmp;
+
+		/* Signature */
+		CopySendData(cstate, BinarySignature, 11);
+		/* Flags field */
+		tmp = 0;
+		CopySendInt32(cstate, tmp);
+		/* No header extension */
+		tmp = 0;
+		CopySendInt32(cstate, tmp);
+	}
+}
+
+/*
+ * cstate->out_functions must be initialized in CopyToBinaryStart().
+ */
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	FmgrInfo *out_functions = cstate->out_functions;
+	ListCell *cur;
+
+	/* Binary per-tuple header */
+	CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+	foreach(cur, cstate->attnumlist)
+	{
+		int attnum = lfirst_int(cur);
+		Datum value = slot->tts_values[attnum - 1];
+		bool isnull = slot->tts_isnull[attnum - 1];
+
+		if (isnull)
+		{
+			CopySendInt32(cstate, -1);
+		}
+		else
+		{
+			bytea *outputbytes;
+
+			outputbytes = SendFunctionCall(&out_functions[attnum - 1], value);
+			CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+			CopySendData(cstate, VARDATA(outputbytes),
+						 VARSIZE(outputbytes) - VARHDRSZ);
+		}
+	}
+
+	CopySendEndOfRow(cstate);
+}
+
+static void
+CopyToBinaryEnd(CopyToState cstate)
+{
+	/* Generate trailer for a binary copy */
+	CopySendInt16(cstate, -1);
+	/* Need to flush out the trailer */
+	CopySendEndOfRow(cstate);
+}
+
+/*
+ * CopyToTextBased*() are shared with "csv". CopyToText*() are only for "text".
+ */
+static const CopyToRoutine CopyToRoutineText = {
+	.CopyToInit = CopyToTextBasedInit,
+	.CopyToProcessOption = CopyToTextBasedProcessOption,
+	.CopyToFillCopyOutResponse = CopyToTextBasedFillCopyOutResponse,
+	.CopyToStart = CopyToTextStart,
+	.CopyToOneRow = CopyToTextOneRow,
+	.CopyToEnd = CopyToTextBasedEnd,
+};
+
+/*
+ * CopyToTextBased*() are shared with "text". CopyToCSV*() are only for "csv".
+ */
+static const CopyToRoutine CopyToRoutineCSV = {
+	.CopyToInit = CopyToTextBasedInit,
+	.CopyToProcessOption = CopyToTextBasedProcessOption,
+	.CopyToFillCopyOutResponse = CopyToTextBasedFillCopyOutResponse,
+	.CopyToStart = CopyToCSVStart,
+	.CopyToOneRow = CopyToCSVOneRow,
+	.CopyToEnd = CopyToTextBasedEnd,
+};
+
+static const CopyToRoutine CopyToRoutineBinary = {
+	.CopyToInit = CopyToBinaryInit,
+	.CopyToProcessOption = CopyToBinaryProcessOption,
+	.CopyToFillCopyOutResponse = CopyToBinaryFillCopyOutResponse,
+	.CopyToStart = CopyToBinaryStart,
+	.CopyToOneRow = CopyToBinaryOneRow,
+	.CopyToEnd = CopyToBinaryEnd,
+};
+
+/*
+ * Process the "format" option for COPY TO.
+ *
+ * If defel is NULL, the default format "text" is used.
+ */
+void
+ProcessCopyOptionFormatTo(ParseState *pstate,
+						  CopyFormatOptions *opts_out,
+						  CopyToState cstate,
+						  DefElem *defel)
+{
+	char *format;
+
+	if (defel)
+		format = defGetString(defel);
+	else
+		format = "text";
+
+	if (strcmp(format, "text") == 0)
+		opts_out->to_routine = &CopyToRoutineText;
+	else if (strcmp(format, "csv") == 0)
+	{
+		opts_out->csv_mode = true;
+		opts_out->to_routine = &CopyToRoutineCSV;
+	}
+	else if (strcmp(format, "binary") == 0)
+	{
+		opts_out->binary = true;
+		opts_out->to_routine = &CopyToRoutineBinary;
+	}
+	else
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY format \"%s\" not recognized", format),
+				 parser_errposition(pstate, defel->location)));
+
+	opts_out->to_routine->CopyToInit(cstate);
+}

 /*
  * Send copy start/stop messages for frontend copies. These have changed
@@ -140,15 +562,9 @@ static void
 SendCopyBegin(CopyToState cstate)
 {
 	StringInfoData buf;
-	int natts = list_length(cstate->attnumlist);
-	int16 format = (cstate->opts.binary ? 1 : 0);
-	int i;

 	pq_beginmessage(&buf, PqMsg_CopyOutResponse);
-	pq_sendbyte(&buf, format); /* overall format */
-	pq_sendint16(&buf, natts);
-	for (i = 0; i < natts; i++)
-		pq_sendint16(&buf, format); /* per-column formats */
+	cstate->opts.to_routine->CopyToFillCopyOutResponse(cstate, &buf);
 	pq_endmessage(&buf);
 	cstate->copy_dest = COPY_FRONTEND;
 }
@@ -198,16 +614,6 @@ CopySendEndOfRow(CopyToState cstate)
 	switch (cstate->copy_dest)
 	{
 		case COPY_FILE:
-			if (!cstate->opts.binary)
-			{
-				/* Default line termination depends on platform */
-#ifndef WIN32
-				CopySendChar(cstate, '\n');
-#else
-				CopySendString(cstate, "\r\n");
-#endif
-			}
-
 			if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
 					   cstate->copy_file) != 1 ||
 				ferror(cstate->copy_file))
@@ -242,10 +648,6 @@ CopySendEndOfRow(CopyToState cstate)
 			}
 			break;
 		case COPY_FRONTEND:
-			/* The FE/BE protocol uses \n as newline for all platforms */
-			if (!cstate->opts.binary)
-				CopySendChar(cstate, '\n');
-
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
 			break;
@@ -431,7 +833,7 @@ BeginCopyTo(ParseState *pstate,
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);

 	/* Extract options from the statement node tree */
-	ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
+	ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , cstate, options);

 	/* Process the source/target relation or query */
 	if (rel)
@@ -748,8 +1150,6 @@ DoCopyTo(CopyToState cstate)
 	bool pipe = (cstate->filename == NULL && cstate->data_dest_cb == NULL);
 	bool fe_copy = (pipe && whereToSendOutput == DestRemote);
 	TupleDesc tupDesc;
-	int num_phys_attrs;
-	ListCell *cur;
 	uint64 processed;

 	if (fe_copy)
@@ -759,32 +1159,11 @@ DoCopyTo(CopyToState cstate)
 		tupDesc = RelationGetDescr(cstate->rel);
 	else
 		tupDesc = cstate->queryDesc->tupDesc;
-	num_phys_attrs = tupDesc->natts;
 	cstate->opts.null_print_client = cstate->opts.null_print; /* default */

 	/* We use fe_msgbuf as a per-row buffer regardless of copy_dest */
 	cstate->fe_msgbuf = makeStringInfo();

-	/* Get info about the columns we need to process. */
-	cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
-	foreach(cur, cstate->attnumlist)
-	{
-		int attnum = lfirst_int(cur);
-		Oid out_func_oid;
-		bool isvarlena;
-		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
-
-		if (cstate->opts.binary)
-			getTypeBinaryOutputInfo(attr->atttypid,
-									&out_func_oid,
-									&isvarlena);
-		else
-			getTypeOutputInfo(attr->atttypid,
-							  &out_func_oid,
-							  &isvarlena);
-		fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
-	}
-
 	/*
 	 * Create a temporary memory context that we can reset once per row to
 	 * recover palloc'd memory. This avoids any problems with leaks inside
@@ -795,57 +1174,7 @@ DoCopyTo(CopyToState cstate)
 											   "COPY TO",
 											   ALLOCSET_DEFAULT_SIZES);

-	if (cstate->opts.binary)
-	{
-		/* Generate header for a binary copy */
-		int32 tmp;
-
-		/* Signature */
-		CopySendData(cstate, BinarySignature, 11);
-		/* Flags field */
-		tmp = 0;
-		CopySendInt32(cstate, tmp);
-		/* No header extension */
-		tmp = 0;
-		CopySendInt32(cstate, tmp);
-	}
-	else
-	{
-		/*
-		 * For non-binary copy, we need to convert null_print to file
-		 * encoding, because it will be sent directly with CopySendString.
- */ - if (cstate->need_transcoding) - cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, - cstate->opts.null_print_len, - cstate->file_encoding); - - /* if a header has been requested send the line */ - if (cstate->opts.header_line) - { - bool hdr_delim = false; - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - char *colname; - - if (hdr_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - hdr_delim = true; - - colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); - - if (cstate->opts.csv_mode) - CopyAttributeOutCSV(cstate, colname, false, - list_length(cstate->attnumlist) == 1); - else - CopyAttributeOutText(cstate, colname); - } - - CopySendEndOfRow(cstate); - } - } + cstate->opts.to_routine->CopyToStart(cstate, tupDesc); if (cstate->rel) { @@ -884,13 +1213,7 @@ DoCopyTo(CopyToState cstate) processed = ((DR_copy *) cstate->queryDesc->dest)->processed; } - if (cstate->opts.binary) - { - /* Generate trailer for a binary copy */ - CopySendInt16(cstate, -1); - /* Need to flush out the trailer */ - CopySendEndOfRow(cstate); - } + cstate->opts.to_routine->CopyToEnd(cstate); MemoryContextDelete(cstate->rowcontext); @@ -906,71 +1229,15 @@ DoCopyTo(CopyToState cstate) static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) { - bool need_delim = false; - FmgrInfo *out_functions = cstate->out_functions; MemoryContext oldcontext; - ListCell *cur; - char *string; MemoryContextReset(cstate->rowcontext); oldcontext = MemoryContextSwitchTo(cstate->rowcontext); - if (cstate->opts.binary) - { - /* Binary per-tuple header */ - CopySendInt16(cstate, list_length(cstate->attnumlist)); - } - /* Make sure the tuple is fully deconstructed */ slot_getallattrs(slot); - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - Datum value = slot->tts_values[attnum - 1]; - bool isnull = slot->tts_isnull[attnum - 1]; - - if (!cstate->opts.binary) - { - if (need_delim) - CopySendChar(cstate, 
cstate->opts.delim[0]); - need_delim = true; - } - - if (isnull) - { - if (!cstate->opts.binary) - CopySendString(cstate, cstate->opts.null_print_client); - else - CopySendInt32(cstate, -1); - } - else - { - if (!cstate->opts.binary) - { - string = OutputFunctionCall(&out_functions[attnum - 1], - value); - if (cstate->opts.csv_mode) - CopyAttributeOutCSV(cstate, string, - cstate->opts.force_quote_flags[attnum - 1], - list_length(cstate->attnumlist) == 1); - else - CopyAttributeOutText(cstate, string); - } - else - { - bytea *outputbytes; - - outputbytes = SendFunctionCall(&out_functions[attnum - 1], - value); - CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); - CopySendData(cstate, VARDATA(outputbytes), - VARSIZE(outputbytes) - VARHDRSZ); - } - } - } - - CopySendEndOfRow(cstate); + cstate->opts.to_routine->CopyToOneRow(cstate, slot); MemoryContextSwitchTo(oldcontext); } diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index b3da3cb0be..de316cfd81 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -14,6 +14,7 @@ #ifndef COPY_H #define COPY_H +#include "commands/copyapi.h" #include "nodes/execnodes.h" #include "nodes/parsenodes.h" #include "parser/parse_node.h" @@ -74,11 +75,11 @@ typedef struct CopyFormatOptions bool convert_selectively; /* do selective binary conversion? 
*/ CopyOnErrorChoice on_error; /* what to do when error happened */ List *convert_select; /* list of column names (can be NIL) */ + const CopyToRoutine *to_routine; /* callback routines for COPY TO */ } CopyFormatOptions; -/* These are private in commands/copy[from|to].c */ +/* This is private in commands/copyfrom.c */ typedef struct CopyFromStateData *CopyFromState; -typedef struct CopyToStateData *CopyToState; typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread); typedef void (*copy_data_dest_cb) (void *data, int len); @@ -87,7 +88,8 @@ extern void DoCopy(ParseState *pstate, const CopyStmt *stmt, int stmt_location, int stmt_len, uint64 *processed); -extern void ProcessCopyOptions(ParseState *pstate, CopyFormatOptions *opts_out, bool is_from, List *options); +extern void ProcessCopyOptions(ParseState *pstate, CopyFormatOptions *opts_out, bool is_from, void *cstate, List *options); +extern void ProcessCopyOptionFormatTo(ParseState *pstate, CopyFormatOptions *opts_out, CopyToState cstate, DefElem *defel); extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *whereClause, const char *filename, bool is_program, copy_data_source_cb data_source_cb, List *attnamelist, List *options); diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h new file mode 100644 index 0000000000..f8901cac51 --- /dev/null +++ b/src/include/commands/copyapi.h @@ -0,0 +1,62 @@ +/*------------------------------------------------------------------------- + * + * copyapi.h + * API for COPY TO/FROM handlers + * + * + * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/commands/copyapi.h + * + *------------------------------------------------------------------------- + */ +#ifndef COPYAPI_H +#define COPYAPI_H + +#include "executor/tuptable.h" +#include "nodes/parsenodes.h" + +/* This is private in commands/copyto.c */ 
+typedef struct CopyToStateData *CopyToState;
+
+/* Routines for a COPY TO format implementation. */
+typedef struct CopyToRoutine
+{
+	/*
+	 * Called when this CopyToRoutine is chosen. This can be used for
+	 * initialization.
+	 */
+	void		(*CopyToInit) (CopyToState cstate);
+
+	/*
+	 * Called for processing one COPY TO option. This will return false when
+	 * the given option is invalid.
+	 */
+	bool		(*CopyToProcessOption) (CopyToState cstate, DefElem *defel);
+
+	/*
+	 * Called when COPY TO via the PostgreSQL protocol is started. This must
+	 * fill buf as a valid CopyOutResponse message:
+	 *
+	 */
+	/*--
+	 * +--------+--------+--------+--------+--------+   +--------+--------+
+	 * | Format | N attributes    | Attr1's format  |...| AttrN's format  |
+	 * +--------+--------+--------+--------+--------+   +--------+--------+
+	 * 0: text                     0: text               0: text
+	 * 1: binary                   1: binary             1: binary
+	 */
+	void		(*CopyToFillCopyOutResponse) (CopyToState cstate, StringInfoData *buf);
+
+	/* Called when COPY TO is started. This will send a header. */
+	void		(*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
+
+	/* Copy one row for COPY TO. */
+	void		(*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
+
+	/* Called when COPY TO is ended. This will send a trailer. */
+	void		(*CopyToEnd) (CopyToState cstate);
+} CopyToRoutine;
+
+#endif							/* COPYAPI_H */
-- 
2.43.0

From 720cda9c40d4f2f9a6c0b2cf9be5f4526da818d1 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Fri, 26 Jan 2024 17:21:53 +0900
Subject: [PATCH v9 2/2] Extract COPY FROM format implementations

This doesn't change the current behavior. This just introduces
CopyFromRoutine, which just has function pointers of format
implementation like TupleTableSlotOps, and use it for existing "text",
"csv" and "binary" format implementations.

Note that CopyFromRoutine can't be used from extensions yet because
CopyRead*() aren't exported yet. Extensions can't read data from a
source without CopyRead*().
They will be exported by subsequent patches.
---
 src/backend/commands/copy.c              |  31 +-
 src/backend/commands/copyfrom.c          | 300 +++++++++++++---
 src/backend/commands/copyfromparse.c     | 428 +++++++++++++----------
 src/include/commands/copy.h              |   6 +-
 src/include/commands/copyapi.h           |  46 +++
 src/include/commands/copyfrom_internal.h |   4 +
 6 files changed, 561 insertions(+), 254 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index dd0fe7f0bb..7aabed5614 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -484,32 +484,19 @@ ProcessCopyOptions(ParseState *pstate,
 			format_specified = true;
 
 			if (is_from)
-			{
-				char	   *fmt = defGetString(defel);
-
-				if (strcmp(fmt, "text") == 0)
-					 /* default format */ ;
-				else if (strcmp(fmt, "csv") == 0)
-				{
-					opts_out->csv_mode = true;
-				}
-				else if (strcmp(fmt, "binary") == 0)
-				{
-					opts_out->binary = true;
-				}
-				else
-					ereport(ERROR,
-							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-							 errmsg("COPY format \"%s\" not recognized", fmt),
-							 parser_errposition(pstate, defel->location)));
-			}
+				ProcessCopyOptionFormatFrom(pstate, opts_out, cstate, defel);
 			else
 				ProcessCopyOptionFormatTo(pstate, opts_out, cstate, defel);
 		}
 	}
 	if (!format_specified)
+	{
 		/* Set the default format. */
-		ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
+		if (is_from)
+			ProcessCopyOptionFormatFrom(pstate, opts_out, cstate, NULL);
+		else
+			ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
+	}
 
 	/*
 	 * Extract options except "format" from the statement node tree.
Unknown @@ -857,7 +844,9 @@ ProcessCopyOptions(ParseState *pstate, DefElem *defel = lfirst_node(DefElem, option); bool processed = false; - if (!is_from) + if (is_from) + processed = opts_out->from_routine->CopyFromProcessOption(cstate, defel); + else processed = opts_out->to_routine->CopyToProcessOption(cstate, defel); if (!processed) ereport(ERROR, diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index fb3d4d9296..338a885e2c 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -32,6 +32,7 @@ #include "catalog/namespace.h" #include "commands/copy.h" #include "commands/copyfrom_internal.h" +#include "commands/defrem.h" #include "commands/progress.h" #include "commands/trigger.h" #include "executor/execPartition.h" @@ -108,6 +109,253 @@ static char *limit_printout_length(const char *str); static void ClosePipeFromProgram(CopyFromState cstate); + +/* + * CopyFromRoutine implementations. + */ + +/* + * CopyFromRoutine implementation for "text" and "csv". CopyFromTextBased*() + * are shared by both of "text" and "csv". CopyFromText*() are only for "text" + * and CopyFromCSV*() are only for "csv". + * + * We can use the same functions for all callbacks by referring + * cstate->opts.csv_mode but splitting callbacks to eliminate "if + * (cstate->opts.csv_mode)" branches from all callbacks has performance merit + * when many tuples are copied. So we use separated callbacks for "text" and + * "csv". + */ + +static void +CopyFromTextBasedInit(CopyFromState cstate) +{ +} + +/* + * All "text" and "csv" options are parsed in ProcessCopyOptions(). We may + * move the code to here later. 
+ */ +static bool +CopyFromTextBasedProcessOption(CopyFromState cstate, DefElem *defel) +{ + return false; +} + +static void +CopyFromTextBasedFillCopyInResponse(CopyFromState cstate, StringInfoData *buf) +{ + int16 format = 0; + int natts = list_length(cstate->attnumlist); + int i; + + pq_sendbyte(buf, format); /* overall format */ + pq_sendint16(buf, natts); + for (i = 0; i < natts; i++) + pq_sendint16(buf, format); /* per-column formats */ +} + +/* + * This must initialize cstate->in_functions for CopyFromTextBasedOneRow(). + */ +static void +CopyFromTextBasedStart(CopyFromState cstate, TupleDesc tupDesc) +{ + AttrNumber num_phys_attrs = tupDesc->natts; + AttrNumber attr_count; + + /* + * If encoding conversion is needed, we need another buffer to hold the + * converted input data. Otherwise, we can just point input_buf to the + * same buffer as raw_buf. + */ + if (cstate->need_transcoding) + { + cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); + cstate->input_buf_index = cstate->input_buf_len = 0; + } + else + cstate->input_buf = cstate->raw_buf; + cstate->input_reached_eof = false; + + initStringInfo(&cstate->line_buf); + + /* + * Pick up the required catalog information for each attribute in the + * relation, including the input function, the element type (to pass to + * the input function). 
+ */ + cstate->in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo)); + cstate->typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid)); + for (int attnum = 1; attnum <= num_phys_attrs; attnum++) + { + Form_pg_attribute att = TupleDescAttr(tupDesc, attnum - 1); + Oid in_func_oid; + + /* We don't need info for dropped attributes */ + if (att->attisdropped) + continue; + + /* Fetch the input function and typioparam info */ + getTypeInputInfo(att->atttypid, + &in_func_oid, &cstate->typioparams[attnum - 1]); + fmgr_info(in_func_oid, &cstate->in_functions[attnum - 1]); + } + + /* create workspace for CopyReadAttributes results */ + attr_count = list_length(cstate->attnumlist); + cstate->max_fields = attr_count; + cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *)); +} + +static void +CopyFromTextBasedEnd(CopyFromState cstate) +{ +} + +/* + * CopyFromRoutine implementation for "binary". + */ + +static void +CopyFromBinaryInit(CopyFromState cstate) +{ +} + +/* + * All "binary" options are parsed in ProcessCopyOptions(). We may move the + * code to here later. + */ +static bool +CopyFromBinaryProcessOption(CopyFromState cstate, DefElem *defel) +{ + return false; +} + +static void +CopyFromBinaryFillCopyInResponse(CopyFromState cstate, StringInfoData *buf) +{ + int16 format = 1; + int natts = list_length(cstate->attnumlist); + int i; + + pq_sendbyte(buf, format); /* overall format */ + pq_sendint16(buf, natts); + for (i = 0; i < natts; i++) + pq_sendint16(buf, format); /* per-column formats */ +} + +/* + * This must initialize cstate->in_functions for CopyFromBinaryOneRow(). + */ +static void +CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc) +{ + AttrNumber num_phys_attrs = tupDesc->natts; + + /* + * Pick up the required catalog information for each attribute in the + * relation, including the input function, the element type (to pass to + * the input function). 
+ */ + cstate->in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo)); + cstate->typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid)); + for (int attnum = 1; attnum <= num_phys_attrs; attnum++) + { + Form_pg_attribute att = TupleDescAttr(tupDesc, attnum - 1); + Oid in_func_oid; + + /* We don't need info for dropped attributes */ + if (att->attisdropped) + continue; + + /* Fetch the input function and typioparam info */ + getTypeBinaryInputInfo(att->atttypid, + &in_func_oid, &cstate->typioparams[attnum - 1]); + fmgr_info(in_func_oid, &cstate->in_functions[attnum - 1]); + } + + /* Read and verify binary header */ + ReceiveCopyBinaryHeader(cstate); +} + +static void +CopyFromBinaryEnd(CopyFromState cstate) +{ +} + +/* + * CopyFromTextBased*() are shared with "csv". CopyFromText*() are only for "text". + */ +static const CopyFromRoutine CopyFromRoutineText = { + .CopyFromInit = CopyFromTextBasedInit, + .CopyFromProcessOption = CopyFromTextBasedProcessOption, + .CopyFromFillCopyInResponse = CopyFromTextBasedFillCopyInResponse, + .CopyFromStart = CopyFromTextBasedStart, + .CopyFromOneRow = CopyFromTextOneRow, + .CopyFromEnd = CopyFromTextBasedEnd, +}; + +/* + * CopyFromTextBased*() are shared with "text". CopyFromCSV*() are only for "csv". 
+ */ +static const CopyFromRoutine CopyFromRoutineCSV = { + .CopyFromInit = CopyFromTextBasedInit, + .CopyFromProcessOption = CopyFromTextBasedProcessOption, + .CopyFromFillCopyInResponse = CopyFromTextBasedFillCopyInResponse, + .CopyFromStart = CopyFromTextBasedStart, + .CopyFromOneRow = CopyFromCSVOneRow, + .CopyFromEnd = CopyFromTextBasedEnd, +}; + +static const CopyFromRoutine CopyFromRoutineBinary = { + .CopyFromInit = CopyFromBinaryInit, + .CopyFromProcessOption = CopyFromBinaryProcessOption, + .CopyFromFillCopyInResponse = CopyFromBinaryFillCopyInResponse, + .CopyFromStart = CopyFromBinaryStart, + .CopyFromOneRow = CopyFromBinaryOneRow, + .CopyFromEnd = CopyFromBinaryEnd, +}; + +/* + * Process the "format" option for COPY FROM. + * + * If defel is NULL, the default format "text" is used. + */ +void +ProcessCopyOptionFormatFrom(ParseState *pstate, + CopyFormatOptions *opts_out, + CopyFromState cstate, + DefElem *defel) +{ + char *format; + + if (defel) + format = defGetString(defel); + else + format = "text"; + + if (strcmp(format, "text") == 0) + opts_out->from_routine = &CopyFromRoutineText; + else if (strcmp(format, "csv") == 0) + { + opts_out->csv_mode = true; + opts_out->from_routine = &CopyFromRoutineCSV; + } + else if (strcmp(format, "binary") == 0) + { + opts_out->binary = true; + opts_out->from_routine = &CopyFromRoutineBinary; + } + else + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY format \"%s\" not recognized", format), + parser_errposition(pstate, defel->location))); + + opts_out->from_routine->CopyFromInit(cstate); +} + + /* * error context callback for COPY FROM * @@ -1384,9 +1632,6 @@ BeginCopyFrom(ParseState *pstate, TupleDesc tupDesc; AttrNumber num_phys_attrs, num_defaults; - FmgrInfo *in_functions; - Oid *typioparams; - Oid in_func_oid; int *defmap; ExprState **defexprs; MemoryContext oldcontext; @@ -1571,25 +1816,6 @@ BeginCopyFrom(ParseState *pstate, cstate->raw_buf_index = cstate->raw_buf_len = 0; 
cstate->raw_reached_eof = false; - if (!cstate->opts.binary) - { - /* - * If encoding conversion is needed, we need another buffer to hold - * the converted input data. Otherwise, we can just point input_buf - * to the same buffer as raw_buf. - */ - if (cstate->need_transcoding) - { - cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); - cstate->input_buf_index = cstate->input_buf_len = 0; - } - else - cstate->input_buf = cstate->raw_buf; - cstate->input_reached_eof = false; - - initStringInfo(&cstate->line_buf); - } - initStringInfo(&cstate->attribute_buf); /* Assign range table and rteperminfos, we'll need them in CopyFrom. */ @@ -1608,8 +1834,6 @@ BeginCopyFrom(ParseState *pstate, * the input function), and info about defaults and constraints. (Which * input function we use depends on text/binary format choice.) */ - in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo)); - typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid)); defmap = (int *) palloc(num_phys_attrs * sizeof(int)); defexprs = (ExprState **) palloc(num_phys_attrs * sizeof(ExprState *)); @@ -1621,15 +1845,6 @@ BeginCopyFrom(ParseState *pstate, if (att->attisdropped) continue; - /* Fetch the input function and typioparam info */ - if (cstate->opts.binary) - getTypeBinaryInputInfo(att->atttypid, - &in_func_oid, &typioparams[attnum - 1]); - else - getTypeInputInfo(att->atttypid, - &in_func_oid, &typioparams[attnum - 1]); - fmgr_info(in_func_oid, &in_functions[attnum - 1]); - /* Get default info if available */ defexprs[attnum - 1] = NULL; @@ -1689,8 +1904,6 @@ BeginCopyFrom(ParseState *pstate, cstate->bytes_processed = 0; /* We keep those variables in cstate. 
*/ - cstate->in_functions = in_functions; - cstate->typioparams = typioparams; cstate->defmap = defmap; cstate->defexprs = defexprs; cstate->volatile_defexprs = volatile_defexprs; @@ -1763,20 +1976,7 @@ BeginCopyFrom(ParseState *pstate, pgstat_progress_update_multi_param(3, progress_cols, progress_vals); - if (cstate->opts.binary) - { - /* Read and verify binary header */ - ReceiveCopyBinaryHeader(cstate); - } - - /* create workspace for CopyReadAttributes results */ - if (!cstate->opts.binary) - { - AttrNumber attr_count = list_length(cstate->attnumlist); - - cstate->max_fields = attr_count; - cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *)); - } + cstate->opts.from_routine->CopyFromStart(cstate, tupDesc); MemoryContextSwitchTo(oldcontext); @@ -1789,6 +1989,8 @@ BeginCopyFrom(ParseState *pstate, void EndCopyFrom(CopyFromState cstate) { + cstate->opts.from_routine->CopyFromEnd(cstate); + /* No COPY FROM related resources except memory. */ if (cstate->is_program) { diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 7cacd0b752..f6b130458b 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -171,15 +171,9 @@ void ReceiveCopyBegin(CopyFromState cstate) { StringInfoData buf; - int natts = list_length(cstate->attnumlist); - int16 format = (cstate->opts.binary ? 
1 : 0);
-	int			i;
 
 	pq_beginmessage(&buf, PqMsg_CopyInResponse);
-	pq_sendbyte(&buf, format);	/* overall format */
-	pq_sendint16(&buf, natts);
-	for (i = 0; i < natts; i++)
-		pq_sendint16(&buf, format); /* per-column formats */
+	cstate->opts.from_routine->CopyFromFillCopyInResponse(cstate, &buf);
 	pq_endmessage(&buf);
 	cstate->copy_src = COPY_FRONTEND;
 	cstate->fe_msgbuf = makeStringInfo();
@@ -740,8 +734,19 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
 	return copied_bytes;
 }
 
+typedef int (*CopyReadAttributes) (CopyFromState cstate);
+
 /*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text or csv
+ * mode. CopyReadAttributesText() must be used for text mode and
+ * CopyReadAttributesCSV() for csv mode. This inconvenience is for
+ * optimization. If "if (cstate->opts.csv_mode)" branch is removed, there is
+ * performance merit for COPY FROM with many tuples.
+ *
+ * NextCopyFromRawFields() can be used instead for convenience
+ * use. NextCopyFromRawFields() chooses CopyReadAttributesText() or
+ * CopyReadAttributesCSV() internally.
+ *
  * Return false if no more lines.
  *
  * An internal temporary buffer is returned via 'fields'. It is valid until
@@ -751,8 +756,8 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  *
  * NOTE: force_not_null option are not applied to the returned fields.
*/ -bool -NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) +static inline bool +NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields, CopyReadAttributes copy_read_attributes) { int fldct; bool done; @@ -775,11 +780,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) { int fldnum; - if (cstate->opts.csv_mode) - fldct = CopyReadAttributesCSV(cstate); - else - fldct = CopyReadAttributesText(cstate); - + fldct = copy_read_attributes(cstate); if (fldct != list_length(cstate->attnumlist)) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), @@ -830,16 +831,240 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) return false; /* Parse the line into de-escaped field values */ - if (cstate->opts.csv_mode) - fldct = CopyReadAttributesCSV(cstate); - else - fldct = CopyReadAttributesText(cstate); + fldct = copy_read_attributes(cstate); *fields = cstate->raw_fields; *nfields = fldct; return true; } +/* + * See NextCopyFromRawFieldsInternal() for details. + */ +bool +NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) +{ + if (cstate->opts.csv_mode) + return NextCopyFromRawFieldsInternal(cstate, fields, nfields, CopyReadAttributesCSV); + else + return NextCopyFromRawFieldsInternal(cstate, fields, nfields, CopyReadAttributesText); +} + +typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, int m); + +static inline char * +PostpareColumnValueText(CopyFromState cstate, char *string, int m) +{ + /* do nothing */ + return string; +} + +static inline char * +PostpareColumnValueCSV(CopyFromState cstate, char *string, int m) +{ + if (string == NULL && + cstate->opts.force_notnull_flags[m]) + { + /* + * FORCE_NOT_NULL option is set and column is NULL - convert it to the + * NULL string. 
+ */ + string = cstate->opts.null_print; + } + else if (string != NULL && cstate->opts.force_null_flags[m] + && strcmp(string, cstate->opts.null_print) == 0) + { + /* + * FORCE_NULL option is set and column matches the NULL string. It + * must have been quoted, or otherwise the string would already have + * been set to NULL. Convert it to NULL as specified. + */ + string = NULL; + } + return string; +} + +/* + * We don't use this function as a callback directly. We define + * CopyFromTextOneRow() and CopyFromCSVOneRow() and use them instead. It's for + * eliminating a "if (cstate->opts.csv_mode)" branch. This callback is called + * per tuple. So this optimization will be valuable when many tuples are + * copied. + * + * cstate->in_functions must be initialized in CopyFromTextBasedStart(). + */ +static inline bool +CopyFromTextBasedOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls, CopyReadAttributes copy_read_attributes,PostpareColumnValue postpare_column_value) +{ + TupleDesc tupDesc; + AttrNumber attr_count; + FmgrInfo *in_functions = cstate->in_functions; + Oid *typioparams = cstate->typioparams; + ExprState **defexprs = cstate->defexprs; + char **field_strings; + ListCell *cur; + int fldct; + int fieldno; + char *string; + + tupDesc = RelationGetDescr(cstate->rel); + attr_count = list_length(cstate->attnumlist); + + /* read raw fields in the next line */ + if (!NextCopyFromRawFieldsInternal(cstate, &field_strings, &fldct, copy_read_attributes)) + return false; + + /* check for overflowing fields */ + if (attr_count > 0 && fldct > attr_count) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("extra data after last expected column"))); + + fieldno = 0; + + /* Loop to read the user attributes on the line. 
*/ + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + if (fieldno >= fldct) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("missing data for column \"%s\"", + NameStr(att->attname)))); + string = field_strings[fieldno++]; + + if (cstate->convert_select_flags && + !cstate->convert_select_flags[m]) + { + /* ignore input field, leaving column as NULL */ + continue; + } + + cstate->cur_attname = NameStr(att->attname); + cstate->cur_attval = string; + + string = postpare_column_value(cstate, string, m); + + if (string != NULL) + nulls[m] = false; + + if (cstate->defaults[m]) + { + /* + * The caller must supply econtext and have switched into the + * per-tuple memory context in it. + */ + Assert(econtext != NULL); + Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); + + values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); + } + + /* + * If ON_ERROR is specified with IGNORE, skip rows with soft errors + */ + else if (!InputFunctionCallSafe(&in_functions[m], + string, + typioparams[m], + att->atttypmod, + (Node *) cstate->escontext, + &values[m])) + { + cstate->num_errors++; + return true; + } + + cstate->cur_attname = NULL; + cstate->cur_attval = NULL; + } + + Assert(fieldno == attr_count); + + return true; +} + +bool +CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +{ + return CopyFromTextBasedOneRow(cstate, econtext, values, nulls, CopyReadAttributesText, PostpareColumnValueText); +} + +bool +CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +{ + return CopyFromTextBasedOneRow(cstate, econtext, values, nulls, CopyReadAttributesCSV, PostpareColumnValueCSV); +} + +/* + * cstate->in_functions must be initialized in CopyFromBinaryStart(). 
+ */ +bool +CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +{ + TupleDesc tupDesc; + AttrNumber attr_count; + FmgrInfo *in_functions = cstate->in_functions; + Oid *typioparams = cstate->typioparams; + int16 fld_count; + ListCell *cur; + + tupDesc = RelationGetDescr(cstate->rel); + attr_count = list_length(cstate->attnumlist); + + cstate->cur_lineno++; + + if (!CopyGetInt16(cstate, &fld_count)) + { + /* EOF detected (end of file, or protocol-level EOF) */ + return false; + } + + if (fld_count == -1) + { + /* + * Received EOF marker. Wait for the protocol-level EOF, and complain + * if it doesn't come immediately. In COPY FROM STDIN, this ensures + * that we correctly handle CopyFail, if client chooses to send that + * now. When copying from file, we could ignore the rest of the file + * like in text mode, but we choose to be consistent with the COPY + * FROM STDIN case. + */ + char dummy; + + if (CopyReadBinaryData(cstate, &dummy, 1) > 0) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("received copy data after EOF marker"))); + return false; + } + + if (fld_count != attr_count) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("row field count is %d, expected %d", + (int) fld_count, attr_count))); + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + cstate->cur_attname = NameStr(att->attname); + values[m] = CopyReadBinaryAttribute(cstate, + &in_functions[m], + typioparams[m], + att->atttypmod, + &nulls[m]); + cstate->cur_attname = NULL; + } + + return true; +} + /* * Read next tuple from file for COPY FROM. Return false if no more tuples. 
* @@ -857,181 +1082,22 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, { TupleDesc tupDesc; AttrNumber num_phys_attrs, - attr_count, num_defaults = cstate->num_defaults; - FmgrInfo *in_functions = cstate->in_functions; - Oid *typioparams = cstate->typioparams; int i; int *defmap = cstate->defmap; ExprState **defexprs = cstate->defexprs; tupDesc = RelationGetDescr(cstate->rel); num_phys_attrs = tupDesc->natts; - attr_count = list_length(cstate->attnumlist); /* Initialize all values for row to NULL */ MemSet(values, 0, num_phys_attrs * sizeof(Datum)); MemSet(nulls, true, num_phys_attrs * sizeof(bool)); MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool)); - if (!cstate->opts.binary) - { - char **field_strings; - ListCell *cur; - int fldct; - int fieldno; - char *string; - - /* read raw fields in the next line */ - if (!NextCopyFromRawFields(cstate, &field_strings, &fldct)) - return false; - - /* check for overflowing fields */ - if (attr_count > 0 && fldct > attr_count) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("extra data after last expected column"))); - - fieldno = 0; - - /* Loop to read the user attributes on the line. */ - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - int m = attnum - 1; - Form_pg_attribute att = TupleDescAttr(tupDesc, m); - - if (fieldno >= fldct) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("missing data for column \"%s\"", - NameStr(att->attname)))); - string = field_strings[fieldno++]; - - if (cstate->convert_select_flags && - !cstate->convert_select_flags[m]) - { - /* ignore input field, leaving column as NULL */ - continue; - } - - if (cstate->opts.csv_mode) - { - if (string == NULL && - cstate->opts.force_notnull_flags[m]) - { - /* - * FORCE_NOT_NULL option is set and column is NULL - - * convert it to the NULL string. 
- */ - string = cstate->opts.null_print; - } - else if (string != NULL && cstate->opts.force_null_flags[m] - && strcmp(string, cstate->opts.null_print) == 0) - { - /* - * FORCE_NULL option is set and column matches the NULL - * string. It must have been quoted, or otherwise the - * string would already have been set to NULL. Convert it - * to NULL as specified. - */ - string = NULL; - } - } - - cstate->cur_attname = NameStr(att->attname); - cstate->cur_attval = string; - - if (string != NULL) - nulls[m] = false; - - if (cstate->defaults[m]) - { - /* - * The caller must supply econtext and have switched into the - * per-tuple memory context in it. - */ - Assert(econtext != NULL); - Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); - - values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); - } - - /* - * If ON_ERROR is specified with IGNORE, skip rows with soft - * errors - */ - else if (!InputFunctionCallSafe(&in_functions[m], - string, - typioparams[m], - att->atttypmod, - (Node *) cstate->escontext, - &values[m])) - { - cstate->num_errors++; - return true; - } - - cstate->cur_attname = NULL; - cstate->cur_attval = NULL; - } - - Assert(fieldno == attr_count); - } - else - { - /* binary */ - int16 fld_count; - ListCell *cur; - - cstate->cur_lineno++; - - if (!CopyGetInt16(cstate, &fld_count)) - { - /* EOF detected (end of file, or protocol-level EOF) */ - return false; - } - - if (fld_count == -1) - { - /* - * Received EOF marker. Wait for the protocol-level EOF, and - * complain if it doesn't come immediately. In COPY FROM STDIN, - * this ensures that we correctly handle CopyFail, if client - * chooses to send that now. When copying from file, we could - * ignore the rest of the file like in text mode, but we choose to - * be consistent with the COPY FROM STDIN case. 
- */ - char dummy; - - if (CopyReadBinaryData(cstate, &dummy, 1) > 0) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("received copy data after EOF marker"))); - return false; - } - - if (fld_count != attr_count) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("row field count is %d, expected %d", - (int) fld_count, attr_count))); - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - int m = attnum - 1; - Form_pg_attribute att = TupleDescAttr(tupDesc, m); - - cstate->cur_attname = NameStr(att->attname); - values[m] = CopyReadBinaryAttribute(cstate, - &in_functions[m], - typioparams[m], - att->atttypmod, - &nulls[m]); - cstate->cur_attname = NULL; - } - } + if (!cstate->opts.from_routine->CopyFromOneRow(cstate, econtext, values, + nulls)) + return false; /* * Now compute and insert any defaults available for the columns not diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index de316cfd81..cab05a0aa0 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -75,12 +75,11 @@ typedef struct CopyFormatOptions bool convert_selectively; /* do selective binary conversion? 
*/ CopyOnErrorChoice on_error; /* what to do when error happened */ List *convert_select; /* list of column names (can be NIL) */ + const CopyFromRoutine *from_routine; /* callback routines for COPY + * FROM */ const CopyToRoutine *to_routine; /* callback routines for COPY TO */ } CopyFormatOptions; -/* This is private in commands/copyfrom.c */ -typedef struct CopyFromStateData *CopyFromState; - typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread); typedef void (*copy_data_dest_cb) (void *data, int len); @@ -89,6 +88,7 @@ extern void DoCopy(ParseState *pstate, const CopyStmt *stmt, uint64 *processed); extern void ProcessCopyOptions(ParseState *pstate, CopyFormatOptions *opts_out, bool is_from, void *cstate, List *options); +extern void ProcessCopyOptionFormatFrom(ParseState *pstate, CopyFormatOptions *opts_out, CopyFromState cstate, DefElem *defel); extern void ProcessCopyOptionFormatTo(ParseState *pstate, CopyFormatOptions *opts_out, CopyToState cstate, DefElem *defel); extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *whereClause, const char *filename, diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index f8901cac51..9f5a4958aa 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -15,8 +15,54 @@ #define COPYAPI_H #include "executor/tuptable.h" +#include "nodes/execnodes.h" #include "nodes/parsenodes.h" +/* This is private in commands/copyfrom.c */ +typedef struct CopyFromStateData *CopyFromState; + +/* Routines for a COPY FROM format implementation. */ +typedef struct CopyFromRoutine +{ + /* + * Called when this CopyFromRoutine is chosen. This can be used for + * initialization. + */ + void (*CopyFromInit) (CopyFromState cstate); + + /* + * Called for processing one COPY FROM option. This will return false when + * the given option is invalid. 
+ */ + bool (*CopyFromProcessOption) (CopyFromState cstate, DefElem *defel); + + /* + * Called when COPY FROM via the PostgreSQL protocol is started. This must + * fill buf as a valid CopyInResponse message: + * + */ + /*-- + * +--------+--------+--------+--------+--------+ +--------+--------+ + * | Format | N attributes | Attr1's format |...| AttrN's format | + * +--------+--------+--------+--------+--------+ +--------+--------+ + * 0: text 0: text 0: text + * 1: binary 1: binary 1: binary + */ + void (*CopyFromFillCopyInResponse) (CopyFromState cstate, StringInfoData *buf); + + /* + * Called when COPY FROM is started. This will initialize something and + * receive a header. + */ + void (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc); + + /* Copy one row. It returns false if no more tuples. */ + bool (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); + + /* Called when COPY FROM is ended. This will finalize something. */ + void (*CopyFromEnd) (CopyFromState cstate); +} CopyFromRoutine; + /* This is private in commands/copyto.c */ typedef struct CopyToStateData *CopyToState; diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index cad52fcc78..096b55011e 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -183,4 +183,8 @@ typedef struct CopyFromStateData extern void ReceiveCopyBegin(CopyFromState cstate); extern void ReceiveCopyBinaryHeader(CopyFromState cstate); +extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); +extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); +extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); + #endif /* COPYFROM_INTERNAL_H */ -- 2.43.0
On Mon, Jan 29, 2024 at 6:45 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAEG8a3Jnmbjw82OiSvRK3v9XN2zSshsB8ew1mZCQDAkKq6r9YQ@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 29 Jan 2024 11:37:07 +0800,
>   Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> >> > > Does it make sense to pass only non-builtin options to the custom
> >> > > format callback after parsing and evaluating the builtin options? That
> >> > > is, we parse and evaluate only the builtin options and populate
> >> > > opts_out first, then pass each rest option to the custom format
> >> > > handler callback. The callback can refer to the builtin option values.
> >>
> >> What I imagined is that while parsing all the specified options, we
> >> evaluate builtin options and we add non-builtin options to another
> >> list. Then when parsing a non-builtin option, we check if this option
> >> already exists in the list. If it does, we raise the "option %s not
> >> recognized" error. Once we complete checking all options, we pass
> >> each option in the list to the callback.
>
> I implemented this idea and the following ideas:
>
> 1. Add init callback for initialization
> 2. Change GetFormat() to FillCopyXXXResponse()
>    because the JSON format always uses 1 column
> 3. FROM only: Eliminate more cstate->opts.csv_mode branches
>    (This is for performance.)
>
> See the attached v9 patch set for details. Changes since v7:
>
> 0001:
>
> * Move CopyToProcessOption() calls to the end of
>   ProcessCopyOptions() for easier option validation
> * Add CopyToState::CopyToInit() and call it in
>   ProcessCopyOptionFormatTo()
> * Change CopyToState::CopyToGetFormat() to
>   CopyToState::CopyToFillCopyOutResponse() and use it in
>   SendCopyBegin()
Thank you for updating the patch! Here are comments on 0001 patch:
---
+        if (!format_specified)
+                /* Set the default format. */
+                ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
+
I think we can pass "text" in this case instead of NULL. That way,
ProcessCopyOptionFormatTo doesn't need to handle NULL case.
We need curly brackets for this "if branch" as follows:
if (!format_specified)
{
    /* Set the default format. */
    ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
}
---
+        /* Process not built-in options. */
+        foreach(option, unknown_options)
+        {
+                DefElem    *defel = lfirst_node(DefElem, option);
+                bool           processed = false;
+
+                if (!is_from)
+                        processed = opts_out->to_routine->CopyToProcessOption(cstate, defel);
+                if (!processed)
+                        ereport(ERROR,
+                                        (errcode(ERRCODE_SYNTAX_ERROR),
+                                         errmsg("option \"%s\" not recognized",
+                                                        defel->defname),
+                                         parser_errposition(pstate, defel->location)));
+        }
+        list_free(unknown_options);
I think we can check the duplicated options in the core as we discussed.
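A minimal sketch of what the core-side duplicate check could look like. It uses mock types rather than PostgreSQL's DefElem and List, and every name in it (MockOption, process_unknown_options(), the OptionResult values) is hypothetical. The real code would report errors with ereport() instead of returning a status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for DefElem: only the option name matters here. */
typedef struct MockOption
{
	const char *name;
} MockOption;

/* Hypothetical stand-in for a format's process-option callback. */
typedef bool (*process_option_cb) (const MockOption *opt);

typedef enum OptionResult
{
	OPTIONS_OK,
	OPTION_DUPLICATED,
	OPTION_NOT_RECOGNIZED
} OptionResult;

/*
 * Core-side handling of options that are not built in: reject duplicates
 * first, then hand each remaining option to the format callback.
 */
static OptionResult
process_unknown_options(const MockOption *opts, size_t nopts,
						process_option_cb cb)
{
	assert(opts != NULL || nopts == 0);

	/* The duplicate check lives in the core, so callbacks never see repeats. */
	for (size_t i = 0; i < nopts; i++)
		for (size_t j = 0; j < i; j++)
			if (strcmp(opts[i].name, opts[j].name) == 0)
				return OPTION_DUPLICATED;

	/* A missing callback means no custom option can be recognized. */
	for (size_t i = 0; i < nopts; i++)
		if (cb == NULL || !cb(&opts[i]))
			return OPTION_NOT_RECOGNIZED;

	return OPTIONS_OK;
}

/* Example callback for a made-up format that only knows "indent". */
static bool
example_process_option(const MockOption *opt)
{
	return strcmp(opt->name, "indent") == 0;
}
```

With this shape a format callback only has to answer "do I know this option?", and the duplicate handling stays identical for every format.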
---
+static void
+CopyToTextBasedInit(CopyToState cstate)
+{
+}
and
+static void
+CopyToBinaryInit(CopyToState cstate)
+{
+}
Do we really need separate callbacks for the same behavior? I think we
can have a common init function, say CopyToBuiltinInit(), that does
nothing. Or we can make the init callback optional.
The same is true for process-option callback.
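The "optional callback" variant could be sketched as below. The types and helper names (MockCopyToRoutine, call_init(), call_process_option()) are hypothetical and only mirror the shape of the proposed CopyToRoutine. The point is that built-in formats leave the members NULL and the caller checks before calling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CopyToState. */
typedef struct MockCopyToState
{
	int			init_calls;
} MockCopyToState;

/* Hypothetical routine table; both members may be NULL. */
typedef struct MockCopyToRoutine
{
	void		(*init) (MockCopyToState *cstate);
	bool		(*process_option) (MockCopyToState *cstate, const char *name);
} MockCopyToRoutine;

/* Call the init callback only when the format provides one. */
static void
call_init(const MockCopyToRoutine *routine, MockCopyToState *cstate)
{
	assert(routine != NULL);
	if (routine->init != NULL)
		routine->init(cstate);
}

/* Without a callback, no non-built-in option can be recognized. */
static bool
call_process_option(const MockCopyToRoutine *routine,
					MockCopyToState *cstate, const char *name)
{
	assert(routine != NULL);
	if (routine->process_option == NULL)
		return false;
	return routine->process_option(cstate, name);
}

/* Built-in formats need neither callback, so one empty table serves them. */
static const MockCopyToRoutine mock_builtin_routine = {NULL, NULL};

/* A made-up custom format that does need initialization. */
static void
custom_init(MockCopyToState *cstate)
{
	cstate->init_calls++;
}

static const MockCopyToRoutine mock_custom_routine = {custom_init, NULL};
```

Compared with a shared do-nothing function, NULL members cost one pointer test per call site but remove the empty function bodies entirely.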
---
         List      *convert_select; /* list of column names (can be NIL) */
+        const CopyToRoutine *to_routine;      /* callback routines for COPY TO */
 } CopyFormatOptions;
I think CopyToStateData is a better place to have CopyToRoutine.
copy_data_dest_cb is also there.
---
-                        if (strcmp(fmt, "text") == 0)
-                                 /* default format */ ;
-                        else if (strcmp(fmt, "csv") == 0)
-                                opts_out->csv_mode = true;
-                        else if (strcmp(fmt, "binary") == 0)
-                                opts_out->binary = true;
+
+                        if (is_from)
+                        {
+                                char      *fmt = defGetString(defel);
+
+                                if (strcmp(fmt, "text") == 0)
+                                         /* default format */ ;
+                                else if (strcmp(fmt, "csv") == 0)
+                                {
+                                        opts_out->csv_mode = true;
+                                }
+                                else if (strcmp(fmt, "binary") == 0)
+                                {
+                                        opts_out->binary = true;
+                                }
                         else
-                                ereport(ERROR,
-                                                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                                                 errmsg("COPY format \"%s\" not recognized", fmt),
-                                                 parser_errposition(pstate, defel->location)));
+                                ProcessCopyOptionFormatTo(pstate, opts_out, cstate, defel);
The 0002 patch replaces the options checks with
ProcessCopyOptionFormatFrom(). However, both
ProcessCopyOptionFormatTo() and ProcessCopyOptionFormatFrom() would
set format-related options such as opts_out->csv_mode, which does not
seem elegant. IIUC the reason why we process only the "format" option
first is to set the callback functions and call the init callback. So
I think we don't necessarily need to set the callbacks and the
format-related options together. Perhaps we can do only the callback
setup first and then set the format-related options in the place where
we originally did?
---
+static void
+CopyToTextBasedFillCopyOutResponse(CopyToState cstate, StringInfoData *buf)
+{
+        int16          format = 0;
+        int                    natts = list_length(cstate->attnumlist);
+        int                    i;
+
+        pq_sendbyte(buf, format);      /* overall format */
+        pq_sendint16(buf, natts);
+        for (i = 0; i < natts; i++)
+                pq_sendint16(buf, format);     /* per-column formats */
+}
This function and CopyToBinaryFillCopyOutResponse() fill three things:
overall format, the number of columns, and per-column formats. While
this approach is flexible, extensions will have to understand the
format of CopyOutResponse message. An alternative is to have one or
more callbacks that return these three things.
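To illustrate the alternative, here is a sketch in which the format only answers three questions and the core serializes the answers. The callback names (overall_format, num_columns, column_format) and build_copy_out_response_body() are hypothetical. The layout written is the CopyOutResponse body described in the protocol documentation (Int8 overall format, Int16 column count, one Int16 format code per column, network byte order), without the message type byte and length word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CopyToState. */
typedef struct MockCopyToState
{
	int16_t		natts;
} MockCopyToState;

/* Hypothetical callbacks: each returns one piece of CopyOutResponse. */
typedef struct MockCopyToRoutine
{
	int8_t		(*overall_format) (const MockCopyToState *cstate);
	int16_t		(*num_columns) (const MockCopyToState *cstate);
	int16_t		(*column_format) (const MockCopyToState *cstate, int16_t col);
} MockCopyToRoutine;

static int8_t text_overall(const MockCopyToState *cstate) { (void) cstate; return 0; }
static int8_t binary_overall(const MockCopyToState *cstate) { (void) cstate; return 1; }
static int16_t all_columns(const MockCopyToState *cstate) { return cstate->natts; }
static int16_t text_column(const MockCopyToState *cstate, int16_t col) { (void) cstate; (void) col; return 0; }
static int16_t binary_column(const MockCopyToState *cstate, int16_t col) { (void) cstate; (void) col; return 1; }

static const MockCopyToRoutine mock_text_routine = {text_overall, all_columns, text_column};
static const MockCopyToRoutine mock_binary_routine = {binary_overall, all_columns, binary_column};

/* Append a 16-bit value in network byte order; returns the new position. */
static size_t
put_int16(uint8_t *buf, size_t pos, int16_t value)
{
	uint16_t	u = (uint16_t) value;

	buf[pos] = (uint8_t) (u >> 8);
	buf[pos + 1] = (uint8_t) (u & 0xff);
	return pos + 2;
}

/*
 * Core-side serialization: the format never touches the wire layout.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t
build_copy_out_response_body(const MockCopyToRoutine *routine,
							 const MockCopyToState *cstate,
							 uint8_t *buf, size_t buflen)
{
	int16_t		natts = routine->num_columns(cstate);
	size_t		pos = 0;

	assert(natts >= 0);
	if (buflen < 3 + 2 * (size_t) natts)
		return 0;

	buf[pos++] = (uint8_t) routine->overall_format(cstate);
	pos = put_int16(buf, pos, natts);
	for (int16_t col = 0; col < natts; col++)
		pos = put_int16(buf, pos, routine->column_format(cstate, col));
	return pos;
}
```

The trade-off is the one described above: three small callbacks are less flexible than handing the extension a buffer, but a change to the message layout would then touch only the core.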
---
+        /* Get info about the columns we need to process. */
+        cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+        foreach(cur, cstate->attnumlist)
+        {
+                int                    attnum = lfirst_int(cur);
+                Oid                    out_func_oid;
+                bool           isvarlena;
+                Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+                getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+                fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+        }
Is preparing the out functions an extension's responsibility? I
thought the core could prepare them based on the overall format
specified by extensions, as long as the overall format matches the
actual data format to send. What do you think?
---
+        /*
+         * Called when COPY TO via the PostgreSQL protocol is started. This must
+         * fill buf as a valid CopyOutResponse message:
+         *
+         */
+        /*--
+         * +--------+--------+--------+--------+--------+   +--------+--------+
+         * | Format | N attributes    | Attr1's format  |...| AttrN's format  |
+         * +--------+--------+--------+--------+--------+   +--------+--------+
+         * 0: text                      0: text               0: text
+         * 1: binary                    1: binary             1: binary
+         */
I think this kind of diagram could easily be left stale when we
update the CopyOutResponse format. It's better to refer to the
documentation instead.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		Hi,
In <CAD21AoBmNiWwrspuedgAPgbAqsn7e7NoZYF6gNnYBf+gXEk9Mg@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 30 Jan 2024 11:11:59 +0900,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> ---
> +        if (!format_specified)
> +                /* Set the default format. */
> +                ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
> +
> 
> I think we can pass "text" in this case instead of NULL. That way,
> ProcessCopyOptionFormatTo doesn't need to handle NULL case.
Yes, we can do it. But it needs a DefElem allocation. Is it
acceptable?
> We need curly brackets for this "if branch" as follows:
> 
> if (!format_specified)
> {
>     /* Set the default format. */
>     ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
> }
Oh, sorry. I assumed that pgindent adjusts the style too.
> ---
> +        /* Process not built-in options. */
> +        foreach(option, unknown_options)
> +        {
> +                DefElem    *defel = lfirst_node(DefElem, option);
> +                bool           processed = false;
> +
> +                if (!is_from)
> +                        processed =
> opts_out->to_routine->CopyToProcessOption(cstate, defel);
> +                if (!processed)
> +                        ereport(ERROR,
> +                                        (errcode(ERRCODE_SYNTAX_ERROR),
> +                                         errmsg("option \"%s\" not recognized",
> +                                                        defel->defname),
> +                                         parser_errposition(pstate,
> defel->location)));
> +        }
> +        list_free(unknown_options);
> 
> I think we can check the duplicated options in the core as we discussed.
Oh, sorry. I missed that part. I'll implement it.
> ---
> +static void
> +CopyToTextBasedInit(CopyToState cstate)
> +{
> +}
> 
> and
> 
> +static void
> +CopyToBinaryInit(CopyToState cstate)
> +{
> +}
> 
> Do we really need separate callbacks for the same behavior? I think we
> can have a common init function, say CopyToBuiltinInit(), that does
> nothing. Or we can make the init callback optional.
> 
> The same is true for process-option callback.
OK. I'll make them optional.
> ---
>          List      *convert_select; /* list of column names (can be NIL) */
> +        const          CopyToRoutine *to_routine;      /* callback
> routines for COPY TO */
>  } CopyFormatOptions;
> 
> I think CopyToStateData is a better place to have CopyToRoutine.
> copy_data_dest_cb is also there.
We can do it but ProcessCopyOptions() accepts NULL
CopyToState for file_fdw. Can we create an empty
CopyToStateData internally like we did for opts_out in
ProcessCopyOptions()? (But it requires exporting
CopyToStateData. We'll export it in a later patch but it's
not yet in 0001.)
> The 0002 patch replaces the options checks with
> ProcessCopyOptionFormatFrom(). However, both
> ProcessCopyOptionFormatTo() and ProcessCopyOptionFormatFrom() would
> set format-related options such as opts_out->csv_mode etc, which seems
> not elegant. IIUC the reason why we process only the "format" option
> first is to set the callback functions and call the init callback. So
> I think we don't necessarily need to do both setting callbacks and
> setting format-related options together. Probably we can do only the
> callback stuff first and then set format-related options in the
> original place we used to do?
If we do it, we need to write the (strcmp(format, "csv") ==
0) condition in copyto.c and copy.c. I wanted to avoid it. I
think that the duplication (setting opts_out->csv_mode in
copyto.c and copyfrom.c) is not a problem. But it's not a
strong opinion. If (strcmp(format, "csv") == 0) duplication
is better than opts_out->csv_mode = true duplication, I'll
do it.
BTW, if we want to make the CSV format implementation more
modular, we could remove opts_out->csv_mode, move CSV
related options to CopyToCSVProcessOption() and keep them
in its opaque space. For example, opts_out->force_quote
would exist in the COPY TO opaque space but not in the
COPY FROM opaque space because it's not used in COPY
FROM.
> +static void
> +CopyToTextBasedFillCopyOutResponse(CopyToState cstate, StringInfoData *buf)
> +{
> +        int16          format = 0;
> +        int                    natts = list_length(cstate->attnumlist);
> +        int                    i;
> +
> +        pq_sendbyte(buf, format);      /* overall format */
> +        pq_sendint16(buf, natts);
> +        for (i = 0; i < natts; i++)
> +                pq_sendint16(buf, format);     /* per-column formats */
> +}
> 
> This function and CopyToBinaryFillCopyOutResponse() fill three things:
> overall format, the number of columns, and per-column formats. While
> this approach is flexible, extensions will have to understand the
> format of CopyOutResponse message. An alternative is to have one or
> more callbacks that return these three things.
Yes, we can choose the approach. I don't have a strong
opinion on which approach to choose.
> +        /* Get info about the columns we need to process. */
> +        cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
> +        foreach(cur, cstate->attnumlist)
> +        {
> +                int                    attnum = lfirst_int(cur);
> +                Oid                    out_func_oid;
> +                bool           isvarlena;
> +                Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
> +
> +                getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
> +                fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
> +        }
> 
> Is preparing the out functions an extension's responsibility? I
> thought the core could prepare them based on the overall format
> specified by extensions, as long as the overall format matches the
> actual data format to send. What do you think?
Hmm. I want to keep the preparation as an extension's
responsibility because it's not needed for all formats. For
example, Apache Arrow FORMAT doesn't need it. And JSON
FORMAT doesn't need it either because it uses
composite_to_json().
> +        /*
> +         * Called when COPY TO via the PostgreSQL protocol is
> started. This must
> +         * fill buf as a valid CopyOutResponse message:
> +         *
> +         */
> +        /*--
> +         * +--------+--------+--------+--------+--------+   +--------+--------+
> +         * | Format | N attributes    | Attr1's format  |...| AttrN's format  |
> +         * +--------+--------+--------+--------+--------+   +--------+--------+
> +         * 0: text                      0: text               0: text
> +         * 1: binary                    1: binary             1: binary
> +         */
> 
> I think this kind of diagram could be missed from being updated when
> we update the CopyOutResponse format. It's better to refer to the
> documentation instead.
It makes sense. I couldn't find the documentation when I
wrote it but I found it now...:
https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-COPY
Is there a recommended comment style for referring to the
documentation? Is "See doc/src/sgml/protocol.sgml for the
CopyOutResponse message details" OK?
Thanks,
-- 
kou
			
		On Tue, Jan 30, 2024 at 02:45:31PM +0900, Sutou Kouhei wrote:
> In <CAD21AoBmNiWwrspuedgAPgbAqsn7e7NoZYF6gNnYBf+gXEk9Mg@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 30 Jan 2024 11:11:59 +0900,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>> ---
>> +        if (!format_specified)
>> +                /* Set the default format. */
>> +                ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
>> +
>>
>> I think we can pass "text" in this case instead of NULL. That way,
>> ProcessCopyOptionFormatTo doesn't need to handle NULL case.
>
> Yes, we can do it. But it needs a DefElem allocation. Is it
> acceptable?
I don't think that there is a need for a DefElem at all here?  While I
am OK with the choice of calling CopyToInit() in the
ProcessCopyOption*() routines that exist to keep the set of callbacks
local to copyto.c and copyfrom.c, I think that this should not bother
about setting opts_out->csv_mode or opts_out->binary but just set
the opts_out->{to,from}_routine callbacks.
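One way to picture this split, as a sketch with mock types: the format lookup only selects a routine table and accepts NULL for the default, while the flag handling stays with the option parser. All names here (MockCopyFormatOptions, lookup_to_routine(), process_format_option()) are hypothetical and are not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical routine table; a label is enough for the sketch. */
typedef struct MockCopyToRoutine
{
	const char *label;
} MockCopyToRoutine;

/* Hypothetical stand-in for CopyFormatOptions. */
typedef struct MockCopyFormatOptions
{
	bool		csv_mode;
	bool		binary;
	const MockCopyToRoutine *to_routine;
} MockCopyFormatOptions;

static const MockCopyToRoutine mock_text = {"text"};
static const MockCopyToRoutine mock_csv = {"csv"};
static const MockCopyToRoutine mock_binary = {"binary"};

/* copyto.c side: choose callbacks only. NULL means the default format. */
static const MockCopyToRoutine *
lookup_to_routine(const char *fmt)
{
	if (fmt == NULL || strcmp(fmt, "text") == 0)
		return &mock_text;
	if (strcmp(fmt, "csv") == 0)
		return &mock_csv;
	if (strcmp(fmt, "binary") == 0)
		return &mock_binary;
	return NULL;
}

/*
 * copy.c side: validate the name and set the flags where the old code
 * did. A false return is where "COPY format not recognized" would be
 * reported, so the lookup above never needs an error location.
 */
static bool
process_format_option(MockCopyFormatOptions *opts, const char *fmt)
{
	if (fmt == NULL || strcmp(fmt, "text") == 0)
		 /* default format */ ;
	else if (strcmp(fmt, "csv") == 0)
		opts->csv_mode = true;
	else if (strcmp(fmt, "binary") == 0)
		opts->binary = true;
	else
		return false;

	opts->to_routine = lookup_to_routine(fmt);
	assert(opts->to_routine != NULL);
	return true;
}
```

The cost of this arrangement is that the format names are compared in two places.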
>> +static void
>> +CopyToTextBasedInit(CopyToState cstate)
>> +{
>> +}
>>
>> and
>>
>> +static void
>> +CopyToBinaryInit(CopyToState cstate)
>> +{
>> +}
>>
>> Do we really need separate callbacks for the same behavior? I think we
>> can have a common init function, say CopyToBuiltinInit(), that does
>> nothing. Or we can make the init callback optional.
Keeping empty callbacks does not strike me as a bad idea, because this
forces extension developers to think about this code path rather than
just ignore it.  Now, all the Init() callbacks are empty for the
in-core callbacks, so I think that we should just remove it entirely
for now.  Let's keep the core patch as simple as possible.  It is always
possible to build on top of it depending on what people need.  It's
been mentioned that JSON would want that, but this also proves that we
just don't care about that for all the in-core callbacks, as well.  I
would choose a minimalistic design for now.
>> +        /* Get info about the columns we need to process. */
>> +        cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
>> +        foreach(cur, cstate->attnumlist)
>> +        {
>> +                int                    attnum = lfirst_int(cur);
>> +                Oid                    out_func_oid;
>> +                bool           isvarlena;
>> +                Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
>> +
>> +                getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
>> +                fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
>> +        }
>>
>> Is preparing the out functions an extension's responsibility? I
>> thought the core could prepare them based on the overall format
>> specified by extensions, as long as the overall format matches the
>> actual data format to send. What do you think?
>
> Hmm. I want to keep the preparation as an extension's
> responsibility. Because it's not needed for all formats. For
> example, Apache Arrow FORMAT doesn't need it. And JSON
> FORMAT doesn't need it too because it use
> composite_to_json().
I agree that it could be really useful for extensions to be able to
force that.  We already know that for the in-core formats we've cared
about being able to enforce the way data is handled in input and
output.
> It makes sense. I couldn't find the documentation when I
> wrote it but I found it now...:
> https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-COPY
>
> Is there recommended comment style to refer a documentation?
> "See doc/src/sgml/protocol.sgml for the CopyOutResponse
> message details" is OK?
There are a couple of places in the C code where we refer to SGML docs
when it comes to specific details, so using a method like that here to
avoid a duplication with the docs sounds sensible to me.
I would be really tempted to put my hands on this patch to put into
shape a minimal set of changes because I care quite a lot about
the performance gains reported with the removal of the "if" checks in
the per-row callbacks, and that's one goal of this thread quite
independent of the extensibility.  Sutou-san, would you be OK with
that?
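For readers skimming the thread, the shape of that "if" removal can be sketched with mock code: instead of testing the format on every row, the per-row function is chosen once at start-up. The names (MockCopyState, branchy_one_row(), choose_routine()) are hypothetical, the "formats" are reduced to trivial quoting, and the sketch shows only the dispatch structure, not any measured gain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-row callback: format one field into out. */
typedef int (*one_row_cb) (const char *field, char *out, size_t outlen);

/* Hypothetical stand-in for the COPY state. */
typedef struct MockCopyState
{
	bool		csv_mode;
	one_row_cb	one_row;
} MockCopyState;

/* Toy "text" format: the field followed by a newline. */
static int
text_one_row(const char *field, char *out, size_t outlen)
{
	return snprintf(out, outlen, "%s\n", field);
}

/* Toy "csv" format: the field wrapped in double quotes. */
static int
csv_one_row(const char *field, char *out, size_t outlen)
{
	return snprintf(out, outlen, "\"%s\"\n", field);
}

/* Pre-refactoring shape: the format is tested again for every row. */
static int
branchy_one_row(const MockCopyState *cstate, const char *field,
				char *out, size_t outlen)
{
	if (cstate->csv_mode)
		return csv_one_row(field, out, outlen);
	return text_one_row(field, out, outlen);
}

/* Refactored shape: decide once, then call through the pointer per row. */
static void
choose_routine(MockCopyState *cstate)
{
	cstate->one_row = cstate->csv_mode ? csv_one_row : text_one_row;
	assert(cstate->one_row != NULL);
}
```

Both paths produce the same output. The difference is only where the format decision is made.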
--
Michael
			
Attachments
Hi,
In <ZbijVn9_51mljMAG@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 30 Jan 2024 16:20:54 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:
>>> +        if (!format_specified)
>>> +                /* Set the default format. */
>>> +                ProcessCopyOptionFormatTo(pstate, opts_out, cstate, NULL);
>>> +
>>> 
>>> I think we can pass "text" in this case instead of NULL. That way,
>>> ProcessCopyOptionFormatTo doesn't need to handle NULL case.
>> 
>> Yes, we can do it. But it needs a DefElem allocation. Is it
>> acceptable?
> 
> I don't think that there is a need for a DefElem at all here?
We use defel->location for an error message. (We don't need
to set location for the default "text" DefElem.)
>                                                                While I
> am OK with the choice of calling CopyToInit() in the
> ProcessCopyOption*() routines that exist to keep the set of callbacks
> local to copyto.c and copyfrom.c, I think that this should not bother
> about setting opts_out->csv_mode or opts_out->binary but just set
> the opts_out->{to,from}_routine callbacks.
OK. I'll keep opts_out->{csv_mode,binary} in copy.c.
>                  Now, all the Init() callbacks are empty for the
> in-core callbacks, so I think that we should just remove it entirely
> for now.  Let's keep the core patch as simple as possible.  It is always
> possible to build on top of it depending on what people need.  It's
> been mentioned that JSON would want that, but this also proves that we
> just don't care about that for all the in-core callbacks, as well.  I
> would choose a minimalistic design for now.
OK. Let's remove Init() callbacks from the first patch set.
> I would be really tempted to put my hands on this patch to put into
> shape a minimal set of changes because I care quite a lot about
> the performance gains reported with the removal of the "if" checks in
> the per-row callbacks, and that's one goal of this thread quite
> independent of the extensibility.  Sutou-san, would you be OK with
> that?
Yes, sure.
(We want to focus on the performance gains in the first
patch set and then focus on extensibility again, right?)
For that purpose, I think that the v7 patch set is more
suitable than the v9 patch set. The v7 patch set doesn't
include Init() callbacks, custom options validation support
or extra Copy{In,Out}Response support. But the v7 patch set
lacks the removal of the "if" checks in
NextCopyFromRawFields() that exists in the v9 patch set. I'm
not sure how much this will improve performance but it
may be worth a try.
Can I prepare the v10 patch set as "the v7 patch set" + "the
removal of the "if" checks in NextCopyFromRawFields()"?
(+ reverting opts_out->{csv_mode,binary} changes in
ProcessCopyOptions().)
Thanks,
-- 
kou
			
		On Tue, Jan 30, 2024 at 05:15:11PM +0900, Sutou Kouhei wrote:
> We use defel->location for an error message. (We don't need
> to set location for the default "text" DefElem.)
Yeah, but you should not need to have this error in the paths that set
the callback routines in opts_out if the same validation happens a few
lines before, in copy.c.
> Yes, sure.
> (We want to focus on the performance gains in the first
> patch set and then focus on extensibility again, right?)
Yep, exactly, the numbers are too good to just ignore.  I don't want
to hijack the thread, but I am really worried about the complexities
this thread is getting into because we are trying to shape the
callbacks in the most generic way possible based on *two* use cases.
This is going to be a never-ending discussion.  I'd rather get some
simple basics, and then we can discuss if tweaking the callbacks is
really necessary or not.  Even after introducing the pg_proc lookups
to get custom callbacks.
> For that purpose, I think that the v7 patch set is more
> suitable than the v9 patch set. The v7 patch set doesn't
> include Init() callbacks, custom options validation support
> or extra Copy{In,Out}Response support. But the v7 patch set
> lacks the removal of the "if" checks in
> NextCopyFromRawFields() that exists in the v9 patch set. I'm
> not sure how much this will improve performance, but it
> may be worth a try.
Yeah..  The custom options don't seem like an absolute strong
requirement for the first shot with the callbacks or even the
possibility to retrieve the callbacks from a function call.  I mean,
you could provide some control with SET commands and a few GUCs, at
least, even if that would be strange.  Manipulations with a list of
DefElems is the intuitive way to have custom options at query level,
but we also have to guess the set of callbacks from this list of
DefElems coming from the query.  You see my point, I am not sure
if it would be the best thing to process twice the options, especially
when it comes to decide if a DefElem should be valid or not depending
on the callbacks used.  Or we could use a kind of "special" DefElem
where we could store a set of key:value fed to a callback :)
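To make the "special" DefElem idea concrete, here is a minimal standalone sketch of feeding key:value pairs to a per-format callback that decides whether each one is valid. Everything in it is an assumption made for illustration: the "k1:v1,k2:v2" syntax, demo_feed_options(), and the "compression" option are hypothetical and appear in no patch in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: return false to reject an option. */
typedef bool (*DemoOptionCallback) (const char *key, const char *value,
                                    void *arg);

/*
 * Split "k1:v1,k2:v2" in place and feed each pair to 'cb'. Returns false on
 * the first malformed or rejected pair.
 */
static bool
demo_feed_options(char *spec, DemoOptionCallback cb, void *arg)
{
    char       *pair = spec;

    assert(cb != NULL);
    while (pair != NULL && *pair != '\0')
    {
        char       *next = strchr(pair, ',');
        char       *sep;

        if (next != NULL)
            *next++ = '\0';
        sep = strchr(pair, ':');
        if (sep == NULL)
            return false;       /* no "key:value" separator */
        *sep++ = '\0';
        if (!cb(pair, sep, arg))
            return false;       /* the format rejected this option */
        pair = next;
    }
    return true;
}

/* Hypothetical handler for a format that only knows "compression". */
static bool
demo_format_option(const char *key, const char *value, void *arg)
{
    const char **compression = arg;

    if (strcmp(key, "compression") != 0)
        return false;
    *compression = value;
    return true;
}
```

In this sketch validation happens only inside the callback, so the core would not need to know which keys a format accepts.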
> Can I prepare the v10 patch set as "the v7 patch set" + "the
> removal of the "if" checks in NextCopyFromRawFields()"?
> (+ reverting opts_out->{csv_mode,binary} changes in
> ProcessCopyOptions().)
Yep, if I got it that would make sense to me.  If you can do that,
that would help quite a bit.  :)
--
Michael
			
Attachments
Hi,
In <Zbi1TwPfAvUpKqTd@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 30 Jan 2024 17:37:35 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:
>> We use defel->location for an error message. (We don't need
>> to set location for the default "text" DefElem.)
> 
> Yeah, but you should not need to have this error in the paths that set
> the callback routines in opts_out if the same validation happens a few
> lines before, in copy.c.
Ah, yes. defel->location is used in later patches. For
example, it's used when a COPY handler for the specified
FORMAT isn't found.
>                           I am really worried about the complexities
> this thread is getting into because we are trying to shape the
> callbacks in the most generic way possible based on *two* use cases.
> This is going to be a never-ending discussion.  I'd rather get some
> simple basics, and then we can discuss if tweaking the callbacks is
> really necessary or not.  Even after introducing the pg_proc lookups
> to get custom callbacks.
I understand your concern. Let's introduce minimal callbacks
as the first step. I think that we completed our design
discussion for this feature. We can choose minimal callbacks
based on the discussion.
>         The custom options don't seem like an absolute strong
> requirement for the first shot with the callbacks or even the
> possibility to retrieve the callbacks from a function call.  I mean,
> you could provide some control with SET commands and a few GUCs, at
> least, even if that would be strange.  Manipulations with a list of
> DefElems is the intuitive way to have custom options at query level,
> but we also have to guess the set of callbacks from this list of
> DefElems coming from the query.  You see my point, I am not sure 
> if it would be the best thing to process twice the options, especially
> when it comes to decide if a DefElem should be valid or not depending
> on the callbacks used.  Or we could use a kind of "special" DefElem
> where we could store a set of key:value fed to a callback :)
Interesting. Let's remove custom options support from the
initial minimal callbacks.
>> Can I prepare the v10 patch set as "the v7 patch set" + "the
>> removal of the "if" checks in NextCopyFromRawFields()"?
>> (+ reverting opts_out->{csv_mode,binary} changes in
>> ProcessCopyOptions().)
> 
> Yep, if I got it that would make sense to me.  If you can do that,
> that would help quite a bit.  :)
I've prepared the v10 patch set. Could you try this?
Changes since the v7 patch set:
0001:
* Remove CopyToProcessOption() callback
* Remove CopyToGetFormat() callback
* Revert passing CopyToState to ProcessCopyOptions()
* Revert moving "opts_out->{csv_mode,binary} = true" to
  ProcessCopyOptionFormatTo()
* Change ProcessCopyOptionFormatTo() to receive
  "const char *format" instead of "DefElem *defel"
0002:
* Remove CopyFromProcessOption() callback
* Remove CopyFromGetFormat() callback
* Change ProcessCopyOptionFormatFrom() to receive
  "const char *format" instead of "DefElem *defel"
* Remove "if (cstate->opts.csv_mode)" branches from
  NextCopyFromRawFields()
FYI: Here are Copy{From,To}Routine in the v10 patch set. I
think that only Copy{From,To}OneRow are strictly needed
for the performance gain. But can we keep Copy{From,To}Start
and Copy{From,To}End for consistency? They let us remove a
few more {csv_mode,binary} conditions. They are called once
per COPY, not once per tuple, so they will not affect
performance.
/* Routines for a COPY FROM format implementation. */
typedef struct CopyFromRoutine
{
    /*
     * Called when COPY FROM is started. This will initialize something and
     * receive a header.
     */
    void        (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
    /* Copy one row. It returns false if no more tuples. */
    bool        (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls);
    /* Called when COPY FROM is ended. This will finalize something. */
    void        (*CopyFromEnd) (CopyFromState cstate);
}            CopyFromRoutine;
/* Routines for a COPY TO format implementation. */
typedef struct CopyToRoutine
{
    /* Called when COPY TO is started. This will send a header. */
    void        (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
    /* Copy one row for COPY TO. */
    void        (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
    /* Called when COPY TO is ended. This will send a trailer. */
    void        (*CopyToEnd) (CopyToState cstate);
}            CopyToRoutine;
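For readers who want to see the shape of this design outside the PostgreSQL tree, here is a minimal standalone sketch of a routine table with Start/OneRow/End callbacks selected once by format name. It is only an illustration under simplifying assumptions: DemoState, DemoToRoutine and the demo_*() functions are hypothetical stand-ins, not the CopyToState or CopyToRoutine from the patch, and the "csv" callback does no quoting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CopyToState: just an output buffer. */
typedef struct DemoState
{
    char        buf[256];
    size_t      len;
} DemoState;

/* Same shape as the routine structs above: start, one row, end. */
typedef struct DemoToRoutine
{
    void        (*start) (DemoState *st);
    void        (*one_row) (DemoState *st, const char *const *values, int nvalues);
    void        (*end) (DemoState *st);
} DemoToRoutine;

static void
demo_send(DemoState *st, const char *s)
{
    size_t      n = strlen(s);

    assert(st->len + n < sizeof(st->buf));
    memcpy(st->buf + st->len, s, n + 1);    /* copies the terminating NUL too */
    st->len += n;
}

static void
demo_noop(DemoState *st)
{
    (void) st;                  /* nothing to send at start or end */
}

static void
demo_text_one_row(DemoState *st, const char *const *values, int nvalues)
{
    for (int i = 0; i < nvalues; i++)
    {
        if (i > 0)
            demo_send(st, "\t");
        demo_send(st, values[i]);
    }
    demo_send(st, "\n");
}

static void
demo_csv_one_row(DemoState *st, const char *const *values, int nvalues)
{
    for (int i = 0; i < nvalues; i++)
    {
        if (i > 0)
            demo_send(st, ",");
        demo_send(st, values[i]);   /* simplified: no quoting */
    }
    demo_send(st, "\n");
}

static const DemoToRoutine demo_routine_text = {
    .start = demo_noop,
    .one_row = demo_text_one_row,
    .end = demo_noop,
};

static const DemoToRoutine demo_routine_csv = {
    .start = demo_noop,
    .one_row = demo_csv_one_row,
    .end = demo_noop,
};

/* Resolve the format name once, before any row is processed. */
static const DemoToRoutine *
demo_get_routine(const char *format)
{
    if (strcmp(format, "text") == 0)
        return &demo_routine_text;
    if (strcmp(format, "csv") == 0)
        return &demo_routine_csv;
    return NULL;
}

/* 'values' is a flat array of nrows * ncols strings, row by row. */
static void
demo_copy_to(const DemoToRoutine *routine, DemoState *st,
             const char *const *values, int nrows, int ncols)
{
    assert(routine != NULL);
    routine->start(st);
    for (int r = 0; r < nrows; r++)
        routine->one_row(st, values + (size_t) r * ncols, ncols);
    routine->end(st);
}
```

The per-row loop contains no format test. The only string comparisons happen in demo_get_routine(), once per COPY.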
Thanks,
-- 
kou
From f827f1f1632dc330ef5d78141b85df8ca1bce63b Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 31 Jan 2024 13:22:04 +0900
Subject: [PATCH v10 1/2] Extract COPY TO format implementations
This doesn't change the current behavior. This just introduces
CopyToRoutine, which just has function pointers of format
implementation like TupleTableSlotOps, and uses it for existing "text",
"csv" and "binary" format implementations.
This is for performance. We can remove "if (cstate->opts.csv_mode)"
and "if (!cstate->opts.binary)" branches in CopyOneRowTo() by using
callbacks for each format. It improves performance.
---
 src/backend/commands/copy.c    |   8 +
 src/backend/commands/copyto.c  | 494 +++++++++++++++++++++++----------
 src/include/commands/copy.h    |   6 +-
 src/include/commands/copyapi.h |  35 +++
 4 files changed, 389 insertions(+), 154 deletions(-)
 create mode 100644 src/include/commands/copyapi.h
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6f4..c88510f8c7 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -487,6 +487,8 @@ ProcessCopyOptions(ParseState *pstate,
                         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                          errmsg("COPY format \"%s\" not recognized", fmt),
                          parser_errposition(pstate, defel->location)));
+            if (!is_from)
+                ProcessCopyOptionFormatTo(pstate, opts_out, fmt);
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
@@ -622,6 +624,12 @@ ProcessCopyOptions(ParseState *pstate,
                             defel->defname),
                      parser_errposition(pstate, defel->location)));
     }
+    if (!format_specified)
+    {
+        /* Set the default format. */
+        if (!is_from)
+            ProcessCopyOptionFormatTo(pstate, opts_out, "text");
+    }
 
     /*
      * Check for incompatible options (must do these two before inserting
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d3dc3fc854..70a28ab44d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -131,6 +131,345 @@ static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
+/*
+ * CopyToRoutine implementations.
+ */
+
+/*
+ * CopyToRoutine implementation for "text" and "csv". CopyToTextBased*() are
+ * shared by both of "text" and "csv". CopyToText*() are only for "text" and
+ * CopyToCSV*() are only for "csv".
+ *
+ * We can use the same functions for all callbacks by referring
+ * cstate->opts.csv_mode but splitting callbacks to eliminate "if
+ * (cstate->opts.csv_mode)" branches from all callbacks has performance
+ * merit when many tuples are copied. So we use separated callbacks for "text"
+ * and "csv".
+ */
+
+static void
+CopyToTextBasedSendEndOfRow(CopyToState cstate)
+{
+    switch (cstate->copy_dest)
+    {
+        case COPY_FILE:
+            /* Default line termination depends on platform */
+#ifndef WIN32
+            CopySendChar(cstate, '\n');
+#else
+            CopySendString(cstate, "\r\n");
+#endif
+            break;
+        case COPY_FRONTEND:
+            /* The FE/BE protocol uses \n as newline for all platforms */
+            CopySendChar(cstate, '\n');
+            break;
+        default:
+            break;
+    }
+    CopySendEndOfRow(cstate);
+}
+
+typedef void (*CopyAttributeOutHeaderFunction) (CopyToState cstate, char *string);
+
+/*
+ * We can use CopyAttributeOutText() directly but define this for consistency
+ * with CopyAttributeOutCSVHeader(). "static inline" will prevent performance
+ * penalty by this wrapping.
+ */
+static inline void
+CopyAttributeOutTextHeader(CopyToState cstate, char *string)
+{
+    CopyAttributeOutText(cstate, string);
+}
+
+static inline void
+CopyAttributeOutCSVHeader(CopyToState cstate, char *string)
+{
+    CopyAttributeOutCSV(cstate, string, false,
+                        list_length(cstate->attnumlist) == 1);
+}
+
+/*
+ * We don't use this function as a callback directly. We define
+ * CopyToTextStart() and CopyToCSVStart() and use them instead. It's for
+ * eliminating a "if (cstate->opts.csv_mode)" branch. This callback is called
+ * only once per COPY TO. So this optimization may be meaningless but done for
+ * consistency with CopyToTextBasedOneRow().
+ *
+ * This must initialize cstate->out_functions for CopyToTextBasedOneRow().
+ */
+static inline void
+CopyToTextBasedStart(CopyToState cstate, TupleDesc tupDesc, CopyAttributeOutHeaderFunction out)
+{
+    int            num_phys_attrs;
+    ListCell   *cur;
+
+    num_phys_attrs = tupDesc->natts;
+    /* Get info about the columns we need to process. */
+    cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Oid            out_func_oid;
+        bool        isvarlena;
+        Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+        getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+    }
+
+    /*
+     * For non-binary copy, we need to convert null_print to file encoding,
+     * because it will be sent directly with CopySendString.
+     */
+    if (cstate->need_transcoding)
+        cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+                                                          cstate->opts.null_print_len,
+                                                          cstate->file_encoding);
+
+    /* if a header has been requested send the line */
+    if (cstate->opts.header_line)
+    {
+        bool        hdr_delim = false;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            char       *colname;
+
+            if (hdr_delim)
+                CopySendChar(cstate, cstate->opts.delim[0]);
+            hdr_delim = true;
+
+            colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+            out(cstate, colname);
+        }
+
+        CopyToTextBasedSendEndOfRow(cstate);
+    }
+}
+
+static void
+CopyToTextStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    CopyToTextBasedStart(cstate, tupDesc, CopyAttributeOutTextHeader);
+}
+
+static void
+CopyToCSVStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    CopyToTextBasedStart(cstate, tupDesc, CopyAttributeOutCSVHeader);
+}
+
+typedef void (*CopyAttributeOutValueFunction) (CopyToState cstate, char *string, int attnum);
+
+static inline void
+CopyAttributeOutTextValue(CopyToState cstate, char *string, int attnum)
+{
+    CopyAttributeOutText(cstate, string);
+}
+
+static inline void
+CopyAttributeOutCSVValue(CopyToState cstate, char *string, int attnum)
+{
+    CopyAttributeOutCSV(cstate, string,
+                        cstate->opts.force_quote_flags[attnum - 1],
+                        list_length(cstate->attnumlist) == 1);
+}
+
+/*
+ * We don't use this function as a callback directly. We define
+ * CopyToTextOneRow() and CopyToCSVOneRow() and use them instead. It's for
+ * eliminating a "if (cstate->opts.csv_mode)" branch. This callback is called
+ * per tuple. So this optimization will be valuable when many tuples are
+ * copied.
+ *
+ * cstate->out_functions must be initialized in CopyToTextBasedStart().
+ */
+static void
+CopyToTextBasedOneRow(CopyToState cstate, TupleTableSlot *slot, CopyAttributeOutValueFunction out)
+{
+    bool        need_delim = false;
+    FmgrInfo   *out_functions = cstate->out_functions;
+    ListCell   *cur;
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (need_delim)
+            CopySendChar(cstate, cstate->opts.delim[0]);
+        need_delim = true;
+
+        if (isnull)
+        {
+            CopySendString(cstate, cstate->opts.null_print_client);
+        }
+        else
+        {
+            char       *string;
+
+            string = OutputFunctionCall(&out_functions[attnum - 1], value);
+            out(cstate, string, attnum);
+        }
+    }
+
+    CopyToTextBasedSendEndOfRow(cstate);
+}
+
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextBasedOneRow(cstate, slot, CopyAttributeOutTextValue);
+}
+
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextBasedOneRow(cstate, slot, CopyAttributeOutCSVValue);
+}
+
+static void
+CopyToTextBasedEnd(CopyToState cstate)
+{
+}
+
+/*
+ * CopyToRoutine implementation for "binary".
+ */
+
+/*
+ * This must initialize cstate->out_functions for CopyToBinaryOneRow().
+ */
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    int            num_phys_attrs;
+    ListCell   *cur;
+
+    num_phys_attrs = tupDesc->natts;
+    /* Get info about the columns we need to process. */
+    cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Oid            out_func_oid;
+        bool        isvarlena;
+        Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+        getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena);
+        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+    }
+
+    {
+        /* Generate header for a binary copy */
+        int32        tmp;
+
+        /* Signature */
+        CopySendData(cstate, BinarySignature, 11);
+        /* Flags field */
+        tmp = 0;
+        CopySendInt32(cstate, tmp);
+        /* No header extension */
+        tmp = 0;
+        CopySendInt32(cstate, tmp);
+    }
+}
+
+/*
+ * cstate->out_functions must be initialized in CopyToBinaryStart().
+ */
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    FmgrInfo   *out_functions = cstate->out_functions;
+    ListCell   *cur;
+
+    /* Binary per-tuple header */
+    CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (isnull)
+        {
+            CopySendInt32(cstate, -1);
+        }
+        else
+        {
+            bytea       *outputbytes;
+
+            outputbytes = SendFunctionCall(&out_functions[attnum - 1], value);
+            CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+            CopySendData(cstate, VARDATA(outputbytes),
+                         VARSIZE(outputbytes) - VARHDRSZ);
+        }
+    }
+
+    CopySendEndOfRow(cstate);
+}
+
+static void
+CopyToBinaryEnd(CopyToState cstate)
+{
+    /* Generate trailer for a binary copy */
+    CopySendInt16(cstate, -1);
+    /* Need to flush out the trailer */
+    CopySendEndOfRow(cstate);
+}
+
+/*
+ * CopyToTextBased*() are shared with "csv". CopyToText*() are only for "text".
+ */
+static const CopyToRoutine CopyToRoutineText = {
+    .CopyToStart = CopyToTextStart,
+    .CopyToOneRow = CopyToTextOneRow,
+    .CopyToEnd = CopyToTextBasedEnd,
+};
+
+/*
+ * CopyToTextBased*() are shared with "text". CopyToCSV*() are only for "csv".
+ */
+static const CopyToRoutine CopyToRoutineCSV = {
+    .CopyToStart = CopyToCSVStart,
+    .CopyToOneRow = CopyToCSVOneRow,
+    .CopyToEnd = CopyToTextBasedEnd,
+};
+
+static const CopyToRoutine CopyToRoutineBinary = {
+    .CopyToStart = CopyToBinaryStart,
+    .CopyToOneRow = CopyToBinaryOneRow,
+    .CopyToEnd = CopyToBinaryEnd,
+};
+
+/*
+ * Process the FORMAT option for COPY TO.
+ *
+ * 'format' must be "text", "csv" or "binary".
+ */
+void
+ProcessCopyOptionFormatTo(ParseState *pstate,
+                          CopyFormatOptions *opts_out,
+                          const char *format)
+{
+    if (strcmp(format, "text") == 0)
+        opts_out->to_routine = &CopyToRoutineText;
+    else if (strcmp(format, "csv") == 0)
+    {
+        opts_out->to_routine = &CopyToRoutineCSV;
+    }
+    else if (strcmp(format, "binary") == 0)
+    {
+        opts_out->to_routine = &CopyToRoutineBinary;
+    }
+}
 
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
@@ -198,16 +537,6 @@ CopySendEndOfRow(CopyToState cstate)
     switch (cstate->copy_dest)
     {
         case COPY_FILE:
-            if (!cstate->opts.binary)
-            {
-                /* Default line termination depends on platform */
-#ifndef WIN32
-                CopySendChar(cstate, '\n');
-#else
-                CopySendString(cstate, "\r\n");
-#endif
-            }
-
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -242,10 +571,6 @@ CopySendEndOfRow(CopyToState cstate)
             }
             break;
         case COPY_FRONTEND:
-            /* The FE/BE protocol uses \n as newline for all platforms */
-            if (!cstate->opts.binary)
-                CopySendChar(cstate, '\n');
-
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
@@ -748,8 +1073,6 @@ DoCopyTo(CopyToState cstate)
     bool        pipe = (cstate->filename == NULL && cstate->data_dest_cb == NULL);
     bool        fe_copy = (pipe && whereToSendOutput == DestRemote);
     TupleDesc    tupDesc;
-    int            num_phys_attrs;
-    ListCell   *cur;
     uint64        processed;
 
     if (fe_copy)
@@ -759,32 +1082,11 @@ DoCopyTo(CopyToState cstate)
         tupDesc = RelationGetDescr(cstate->rel);
     else
         tupDesc = cstate->queryDesc->tupDesc;
-    num_phys_attrs = tupDesc->natts;
     cstate->opts.null_print_client = cstate->opts.null_print;    /* default */
 
     /* We use fe_msgbuf as a per-row buffer regardless of copy_dest */
     cstate->fe_msgbuf = makeStringInfo();
 
-    /* Get info about the columns we need to process. */
-    cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
-    foreach(cur, cstate->attnumlist)
-    {
-        int            attnum = lfirst_int(cur);
-        Oid            out_func_oid;
-        bool        isvarlena;
-        Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
-
-        if (cstate->opts.binary)
-            getTypeBinaryOutputInfo(attr->atttypid,
-                                    &out_func_oid,
-                                    &isvarlena);
-        else
-            getTypeOutputInfo(attr->atttypid,
-                              &out_func_oid,
-                              &isvarlena);
-        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
-    }
-
     /*
      * Create a temporary memory context that we can reset once per row to
      * recover palloc'd memory.  This avoids any problems with leaks inside
@@ -795,57 +1097,7 @@ DoCopyTo(CopyToState cstate)
                                                "COPY TO",
                                                ALLOCSET_DEFAULT_SIZES);
 
-    if (cstate->opts.binary)
-    {
-        /* Generate header for a binary copy */
-        int32        tmp;
-
-        /* Signature */
-        CopySendData(cstate, BinarySignature, 11);
-        /* Flags field */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-        /* No header extension */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-    }
-    else
-    {
-        /*
-         * For non-binary copy, we need to convert null_print to file
-         * encoding, because it will be sent directly with CopySendString.
-         */
-        if (cstate->need_transcoding)
-            cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
-                                                              cstate->opts.null_print_len,
-                                                              cstate->file_encoding);
-
-        /* if a header has been requested send the line */
-        if (cstate->opts.header_line)
-        {
-            bool        hdr_delim = false;
-
-            foreach(cur, cstate->attnumlist)
-            {
-                int            attnum = lfirst_int(cur);
-                char       *colname;
-
-                if (hdr_delim)
-                    CopySendChar(cstate, cstate->opts.delim[0]);
-                hdr_delim = true;
-
-                colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, colname, false,
-                                        list_length(cstate->attnumlist) == 1);
-                else
-                    CopyAttributeOutText(cstate, colname);
-            }
-
-            CopySendEndOfRow(cstate);
-        }
-    }
+    cstate->opts.to_routine->CopyToStart(cstate, tupDesc);
 
     if (cstate->rel)
     {
@@ -884,13 +1136,7 @@ DoCopyTo(CopyToState cstate)
         processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
     }
 
-    if (cstate->opts.binary)
-    {
-        /* Generate trailer for a binary copy */
-        CopySendInt16(cstate, -1);
-        /* Need to flush out the trailer */
-        CopySendEndOfRow(cstate);
-    }
+    cstate->opts.to_routine->CopyToEnd(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -906,71 +1152,15 @@ DoCopyTo(CopyToState cstate)
 static void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-    bool        need_delim = false;
-    FmgrInfo   *out_functions = cstate->out_functions;
     MemoryContext oldcontext;
-    ListCell   *cur;
-    char       *string;
 
     MemoryContextReset(cstate->rowcontext);
     oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-    if (cstate->opts.binary)
-    {
-        /* Binary per-tuple header */
-        CopySendInt16(cstate, list_length(cstate->attnumlist));
-    }
-
     /* Make sure the tuple is fully deconstructed */
     slot_getallattrs(slot);
 
-    foreach(cur, cstate->attnumlist)
-    {
-        int            attnum = lfirst_int(cur);
-        Datum        value = slot->tts_values[attnum - 1];
-        bool        isnull = slot->tts_isnull[attnum - 1];
-
-        if (!cstate->opts.binary)
-        {
-            if (need_delim)
-                CopySendChar(cstate, cstate->opts.delim[0]);
-            need_delim = true;
-        }
-
-        if (isnull)
-        {
-            if (!cstate->opts.binary)
-                CopySendString(cstate, cstate->opts.null_print_client);
-            else
-                CopySendInt32(cstate, -1);
-        }
-        else
-        {
-            if (!cstate->opts.binary)
-            {
-                string = OutputFunctionCall(&out_functions[attnum - 1],
-                                            value);
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, string,
-                                        cstate->opts.force_quote_flags[attnum - 1],
-                                        list_length(cstate->attnumlist) == 1);
-                else
-                    CopyAttributeOutText(cstate, string);
-            }
-            else
-            {
-                bytea       *outputbytes;
-
-                outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-                                               value);
-                CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-                CopySendData(cstate, VARDATA(outputbytes),
-                             VARSIZE(outputbytes) - VARHDRSZ);
-            }
-        }
-    }
-
-    CopySendEndOfRow(cstate);
+    cstate->opts.to_routine->CopyToOneRow(cstate, slot);
 
     MemoryContextSwitchTo(oldcontext);
 }
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index b3da3cb0be..18486a3715 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
 #ifndef COPY_H
 #define COPY_H
 
+#include "commands/copyapi.h"
 #include "nodes/execnodes.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
@@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
     bool        convert_selectively;    /* do selective binary conversion? */
     CopyOnErrorChoice on_error; /* what to do when error happened */
     List       *convert_select; /* list of column names (can be NIL) */
+    const        CopyToRoutine *to_routine;    /* callback routines for COPY TO */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
+/* This is private in commands/copyfrom.c */
 typedef struct CopyFromStateData *CopyFromState;
-typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
 typedef void (*copy_data_dest_cb) (void *data, int len);
@@ -88,6 +89,7 @@ extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
                    uint64 *processed);
 
 extern void ProcessCopyOptions(ParseState *pstate, CopyFormatOptions *opts_out, bool is_from, List *options);
+extern void ProcessCopyOptionFormatTo(ParseState *pstate, CopyFormatOptions *opts_out, const char *format);
 extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *whereClause,
                                    const char *filename,
                                   bool is_program, copy_data_source_cb data_source_cb, List *attnamelist, List *options);
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 0000000000..2f9ecd0e2b
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,35 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *      API for COPY TO/FROM handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+
+/* This is private in commands/copyto.c */
+typedef struct CopyToStateData *CopyToState;
+
+/* Routines for a COPY TO format implementation. */
+typedef struct CopyToRoutine
+{
+    /* Called when COPY TO is started. This will send a header. */
+    void        (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
+
+    /* Copy one row for COPY TO. */
+    void        (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
+
+    /* Called when COPY TO is ended. This will send a trailer. */
+    void        (*CopyToEnd) (CopyToState cstate);
+}            CopyToRoutine;
+
+#endif                            /* COPYAPI_H */
-- 
2.43.0
From 9487884f2c0a8976945778821abd850418b6623c Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 31 Jan 2024 13:37:02 +0900
Subject: [PATCH v10 2/2] Extract COPY FROM format implementations
This doesn't change the current behavior. This just introduces
CopyFromRoutine, which just has function pointers of format
implementation like TupleTableSlotOps, and uses it for existing "text",
"csv" and "binary" format implementations.
This is for performance. We can remove "if (cstate->opts.csv_mode)"
and "if (!cstate->opts.binary)" branches in NextCopyFrom() by using
callbacks for each format. It improves performance.
---
 src/backend/commands/copy.c              |   8 +-
 src/backend/commands/copyfrom.c          | 217 +++++++++---
 src/backend/commands/copyfromparse.c     | 420 +++++++++++++----------
 src/include/commands/copy.h              |   6 +-
 src/include/commands/copyapi.h           |  20 ++
 src/include/commands/copyfrom_internal.h |   4 +
 6 files changed, 447 insertions(+), 228 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c88510f8c7..cd79e614b9 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -487,7 +487,9 @@ ProcessCopyOptions(ParseState *pstate,
                         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                          errmsg("COPY format \"%s\" not recognized", fmt),
                          parser_errposition(pstate, defel->location)));
-            if (!is_from)
+            if (is_from)
+                ProcessCopyOptionFormatFrom(pstate, opts_out, fmt);
+            else
                 ProcessCopyOptionFormatTo(pstate, opts_out, fmt);
         }
         else if (strcmp(defel->defname, "freeze") == 0)
@@ -627,7 +629,9 @@ ProcessCopyOptions(ParseState *pstate,
     if (!format_specified)
     {
         /* Set the default format. */
-        if (!is_from)
+        if (is_from)
+            ProcessCopyOptionFormatFrom(pstate, opts_out, "text");
+        else
             ProcessCopyOptionFormatTo(pstate, opts_out, "text");
     }
 
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1fe70b9133..b51096fc0d 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -108,6 +108,171 @@ static char *limit_printout_length(const char *str);
 
 static void ClosePipeFromProgram(CopyFromState cstate);
 
+
+/*
+ * CopyFromRoutine implementations.
+ */
+
+/*
+ * CopyFromRoutine implementation for "text" and "csv". CopyFromTextBased*()
+ * are shared by both of "text" and "csv". CopyFromText*() are only for "text"
+ * and CopyFromCSV*() are only for "csv".
+ *
+ * We can use the same functions for all callbacks by referring
+ * cstate->opts.csv_mode but splitting callbacks to eliminate "if
+ * (cstate->opts.csv_mode)" branches from all callbacks has performance merit
+ * when many tuples are copied. So we use separated callbacks for "text" and
+ * "csv".
+ */
+
+/*
+ * This must initialize cstate->in_functions for CopyFromTextBasedOneRow().
+ */
+static void
+CopyFromTextBasedStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    AttrNumber    num_phys_attrs = tupDesc->natts;
+    AttrNumber    attr_count;
+
+    /*
+     * If encoding conversion is needed, we need another buffer to hold the
+     * converted input data.  Otherwise, we can just point input_buf to the
+     * same buffer as raw_buf.
+     */
+    if (cstate->need_transcoding)
+    {
+        cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
+        cstate->input_buf_index = cstate->input_buf_len = 0;
+    }
+    else
+        cstate->input_buf = cstate->raw_buf;
+    cstate->input_reached_eof = false;
+
+    initStringInfo(&cstate->line_buf);
+
+    /*
+     * Pick up the required catalog information for each attribute in the
+     * relation, including the input function, the element type (to pass to
+     * the input function).
+     */
+    cstate->in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+    cstate->typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
+    for (int attnum = 1; attnum <= num_phys_attrs; attnum++)
+    {
+        Form_pg_attribute att = TupleDescAttr(tupDesc, attnum - 1);
+        Oid            in_func_oid;
+
+        /* We don't need info for dropped attributes */
+        if (att->attisdropped)
+            continue;
+
+        /* Fetch the input function and typioparam info */
+        getTypeInputInfo(att->atttypid,
+                         &in_func_oid, &cstate->typioparams[attnum - 1]);
+        fmgr_info(in_func_oid, &cstate->in_functions[attnum - 1]);
+    }
+
+    /* create workspace for CopyReadAttributes results */
+    attr_count = list_length(cstate->attnumlist);
+    cstate->max_fields = attr_count;
+    cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+}
+
+static void
+CopyFromTextBasedEnd(CopyFromState cstate)
+{
+}
+
+/*
+ * CopyFromRoutine implementation for "binary".
+ */
+
+/*
+ * This must initialize cstate->in_functions for CopyFromBinaryOneRow().
+ */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    AttrNumber    num_phys_attrs = tupDesc->natts;
+
+    /*
+     * Pick up the required catalog information for each attribute in the
+     * relation, including the input function, the element type (to pass to
+     * the input function).
+     */
+    cstate->in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
+    cstate->typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
+    for (int attnum = 1; attnum <= num_phys_attrs; attnum++)
+    {
+        Form_pg_attribute att = TupleDescAttr(tupDesc, attnum - 1);
+        Oid            in_func_oid;
+
+        /* We don't need info for dropped attributes */
+        if (att->attisdropped)
+            continue;
+
+        /* Fetch the input function and typioparam info */
+        getTypeBinaryInputInfo(att->atttypid,
+                               &in_func_oid, &cstate->typioparams[attnum - 1]);
+        fmgr_info(in_func_oid, &cstate->in_functions[attnum - 1]);
+    }
+
+    /* Read and verify binary header */
+    ReceiveCopyBinaryHeader(cstate);
+}
+
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+}
+
+/*
+ * CopyFromTextBased*() are shared with "csv". CopyFromText*() are only for "text".
+ */
+static const CopyFromRoutine CopyFromRoutineText = {
+    .CopyFromStart = CopyFromTextBasedStart,
+    .CopyFromOneRow = CopyFromTextOneRow,
+    .CopyFromEnd = CopyFromTextBasedEnd,
+};
+
+/*
+ * CopyFromTextBased*() are shared with "text". CopyFromCSV*() are only for "csv".
+ */
+static const CopyFromRoutine CopyFromRoutineCSV = {
+    .CopyFromStart = CopyFromTextBasedStart,
+    .CopyFromOneRow = CopyFromCSVOneRow,
+    .CopyFromEnd = CopyFromTextBasedEnd,
+};
+
+static const CopyFromRoutine CopyFromRoutineBinary = {
+    .CopyFromStart = CopyFromBinaryStart,
+    .CopyFromOneRow = CopyFromBinaryOneRow,
+    .CopyFromEnd = CopyFromBinaryEnd,
+};
+
+/*
+ * Process the FORMAT option for COPY FROM.
+ *
+ * 'format' must be "text", "csv" or "binary".
+ */
+void
+ProcessCopyOptionFormatFrom(ParseState *pstate,
+                            CopyFormatOptions *opts_out,
+                            const char *format)
+{
+    if (strcmp(format, "text") == 0)
+        opts_out->from_routine = &CopyFromRoutineText;
+    else if (strcmp(format, "csv") == 0)
+    {
+        opts_out->from_routine = &CopyFromRoutineCSV;
+    }
+    else if (strcmp(format, "binary") == 0)
+    {
+        opts_out->from_routine = &CopyFromRoutineBinary;
+    }
+}
+
+
 /*
  * error context callback for COPY FROM
  *
@@ -1384,9 +1549,6 @@ BeginCopyFrom(ParseState *pstate,
     TupleDesc    tupDesc;
     AttrNumber    num_phys_attrs,
                 num_defaults;
-    FmgrInfo   *in_functions;
-    Oid           *typioparams;
-    Oid            in_func_oid;
     int           *defmap;
     ExprState **defexprs;
     MemoryContext oldcontext;
@@ -1571,25 +1733,6 @@ BeginCopyFrom(ParseState *pstate,
     cstate->raw_buf_index = cstate->raw_buf_len = 0;
     cstate->raw_reached_eof = false;
 
-    if (!cstate->opts.binary)
-    {
-        /*
-         * If encoding conversion is needed, we need another buffer to hold
-         * the converted input data.  Otherwise, we can just point input_buf
-         * to the same buffer as raw_buf.
-         */
-        if (cstate->need_transcoding)
-        {
-            cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
-            cstate->input_buf_index = cstate->input_buf_len = 0;
-        }
-        else
-            cstate->input_buf = cstate->raw_buf;
-        cstate->input_reached_eof = false;
-
-        initStringInfo(&cstate->line_buf);
-    }
-
     initStringInfo(&cstate->attribute_buf);
 
     /* Assign range table and rteperminfos, we'll need them in CopyFrom. */
@@ -1608,8 +1751,6 @@ BeginCopyFrom(ParseState *pstate,
      * the input function), and info about defaults and constraints. (Which
      * input function we use depends on text/binary format choice.)
      */
-    in_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
-    typioparams = (Oid *) palloc(num_phys_attrs * sizeof(Oid));
     defmap = (int *) palloc(num_phys_attrs * sizeof(int));
     defexprs = (ExprState **) palloc(num_phys_attrs * sizeof(ExprState *));
 
@@ -1621,15 +1762,6 @@ BeginCopyFrom(ParseState *pstate,
         if (att->attisdropped)
             continue;
 
-        /* Fetch the input function and typioparam info */
-        if (cstate->opts.binary)
-            getTypeBinaryInputInfo(att->atttypid,
-                                   &in_func_oid, &typioparams[attnum - 1]);
-        else
-            getTypeInputInfo(att->atttypid,
-                             &in_func_oid, &typioparams[attnum - 1]);
-        fmgr_info(in_func_oid, &in_functions[attnum - 1]);
-
         /* Get default info if available */
         defexprs[attnum - 1] = NULL;
 
@@ -1689,8 +1821,6 @@ BeginCopyFrom(ParseState *pstate,
     cstate->bytes_processed = 0;
 
     /* We keep those variables in cstate. */
-    cstate->in_functions = in_functions;
-    cstate->typioparams = typioparams;
     cstate->defmap = defmap;
     cstate->defexprs = defexprs;
     cstate->volatile_defexprs = volatile_defexprs;
@@ -1763,20 +1893,7 @@ BeginCopyFrom(ParseState *pstate,
 
     pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-    if (cstate->opts.binary)
-    {
-        /* Read and verify binary header */
-        ReceiveCopyBinaryHeader(cstate);
-    }
-
-    /* create workspace for CopyReadAttributes results */
-    if (!cstate->opts.binary)
-    {
-        AttrNumber    attr_count = list_length(cstate->attnumlist);
-
-        cstate->max_fields = attr_count;
-        cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-    }
+    cstate->opts.from_routine->CopyFromStart(cstate, tupDesc);
 
     MemoryContextSwitchTo(oldcontext);
 
@@ -1789,6 +1906,8 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+    cstate->opts.from_routine->CopyFromEnd(cstate);
+
     /* No COPY FROM related resources except memory. */
     if (cstate->is_program)
     {
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7cacd0b752..658d2429a9 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -740,8 +740,19 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
     return copied_bytes;
 }
 
+typedef int (*CopyReadAttributes) (CopyFromState cstate);
+
 /*
- * Read raw fields in the next line for COPY FROM in text or csv mode.
+ * Read raw fields in the next line for COPY FROM in text or csv
+ * mode. CopyReadAttributesText() must be used for text mode and
+ * CopyReadAttributesCSV() for csv mode. This inconvenience is for
+ * optimization. Removing the "if (cstate->opts.csv_mode)" branch has
+ * performance merit for COPY FROM with many tuples.
+ *
+ * NextCopyFromRawFields() can be used instead for convenience.
+ * NextCopyFromRawFields() chooses CopyReadAttributesText() or
+ * CopyReadAttributesCSV() internally.
+ *
  * Return false if no more lines.
  *
  * An internal temporary buffer is returned via 'fields'. It is valid until
@@ -751,8 +762,8 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  *
  * NOTE: force_not_null option are not applied to the returned fields.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static inline bool
+NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields, CopyReadAttributes copy_read_attributes)
 {
     int            fldct;
     bool        done;
@@ -775,11 +786,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
         {
             int            fldnum;
 
-            if (cstate->opts.csv_mode)
-                fldct = CopyReadAttributesCSV(cstate);
-            else
-                fldct = CopyReadAttributesText(cstate);
-
+            fldct = copy_read_attributes(cstate);
             if (fldct != list_length(cstate->attnumlist))
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -830,16 +837,240 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
         return false;
 
     /* Parse the line into de-escaped field values */
-    if (cstate->opts.csv_mode)
-        fldct = CopyReadAttributesCSV(cstate);
-    else
-        fldct = CopyReadAttributesText(cstate);
+    fldct = copy_read_attributes(cstate);
 
     *fields = cstate->raw_fields;
     *nfields = fldct;
     return true;
 }
 
+/*
+ * See NextCopyFromRawFieldsInternal() for details.
+ */
+bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+{
+    if (cstate->opts.csv_mode)
+        return NextCopyFromRawFieldsInternal(cstate, fields, nfields, CopyReadAttributesCSV);
+    else
+        return NextCopyFromRawFieldsInternal(cstate, fields, nfields, CopyReadAttributesText);
+}
+
+typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, int m);
+
+static inline char *
+PostpareColumnValueText(CopyFromState cstate, char *string, int m)
+{
+    /* do nothing */
+    return string;
+}
+
+static inline char *
+PostpareColumnValueCSV(CopyFromState cstate, char *string, int m)
+{
+    if (string == NULL &&
+        cstate->opts.force_notnull_flags[m])
+    {
+        /*
+         * FORCE_NOT_NULL option is set and column is NULL - convert it to the
+         * NULL string.
+         */
+        string = cstate->opts.null_print;
+    }
+    else if (string != NULL && cstate->opts.force_null_flags[m]
+             && strcmp(string, cstate->opts.null_print) == 0)
+    {
+        /*
+         * FORCE_NULL option is set and column matches the NULL string. It
+         * must have been quoted, or otherwise the string would already have
+         * been set to NULL. Convert it to NULL as specified.
+         */
+        string = NULL;
+    }
+    return string;
+}
+
+/*
+ * We don't use this function as a callback directly. We define
+ * CopyFromTextOneRow() and CopyFromCSVOneRow() and use them instead. It's for
+ * eliminating a "if (cstate->opts.csv_mode)" branch. This callback is called
+ * per tuple. So this optimization will be valuable when many tuples are
+ * copied.
+ *
+ * cstate->in_functions must be initialized in CopyFromTextBasedStart().
+ */
+static inline bool
+CopyFromTextBasedOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls, CopyReadAttributes copy_read_attributes,PostpareColumnValue postpare_column_value)
 
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    ExprState **defexprs = cstate->defexprs;
+    char      **field_strings;
+    ListCell   *cur;
+    int            fldct;
+    int            fieldno;
+    char       *string;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    /* read raw fields in the next line */
+    if (!NextCopyFromRawFieldsInternal(cstate, &field_strings, &fldct, copy_read_attributes))
+        return false;
+
+    /* check for overflowing fields */
+    if (attr_count > 0 && fldct > attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("extra data after last expected column")));
+
+    fieldno = 0;
+
+    /* Loop to read the user attributes on the line. */
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        if (fieldno >= fldct)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("missing data for column \"%s\"",
+                            NameStr(att->attname))));
+        string = field_strings[fieldno++];
+
+        if (cstate->convert_select_flags &&
+            !cstate->convert_select_flags[m])
+        {
+            /* ignore input field, leaving column as NULL */
+            continue;
+        }
+
+        cstate->cur_attname = NameStr(att->attname);
+        cstate->cur_attval = string;
+
+        string = postpare_column_value(cstate, string, m);
+
+        if (string != NULL)
+            nulls[m] = false;
+
+        if (cstate->defaults[m])
+        {
+            /*
+             * The caller must supply econtext and have switched into the
+             * per-tuple memory context in it.
+             */
+            Assert(econtext != NULL);
+            Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
+
+            values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
+        }
+
+        /*
+         * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+         */
+        else if (!InputFunctionCallSafe(&in_functions[m],
+                                        string,
+                                        typioparams[m],
+                                        att->atttypmod,
+                                        (Node *) cstate->escontext,
+                                        &values[m]))
+        {
+            cstate->num_errors++;
+            return true;
+        }
+
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+    }
+
+    Assert(fieldno == attr_count);
+
+    return true;
+}
+
+bool
+CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    return CopyFromTextBasedOneRow(cstate, econtext, values, nulls, CopyReadAttributesText, PostpareColumnValueText);
+}
+
+bool
+CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    return CopyFromTextBasedOneRow(cstate, econtext, values, nulls, CopyReadAttributesCSV, PostpareColumnValueCSV);
+}
+
+/*
+ * cstate->in_functions must be initialized in CopyFromBinaryStart().
+ */
+bool
+CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    int16        fld_count;
+    ListCell   *cur;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    cstate->cur_lineno++;
+
+    if (!CopyGetInt16(cstate, &fld_count))
+    {
+        /* EOF detected (end of file, or protocol-level EOF) */
+        return false;
+    }
+
+    if (fld_count == -1)
+    {
+        /*
+         * Received EOF marker.  Wait for the protocol-level EOF, and complain
+         * if it doesn't come immediately.  In COPY FROM STDIN, this ensures
+         * that we correctly handle CopyFail, if client chooses to send that
+         * now.  When copying from file, we could ignore the rest of the file
+         * like in text mode, but we choose to be consistent with the COPY
+         * FROM STDIN case.
+         */
+        char        dummy;
+
+        if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("received copy data after EOF marker")));
+        return false;
+    }
+
+    if (fld_count != attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("row field count is %d, expected %d",
+                        (int) fld_count, attr_count)));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        cstate->cur_attname = NameStr(att->attname);
+        values[m] = CopyReadBinaryAttribute(cstate,
+                                            &in_functions[m],
+                                            typioparams[m],
+                                            att->atttypmod,
+                                            &nulls[m]);
+        cstate->cur_attname = NULL;
+    }
+
+    return true;
+}
+
 /*
  * Read next tuple from file for COPY FROM. Return false if no more tuples.
  *
@@ -857,181 +1088,22 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 {
     TupleDesc    tupDesc;
     AttrNumber    num_phys_attrs,
-                attr_count,
                 num_defaults = cstate->num_defaults;
-    FmgrInfo   *in_functions = cstate->in_functions;
-    Oid           *typioparams = cstate->typioparams;
     int            i;
     int           *defmap = cstate->defmap;
     ExprState **defexprs = cstate->defexprs;
 
     tupDesc = RelationGetDescr(cstate->rel);
     num_phys_attrs = tupDesc->natts;
-    attr_count = list_length(cstate->attnumlist);
 
     /* Initialize all values for row to NULL */
     MemSet(values, 0, num_phys_attrs * sizeof(Datum));
     MemSet(nulls, true, num_phys_attrs * sizeof(bool));
     MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-    if (!cstate->opts.binary)
-    {
-        char      **field_strings;
-        ListCell   *cur;
-        int            fldct;
-        int            fieldno;
-        char       *string;
-
-        /* read raw fields in the next line */
-        if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
-            return false;
-
-        /* check for overflowing fields */
-        if (attr_count > 0 && fldct > attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("extra data after last expected column")));
-
-        fieldno = 0;
-
-        /* Loop to read the user attributes on the line. */
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            if (fieldno >= fldct)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("missing data for column \"%s\"",
-                                NameStr(att->attname))));
-            string = field_strings[fieldno++];
-
-            if (cstate->convert_select_flags &&
-                !cstate->convert_select_flags[m])
-            {
-                /* ignore input field, leaving column as NULL */
-                continue;
-            }
-
-            if (cstate->opts.csv_mode)
-            {
-                if (string == NULL &&
-                    cstate->opts.force_notnull_flags[m])
-                {
-                    /*
-                     * FORCE_NOT_NULL option is set and column is NULL -
-                     * convert it to the NULL string.
-                     */
-                    string = cstate->opts.null_print;
-                }
-                else if (string != NULL && cstate->opts.force_null_flags[m]
-                         && strcmp(string, cstate->opts.null_print) == 0)
-                {
-                    /*
-                     * FORCE_NULL option is set and column matches the NULL
-                     * string. It must have been quoted, or otherwise the
-                     * string would already have been set to NULL. Convert it
-                     * to NULL as specified.
-                     */
-                    string = NULL;
-                }
-            }
-
-            cstate->cur_attname = NameStr(att->attname);
-            cstate->cur_attval = string;
-
-            if (string != NULL)
-                nulls[m] = false;
-
-            if (cstate->defaults[m])
-            {
-                /*
-                 * The caller must supply econtext and have switched into the
-                 * per-tuple memory context in it.
-                 */
-                Assert(econtext != NULL);
-                Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
-
-                values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
-            }
-
-            /*
-             * If ON_ERROR is specified with IGNORE, skip rows with soft
-             * errors
-             */
-            else if (!InputFunctionCallSafe(&in_functions[m],
-                                            string,
-                                            typioparams[m],
-                                            att->atttypmod,
-                                            (Node *) cstate->escontext,
-                                            &values[m]))
-            {
-                cstate->num_errors++;
-                return true;
-            }
-
-            cstate->cur_attname = NULL;
-            cstate->cur_attval = NULL;
-        }
-
-        Assert(fieldno == attr_count);
-    }
-    else
-    {
-        /* binary */
-        int16        fld_count;
-        ListCell   *cur;
-
-        cstate->cur_lineno++;
-
-        if (!CopyGetInt16(cstate, &fld_count))
-        {
-            /* EOF detected (end of file, or protocol-level EOF) */
-            return false;
-        }
-
-        if (fld_count == -1)
-        {
-            /*
-             * Received EOF marker.  Wait for the protocol-level EOF, and
-             * complain if it doesn't come immediately.  In COPY FROM STDIN,
-             * this ensures that we correctly handle CopyFail, if client
-             * chooses to send that now.  When copying from file, we could
-             * ignore the rest of the file like in text mode, but we choose to
-             * be consistent with the COPY FROM STDIN case.
-             */
-            char        dummy;
-
-            if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("received copy data after EOF marker")));
-            return false;
-        }
-
-        if (fld_count != attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("row field count is %d, expected %d",
-                            (int) fld_count, attr_count)));
-
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            cstate->cur_attname = NameStr(att->attname);
-            values[m] = CopyReadBinaryAttribute(cstate,
-                                                &in_functions[m],
-                                                typioparams[m],
-                                                att->atttypmod,
-                                                &nulls[m]);
-            cstate->cur_attname = NULL;
-        }
-    }
+    if (!cstate->opts.from_routine->CopyFromOneRow(cstate, econtext, values,
+                                                   nulls))
+        return false;
 
     /*
      * Now compute and insert any defaults available for the columns not
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 18486a3715..799219c9ae 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -75,12 +75,11 @@ typedef struct CopyFormatOptions
     bool        convert_selectively;    /* do selective binary conversion? */
     CopyOnErrorChoice on_error; /* what to do when error happened */
     List       *convert_select; /* list of column names (can be NIL) */
+    const        CopyFromRoutine *from_routine;    /* callback routines for COPY
+                                                 * FROM */
     const        CopyToRoutine *to_routine;    /* callback routines for COPY TO */
 } CopyFormatOptions;
 
-/* This is private in commands/copyfrom.c */
-typedef struct CopyFromStateData *CopyFromState;
-
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
 typedef void (*copy_data_dest_cb) (void *data, int len);
 
@@ -89,6 +88,7 @@ extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
                    uint64 *processed);
 
 extern void ProcessCopyOptions(ParseState *pstate, CopyFormatOptions *opts_out, bool is_from, List *options);
+extern void ProcessCopyOptionFormatFrom(ParseState *pstate, CopyFormatOptions *opts_out, const char *format);
 extern void ProcessCopyOptionFormatTo(ParseState *pstate, CopyFormatOptions *opts_out, const char *format);
 extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *whereClause,
                                    const char *filename,
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 2f9ecd0e2b..38406a8447 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -15,6 +15,26 @@
 #define COPYAPI_H
 
 #include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/* This is private in commands/copyfrom.c */
+typedef struct CopyFromStateData *CopyFromState;
+
+/* Routines for a COPY FROM format implementation. */
+typedef struct CopyFromRoutine
+{
+    /*
+     * Called when COPY FROM is started. This will initialize something and
+     * receive a header.
+     */
+    void        (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
+
+    /* Copy one row. It returns false if no more tuples. */
+    bool        (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls);
+
+    /* Called when COPY FROM is ended. This will finalize something. */
+    void        (*CopyFromEnd) (CopyFromState cstate);
+}            CopyFromRoutine;
 
 /* This is private in commands/copyto.c */
 typedef struct CopyToStateData *CopyToState;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..096b55011e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -183,4 +183,8 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls);
+extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls);
+
 #endif                            /* COPYFROM_INTERNAL_H */
-- 
2.43.0
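
The patch above passes the per-format parser as a function pointer into a
"static inline" helper and then exposes one thin wrapper per format. A
minimal standalone sketch of that pattern follows. It does not use any
PostgreSQL code: all names here (split_fields_fn, count_fields_text(),
count_fields_csv() and so on) are hypothetical and exist only for
illustration. The idea is that each wrapper passes a compile-time-constant
callback, so the compiler may inline the shared body and drop the
per-call format check. Whether it actually does so depends on the
compiler and optimization level.

```c
#include <stddef.h>

/* Hypothetical per-format splitter: returns the number of fields in a line. */
typedef int (*split_fields_fn) (const char *line);

/* "text"-like splitter: fields are separated by tabs. */
static int
split_fields_tab(const char *line)
{
    int         n = 1;

    for (const char *p = line; *p != '\0'; p++)
        if (*p == '\t')
            n++;
    return n;
}

/* "csv"-like splitter: commas separate fields unless inside double quotes. */
static int
split_fields_comma(const char *line)
{
    int         n = 1;
    int         in_quote = 0;

    for (const char *p = line; *p != '\0'; p++)
    {
        if (*p == '"')
            in_quote = !in_quote;
        else if (*p == ',' && !in_quote)
            n++;
    }
    return n;
}

/*
 * Shared body.  It has no "if (csv_mode)" test: the format-specific step
 * is whatever callback the caller passes in.
 */
static inline int
count_fields_internal(const char *line, split_fields_fn split)
{
    if (line == NULL)
        return 0;               /* no more lines */
    return split(line);
}

/* One thin wrapper per format, each with a constant callback. */
int
count_fields_text(const char *line)
{
    return count_fields_internal(line, split_fields_tab);
}

int
count_fields_csv(const char *line)
{
    return count_fields_internal(line, split_fields_comma);
}
```

A caller that does not know the format at compile time can still branch
once and call the matching wrapper, which is what NextCopyFromRawFields()
does in the patch.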
			
On Wed, Jan 31, 2024 at 02:11:22PM +0900, Sutou Kouhei wrote:
> Ah, yes. defel->location is used in later patches. For
> example, it's used when a COPY handler for the specified
> FORMAT isn't found.
I see.
> I've prepared the v10 patch set. Could you try this?
Thanks, I'm looking into that now.
> FYI: Here are Copy{From,To}Routine in the v10 patch set. I
> think that only Copy{From,To}OneRow are minimal callbacks
> for the performance gain. But can we keep Copy{From,To}Start
> and Copy{From,To}End for consistency? We can remove a few
> {csv_mode,binary} conditions by Copy{From,To}{Start,End}. It
> doesn't depend on the number of COPY target tuples. So they
> will not affect performance.
I think I'm OK to keep the start/end callbacks.  This makes the code
more consistent as a whole, as well.
--
Michael
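
For readers following the start/one-row/end discussion, here is a small
standalone sketch of a three-callback routine table. It is not the v10
code and uses no PostgreSQL headers: DemoState, DemoFromRoutine and the
demo_*() functions are hypothetical names made up for this illustration.
The start and end callbacks run once per COPY, while the one-row callback
runs once per tuple, which is why only the latter matters for
per-tuple performance.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state for one COPY FROM run. */
typedef struct DemoState
{
    int         rows_left;      /* rows still available in the "input" */
    int         rows_read;      /* rows consumed so far */
    bool        started;
    bool        ended;
} DemoState;

/* Hypothetical routine table with the same three-callback shape. */
typedef struct DemoFromRoutine
{
    void        (*start) (DemoState *state);    /* once, before any row */
    bool        (*one_row) (DemoState *state);  /* per row; false at EOF */
    void        (*end) (DemoState *state);      /* once, after the last row */
} DemoFromRoutine;

static void
demo_start(DemoState *state)
{
    state->started = true;
    state->rows_read = 0;
}

static bool
demo_one_row(DemoState *state)
{
    if (state->rows_left == 0)
        return false;
    state->rows_left--;
    state->rows_read++;
    return true;
}

static void
demo_end(DemoState *state)
{
    state->ended = true;
}

static const DemoFromRoutine demo_text_routine = {
    .start = demo_start,
    .one_row = demo_one_row,
    .end = demo_end,
};

/* Resolve the format name once, up front.  Returns NULL if unknown. */
const DemoFromRoutine *
demo_lookup_routine(const char *format)
{
    if (strcmp(format, "text") == 0)
        return &demo_text_routine;
    return NULL;
}

/* Driver: no per-row format checks, only calls through the table. */
int
demo_copy_from(const DemoFromRoutine *routine, DemoState *state)
{
    routine->start(state);
    while (routine->one_row(state))
        ;
    routine->end(state);
    return state->rows_read;
}
```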
			
On Wed, Jan 31, 2024 at 02:39:54PM +0900, Michael Paquier wrote:
> Thanks, I'm looking into that now.
I have much to say about the patch, but for now I have begun running
some performance tests using the patches, because this thread won't
get far until we are sure that the callbacks do not impact performance
in some kind of worst-case scenario.  First, here is what I used to
set up a set of tables used for COPY FROM and COPY TO (this requires [1]
to feed COPY FROM's data to the void, and note that the default values
are there to keep strict control on the size of the StringInfos used in
the copy paths):
CREATE EXTENSION blackhole_am;
CREATE OR REPLACE FUNCTION create_table_cols(tabname text, num_cols int)
RETURNS VOID AS
$func$
DECLARE
  query text;
BEGIN
  query := 'CREATE UNLOGGED TABLE ' || tabname || ' (';
  FOR i IN 1..num_cols LOOP
    query := query || 'a_' || i::text || ' int default 1';
    IF i != num_cols THEN
      query := query || ', ';
    END IF;
  END LOOP;
  query := query || ')';
  EXECUTE format(query);
END
$func$ LANGUAGE plpgsql;
-- Tables used for COPY TO
SELECT create_table_cols ('to_tab_1', 1);
SELECT create_table_cols ('to_tab_10', 10);
INSERT INTO to_tab_1 SELECT FROM generate_series(1, 10000000);
INSERT INTO to_tab_10 SELECT FROM generate_series(1, 10000000);
-- Data for COPY FROM
COPY to_tab_1 TO '/tmp/to_tab_1.bin' WITH (format binary);
COPY to_tab_10 TO '/tmp/to_tab_10.bin' WITH (format binary);
COPY to_tab_1 TO '/tmp/to_tab_1.txt' WITH (format text);
COPY to_tab_10 TO '/tmp/to_tab_10.txt' WITH (format text);
-- Tables used for COPY FROM
SELECT create_table_cols ('from_tab_1', 1);
SELECT create_table_cols ('from_tab_10', 10);
ALTER TABLE from_tab_1 SET ACCESS METHOD blackhole_am;
ALTER TABLE from_tab_10 SET ACCESS METHOD blackhole_am;
I then ran a set of tests on HEAD, v7 and v10 with queries like these
(adapt them to the format and table):
COPY to_tab_1 TO '/dev/null' WITH (FORMAT text) \watch count=5
SET client_min_messages TO error; -- for blackhole_am
COPY from_tab_1 FROM '/tmp/to_tab_1.txt' with (FORMAT 'text') \watch count=5
COPY from_tab_1 FROM '/tmp/to_tab_1.bin' with (FORMAT 'binary') \watch count=5
All the patches have been compiled with -O2, without assertions, etc.
Postgres is run in tmpfs mode, on scissors, without fsync.  Unlogged
tables help a bit in focusing on the execution paths as we don't care
about WAL, of course.  I have also included v7 in the set of tests,
as this version uses simpler per-row callbacks.
And here are the results I get for text and binary (ms, average of 15
queries after discarding the three highest and three lowest values):
      test       | master |  v7  | v10
-----------------+--------+------+------
 from_bin_1col   | 1575   | 1546 | 1575
 from_bin_10col  | 5364   | 5208 | 5230
 from_text_1col  | 1690   | 1715 | 1684
 from_text_10col | 4875   | 4793 | 4757
 to_bin_1col     | 1717   | 1730 | 1731
 to_bin_10col    | 7728   | 7707 | 7513
 to_text_1col    | 1710   | 1730 | 1698
 to_text_10col   | 5998   | 5960 | 5987
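For anyone reproducing these figures, the averaging described above (sort the 15 timings, drop the three highest and three lowest, average the remaining nine) can be sketched in C as follows. This is only an illustration: the name trimmed_mean() is made up here and is not part of the patches or of any tool used for the numbers above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles, ascending. */
static int
cmp_double(const void *a, const void *b)
{
    double      x = *(const double *) a;
    double      y = *(const double *) b;

    return (x > y) - (x < y);
}

/*
 * Average of what is left after dropping the `trim` lowest and the
 * `trim` highest entries.  Hypothetical helper; returns -1.0 if the
 * scratch copy cannot be allocated.
 */
double
trimmed_mean(const double *values, size_t n, size_t trim)
{
    double     *sorted;
    double      sum = 0.0;
    size_t      i;

    assert(n > 2 * trim);       /* something must remain after trimming */

    sorted = malloc(n * sizeof(double));
    if (sorted == NULL)
        return -1.0;
    memcpy(sorted, values, n * sizeof(double));
    qsort(sorted, n, sizeof(double), cmp_double);

    for (i = trim; i < n - trim; i++)
        sum += sorted[i];
    free(sorted);

    return sum / (double) (n - 2 * trim);
}
```

Calling trimmed_mean(timings, 15, 3) on the 15 timings of one test gives one cell of the table.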
I am getting an interesting trend here in terms of a speedup between
HEAD and the patches with a table that has 10 attributes filled with
integers, especially for binary and text with COPY FROM.  COPY TO
binary also gets nice numbers, while text looks rather stable.  Hmm.
These were on my buildfarm animal, but we need to be more confident
about all this.  Could more people run these tests?  I am going to do
a second session on a local machine I have at hand and see what
happens.  Will publish the numbers here, the method will be the same.
[1]: https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am
--
Michael
			
Hi Michael,
On Thu, Feb 1, 2024 at 9:58 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jan 31, 2024 at 02:39:54PM +0900, Michael Paquier wrote:
> > Thanks, I'm looking into that now.
>
> I have much to say about the patch, but for now I have begun running
> some performance tests using the patches, because this thread won't
> get far until we are sure that the callbacks do not impact performance
> in some kind of worst-case scenario.  First, here is what I used to
> setup a set of tables used for COPY FROM and COPY TO (requires [1] to
> feed COPY FROM's data to the void, and note that default values is to
> have a strict control on the size of the StringInfos used in the copy
> paths):
> CREATE EXTENSION blackhole_am;
> CREATE OR REPLACE FUNCTION create_table_cols(tabname text, num_cols int)
> RETURNS VOID AS
> $func$
> DECLARE
>   query text;
> BEGIN
>   query := 'CREATE UNLOGGED TABLE ' || tabname || ' (';
>   FOR i IN 1..num_cols LOOP
>     query := query || 'a_' || i::text || ' int default 1';
>     IF i != num_cols THEN
>       query := query || ', ';
>     END IF;
>   END LOOP;
>   query := query || ')';
>   EXECUTE format(query);
> END
> $func$ LANGUAGE plpgsql;
> -- Tables used for COPY TO
> SELECT create_table_cols ('to_tab_1', 1);
> SELECT create_table_cols ('to_tab_10', 10);
> INSERT INTO to_tab_1 SELECT FROM generate_series(1, 10000000);
> INSERT INTO to_tab_10 SELECT FROM generate_series(1, 10000000);
> -- Data for COPY FROM
> COPY to_tab_1 TO '/tmp/to_tab_1.bin' WITH (format binary);
> COPY to_tab_10 TO '/tmp/to_tab_10.bin' WITH (format binary);
> COPY to_tab_1 TO '/tmp/to_tab_1.txt' WITH (format text);
> COPY to_tab_10 TO '/tmp/to_tab_10.txt' WITH (format text);
> -- Tables used for COPY FROM
> SELECT create_table_cols ('from_tab_1', 1);
> SELECT create_table_cols ('from_tab_10', 10);
> ALTER TABLE from_tab_1 SET ACCESS METHOD blackhole_am;
> ALTER TABLE from_tab_10 SET ACCESS METHOD blackhole_am;
>
> Then I have run a set of tests using HEAD, v7 and v10 with queries
> like that (adapt them depending on the format and table):
> COPY to_tab_1 TO '/dev/null' WITH (FORMAT text) \watch count=5
> SET client_min_messages TO error; -- for blackhole_am
> COPY from_tab_1 FROM '/tmp/to_tab_1.txt' with (FORMAT 'text') \watch count=5
> COPY from_tab_1 FROM '/tmp/to_tab_1.bin' with (FORMAT 'binary') \watch count=5
>
> All the patches have been compiled with -O2, without assertions, etc.
> Postgres is run in tmpfs mode, on scissors, without fsync.  Unlogged
> tables help a bit in focusing on the execution paths as we don't care
> about WAL, of course.  I have also included v7 in the test of tests,
> as this version uses more simple per-row callbacks.
>
> And here are the results I get for text and binary (ms, average of 15
> queries after discarding the three highest and three lowest values):
>       test       | master |  v7  | v10
> -----------------+--------+------+------
>  from_bin_1col   | 1575   | 1546 | 1575
>  from_bin_10col  | 5364   | 5208 | 5230
>  from_text_1col  | 1690   | 1715 | 1684
>  from_text_10col | 4875   | 4793 | 4757
>  to_bin_1col     | 1717   | 1730 | 1731
>  to_bin_10col    | 7728   | 7707 | 7513
>  to_text_1col    | 1710   | 1730 | 1698
>  to_text_10col   | 5998   | 5960 | 5987
>
> I am getting an interesting trend here in terms of a speedup between
> HEAD and the patches with a table that has 10 attributes filled with
> integers, especially for binary and text with COPY FROM.  COPY TO
> binary also gets nice numbers, while text looks rather stable.  Hmm.
>
> These were on my buildfarm animal, but we need to be more confident
> about all this.  Could more people run these tests?  I am going to do
> a second session on a local machine I have at hand and see what
> happens.  Will publish the numbers here, the method will be the same.
>
> [1]: https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am
> --
> Michael
I'm running the benchmark, but I got some strange numbers:
postgres=# \timing
Timing is on.
postgres=# COPY to_tab_10 TO '/dev/null' WITH (FORMAT binary) \watch count=15
COPY 10000000
Time: 3168.497 ms (00:03.168)
COPY 10000000
Time: 3255.464 ms (00:03.255)
COPY 10000000
Time: 3270.625 ms (00:03.271)
COPY 10000000
Time: 3285.112 ms (00:03.285)
COPY 10000000
Time: 3322.304 ms (00:03.322)
COPY 10000000
Time: 3341.328 ms (00:03.341)
COPY 10000000
Time: 3621.564 ms (00:03.622)
COPY 10000000
Time: 3700.911 ms (00:03.701)
COPY 10000000
Time: 3717.992 ms (00:03.718)
COPY 10000000
Time: 3708.350 ms (00:03.708)
COPY 10000000
Time: 3704.367 ms (00:03.704)
COPY 10000000
Time: 3724.281 ms (00:03.724)
COPY 10000000
Time: 3703.335 ms (00:03.703)
COPY 10000000
Time: 3728.629 ms (00:03.729)
COPY 10000000
Time: 3758.135 ms (00:03.758)
The first 6 rounds are like 10% better than the later 9 rounds, is this normal?
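To put a number on that gap, here is a small C sketch that compares the mean of the first 6 timings with the mean of the last 9. The helper names (mean_range, early_to_late_ratio) are made up for illustration; the data is the \watch output above.

```c
#include <assert.h>
#include <stddef.h>

/* The 15 timings (ms) printed by \watch above, in order. */
const double watch_ms[15] = {
    3168.497, 3255.464, 3270.625, 3285.112, 3322.304, 3341.328,
    3621.564, 3700.911, 3717.992, 3708.350, 3704.367, 3724.281,
    3703.335, 3728.629, 3758.135
};

/* Arithmetic mean of values[from] .. values[to - 1]. */
double
mean_range(const double *values, size_t from, size_t to)
{
    double      sum = 0.0;
    size_t      i;

    assert(from < to);
    for (i = from; i < to; i++)
        sum += values[i];
    return sum / (double) (to - from);
}

/* Mean of the first `split` runs divided by the mean of the remaining runs. */
double
early_to_late_ratio(const double *values, size_t n, size_t split)
{
    return mean_range(values, 0, split) / mean_range(values, split, n);
}
```

early_to_late_ratio(watch_ms, 15, 6) comes out at about 0.88, so the first 6 rounds take roughly 12% less time than the later 9.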
--
Regards
Junwang Zhao
			
		On Thu, Feb 01, 2024 at 10:57:58AM +0900, Michael Paquier wrote:
> And here are the results I get for text and binary (ms, average of 15
> queries after discarding the three highest and three lowest values):
>       test       | master |  v7  | v10
> -----------------+--------+------+------
>  from_bin_1col   | 1575   | 1546 | 1575
>  from_bin_10col  | 5364   | 5208 | 5230
>  from_text_1col  | 1690   | 1715 | 1684
>  from_text_10col | 4875   | 4793 | 4757
>  to_bin_1col     | 1717   | 1730 | 1731
>  to_bin_10col    | 7728   | 7707 | 7513
>  to_text_1col    | 1710   | 1730 | 1698
>  to_text_10col   | 5998   | 5960 | 5987
Here are some numbers from a second local machine:
      test       | master |  v7  | v10
-----------------+--------+------+------
 from_bin_1col   | 508    | 467  | 461
 from_bin_10col  | 2192   | 2083 | 2098
 from_text_1col  | 510    | 499  | 517
 from_text_10col | 1970   | 1678 | 1654
 to_bin_1col     | 575    | 577  | 573
 to_bin_10col    | 2680   | 2678 | 2722
 to_text_1col    | 516    | 506  | 527
 to_text_10col   | 2250   | 2245 | 2235
This confirms a speedup with COPY FROM for both text and binary,
with more impact as the number of attributes grows.  It is harder to
draw a conclusion about COPY TO for either format, but at least I'm not
seeing any regression, even with some variance caused by what looks
like noise.
We need more numbers from more people.  Sutou-san or Sawada-san, or
any volunteers?
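For reading the table, the relative change between two columns can be computed with something like the C sketch below. The function name speedup_pct() is only an illustration and is not part of the benchmark tooling.

```c
#include <assert.h>

/*
 * Percentage by which `patched_ms` is faster than `base_ms`.
 * Positive means a speedup, negative means a slowdown.
 */
double
speedup_pct(double base_ms, double patched_ms)
{
    assert(base_ms > 0.0);
    return (base_ms - patched_ms) / base_ms * 100.0;
}
```

For from_text_10col, speedup_pct(1970, 1654) is about 16%, while for to_bin_10col speedup_pct(2680, 2722) is slightly negative (about -1.6%), which is within what looks like noise here.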
--
Michael
			
On Thu, Feb 01, 2024 at 11:43:07AM +0800, Junwang Zhao wrote:
> The first 6 rounds are like 10% better than the later 9 rounds, is this normal?
Even with HEAD?  Perhaps you have some OS cache eviction in play here?
FWIW, I'm not seeing any of that with longer runs after 7~ tries in a
loop of 15.
--
Michael
On Thu, Feb 1, 2024 at 11:56 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Feb 01, 2024 at 11:43:07AM +0800, Junwang Zhao wrote:
> > The first 6 rounds are like 10% better than the later 9 rounds, is this normal?
>
> Even with HEAD?  Perhaps you have some OS cache eviction in play here?
> FWIW, I'm not seeing any of that with longer runs after 7~ tries in a
> loop of 15.
Yeah, with HEAD. I'm on Ubuntu 22.04 and I did not change any GUCs. Maybe
I should set a higher shared_buffers? But I doubt that's related ;(
> --
> Michael
--
Regards
Junwang Zhao
Hi,
Thanks for preparing the benchmark.
In <ZbsU53b3eEV-mMT3@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 1 Feb 2024 12:49:59 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Feb 01, 2024 at 10:57:58AM +0900, Michael Paquier wrote:
>> And here are the results I get for text and binary (ms, average of 15
>> queries after discarding the three highest and three lowest values):
>>       test       | master |  v7  | v10  
>> -----------------+--------+------+------
>>  from_bin_1col   | 1575   | 1546 | 1575
>>  from_bin_10col  | 5364   | 5208 | 5230
>>  from_text_1col  | 1690   | 1715 | 1684
>>  from_text_10col | 4875   | 4793 | 4757
>>  to_bin_1col     | 1717   | 1730 | 1731
>>  to_bin_10col    | 7728   | 7707 | 7513
>>  to_text_1col    | 1710   | 1730 | 1698
>>  to_text_10col   | 5998   | 5960 | 5987
> 
> Here are some numbers from a second local machine:
>       test       | master |  v7  | v10  
> -----------------+--------+------+------
>  from_bin_1col   | 508    | 467  | 461
>  from_bin_10col  | 2192   | 2083 | 2098
>  from_text_1col  | 510    | 499  | 517
>  from_text_10col | 1970   | 1678 | 1654
>  to_bin_1col     | 575    | 577  | 573
>  to_bin_10col    | 2680   | 2678 | 2722
>  to_text_1col    | 516    | 506  | 527
>  to_text_10col   | 2250   | 2245 | 2235
> 
> This is confirming a speedup with COPY FROM for both text and binary,
> with more impact with a larger number of attributes.  That's harder to
> conclude about COPY TO in both cases, but at least I'm not seeing any
> regression even with some variance caused by what looks like noise.
> We need more numbers from more people.  Sutou-san or Sawada-san, or
> any volunteers?
Here are some numbers on my local machine (Note that my
local machine isn't suitable for benchmarking, as I said
before. Each number is the median of "\watch 15" results):
1:
 direction     format  n_columns     master         v7        v10
        to       text          1   1077.254   1016.953   1028.434
        to        csv          1    1079.88   1055.545    1053.95
        to     binary          1   1051.247    1033.93    1003.44
        to       text         10   4373.168   3980.442    3955.94
        to        csv         10   4753.842     4719.2   4677.643
        to     binary         10   4598.374   4431.238   4285.757
      from       text          1    875.729    916.526    869.283
      from        csv          1    909.355   1001.277    918.655
      from     binary          1    872.943    907.778    859.433
      from       text         10   2594.429   2345.292   2587.603
      from        csv         10   2968.972   3039.544   2964.468
      from     binary         10    3072.01   3109.267   3093.983
2:
 direction     format  n_columns     master         v7        v10
        to       text          1   1061.908    988.768    978.291
        to        csv          1   1095.109   1037.015   1041.613
        to     binary          1   1076.992   1000.212    983.318
        to       text         10   4336.517   3901.833   3841.789
        to        csv         10   4679.411   4640.975   4570.774
        to     binary         10    4465.04   4508.063   4261.749
      from       text          1    866.689     917.54    830.417
      from        csv          1    917.973   1695.401    871.991
      from     binary          1    841.104   1422.012    820.786
      from       text         10   2523.607   3147.738   2517.505
      from        csv         10   2917.018   3042.685   2950.338
      from     binary         10   2998.051   3128.542   3018.954
3:
 direction     format  n_columns     master         v7        v10
        to       text          1   1021.168   1031.183    962.945
        to        csv          1   1076.549   1069.661   1060.258
        to     binary          1   1024.611   1022.143    975.768
        to       text         10    4327.24   3936.703   4049.893
        to        csv         10   4620.436   4531.676   4685.672
        to     binary         10   4457.165   4390.992   4301.463
      from       text          1    887.532    907.365    888.892
      from        csv          1    945.167    1012.29    895.921
      from     binary          1     853.06    854.652    849.661
      from       text         10   2660.509   2304.256   2527.071
      from        csv         10   2913.644   2968.204   2935.081
      from     binary         10   3020.812   3081.162   3090.803
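Each cell above is a median rather than a trimmed average. A minimal C sketch of that computation follows; median_ms() is a name made up for this example, and the real numbers may have been produced by a different script.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles, ascending. */
static int
cmp_double(const void *a, const void *b)
{
    double      x = *(const double *) a;
    double      y = *(const double *) b;

    return (x > y) - (x < y);
}

/*
 * Median of n timings.  For an even n this is the mean of the two middle
 * values.  Returns -1.0 if the scratch copy cannot be allocated.
 */
double
median_ms(const double *values, size_t n)
{
    double     *sorted;
    double      result;

    assert(n > 0);

    sorted = malloc(n * sizeof(double));
    if (sorted == NULL)
        return -1.0;
    memcpy(sorted, values, n * sizeof(double));
    qsort(sorted, n, sizeof(double), cmp_double);

    if (n % 2 == 1)
        result = sorted[n / 2];
    else
        result = (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;

    free(sorted);
    return result;
}
```

With 15 runs the median is simply the 8th smallest timing, so a single slow run (like the v7 outliers discussed later in the thread) only moves it if more than half of the runs are affected.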
I'll measure again on my local machine later. I'll stop
other processes such as Web browser, editor and so on as
much as possible when I do.
Thanks,
-- 
kou
			
		On Fri, Feb 02, 2024 at 12:19:51AM +0900, Sutou Kouhei wrote:
> Here are some numbers on my local machine (Note that my
> local machine isn't suitable for benchmark as I said
> before. Each number is median of "\watch 15" results):
>>
> I'll measure again on my local machine later. I'll stop
> other processes such as Web browser, editor and so on as
> much as possible when I do.
Thanks for compiling some numbers.  This is showing a lot of variance.
Especially, these two lines in table 2 are showing surprising results
for v7:
  direction     format  n_columns     master         v7        v10
       from        csv          1    917.973   1695.401    871.991
       from     binary          1    841.104   1422.012    820.786
I am going to try to plug in some rusage() calls in the backend for
the COPY paths.  I hope that gives more precision about the backend
activity.  I'll post that with more numbers.
--
Michael
			
Hi,
In <ZbwSRsCqVS638Xjz@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 Feb 2024 06:51:02 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Feb 02, 2024 at 12:19:51AM +0900, Sutou Kouhei wrote:
>> Here are some numbers on my local machine (Note that my
>> local machine isn't suitable for benchmark as I said
>> before. Each number is median of "\watch 15" results):
>>>
>> I'll measure again on my local machine later. I'll stop
>> other processes such as Web browser, editor and so on as
>> much as possible when I do.
> 
> Thanks for compiling some numbers.  This is showing a lot of variance.
> Especially, these two lines in table 2 are showing surprising results
> for v7:
>   direction     format  n_columns     master         v7        v10
>        from        csv          1    917.973   1695.401    871.991
>        from     binary          1    841.104   1422.012    820.786
Here are more numbers:
1:
 direction     format  n_columns     master         v7        v10
        to       text          1   1053.844    978.998    956.575
        to        csv          1   1091.316   1020.584   1098.314
        to     binary          1   1034.685    969.224    980.458
        to       text         10   4216.264   3886.515   4111.417
        to        csv         10   4649.228   4530.882   4682.988
        to     binary         10   4219.228    4189.99   4211.942
      from       text          1    851.697    896.968    890.458
      from        csv          1    890.229    936.231     887.15
      from     binary          1    784.407     817.07    938.736
      from       text         10   2549.056   2233.899   2630.892
      from        csv         10   2809.441   2868.411   2895.196
      from     binary         10   2985.674   3027.522     3397.5
2:
 direction     format  n_columns     master         v7        v10
        to       text          1   1013.764   1011.968    940.855
        to        csv          1   1060.431   1065.468    1040.68
        to     binary          1   1013.652   1009.956    965.675
        to       text         10   4411.484   4031.571   3896.836
        to        csv         10   4739.625    4715.81   4631.002
        to     binary         10   4374.077   4357.942   4227.215
      from       text          1    955.078    922.346    866.222
      from        csv          1   1040.717    986.524    905.657
      from     binary          1    849.316    864.859    833.152
      from       text         10   2703.209   2361.651   2533.992
      from        csv         10    2990.35   3059.167   2930.632
      from     binary         10   3008.375   3368.714   3055.723
3:
 direction     format  n_columns     master         v7        v10
        to       text          1   1084.756   1003.822    994.409
        to        csv          1     1092.4   1062.536   1079.027
        to     binary          1   1046.774    994.168    993.633
        to       text         10    4363.51   3978.205   4124.359
        to        csv         10   4866.762   4616.001   4715.052
        to     binary         10   4382.412   4363.269   4213.456
      from       text          1    852.976    907.315    860.749
      from        csv          1    925.187    962.632    897.833
      from     binary          1    824.997    897.046    828.231
      from       text         10    2591.07   2358.541   2540.431
      from        csv         10   2907.033   3018.486   2915.997
      from     binary         10   3069.027    3209.21   3119.128
Other processes were stopped while I measured these. But I'm
not sure these numbers are more reliable than before...
> I am going to try to plug in some rusage() calls in the backend for
> the COPY paths.  I hope that gives more precision about the backend
> activity.  I'll post that with more numbers.
Thanks. It'll help us.
-- 
kou
			
		On Fri, Feb 02, 2024 at 06:51:02AM +0900, Michael Paquier wrote:
> I am going to try to plug in some rusage() calls in the backend for
> the COPY paths.  I hope that gives more precision about the backend
> activity.  I'll post that with more numbers.
And here they are with log_statement_stats enabled to get rusage() for
these queries:
         test         |  user_s  | system_s | elapsed_s
----------------------+----------+----------+-----------
 head_to_bin_1col     | 1.639761 | 0.007998 |  1.647762
 v7_to_bin_1col       | 1.645499 | 0.004003 |  1.649498
 v10_to_bin_1col      | 1.639466 | 0.004008 |  1.643488
 head_to_bin_10col    | 7.486369 | 0.056007 |  7.542485
 v7_to_bin_10col      | 7.314341 | 0.039990 |  7.354743
 v10_to_bin_10col     | 7.329355 | 0.052007 |  7.381408
 head_to_text_1col    | 1.581140 | 0.012000 |  1.593166
 v7_to_text_1col      | 1.615441 | 0.003992 |  1.619446
 v10_to_text_1col     | 1.613443 | 0.000000 |  1.613454
 head_to_text_10col   | 5.897014 | 0.011990 |  5.909063
 v7_to_text_10col     | 5.722872 | 0.016014 |  5.738979
 v10_to_text_10col    | 5.762286 | 0.011993 |  5.774265
 head_from_bin_1col   | 1.524038 | 0.020000 |  1.544046
 v7_from_bin_1col     | 1.551367 | 0.016015 |  1.567408
 v10_from_bin_1col    | 1.560087 | 0.016001 |  1.576115
 head_from_bin_10col  | 5.238444 | 0.139993 |  5.378595
 v7_from_bin_10col    | 5.170503 | 0.076021 |  5.246588
 v10_from_bin_10col   | 5.106496 | 0.112020 |  5.218565
 head_from_text_1col  | 1.664124 | 0.003998 |  1.668172
 v7_from_text_1col    | 1.720616 | 0.007990 |  1.728617
 v10_from_text_1col   | 1.683950 | 0.007990 |  1.692098
 head_from_text_10col | 4.859651 | 0.015996 |  4.875747
 v7_from_text_10col   | 4.775975 | 0.032000 |  4.808051
 v10_from_text_10col  | 4.737512 | 0.028012 |  4.765522
(24 rows)
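For context, user_s and system_s are the kind of figures getrusage() reports for a process. A standalone C sketch of reading them is below; cpu_seconds() is a hypothetical helper written for this illustration and is not backend code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds. */
static double
timeval_to_seconds(struct timeval tv)
{
    return (double) tv.tv_sec + (double) tv.tv_usec / 1e6;
}

/*
 * Fetch the user and system CPU time consumed so far by the calling
 * process.  Returns 0 on success, -1 if getrusage() fails.
 */
int
cpu_seconds(double *user_s, double *system_s)
{
    struct rusage ru;

    assert(user_s != NULL && system_s != NULL);

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    *user_s = timeval_to_seconds(ru.ru_utime);
    *system_s = timeval_to_seconds(ru.ru_stime);
    return 0;
}
```

Taking one reading before and one after a piece of work and subtracting them gives the CPU time spent in between, which is less sensitive to other activity on the machine than wall-clock time alone.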
I'm looking at this table, and what I can see is still a lot of
variance in the tests with tables involving 1 attribute.  However, a
second thing stands out to me here: there is a speedup with the
10-attribute case for both COPY FROM and COPY TO, and both
formats.  The data posted at [1] is showing me the same trend.  In
short, let's move on with this split refactoring with the per-row
callbacks.  That clearly shows benefits.
[1] https://www.postgresql.org/message-id/Zbr6piWuVHDtFFOl@paquier.xyz
--
Michael
			
On Fri, Feb 02, 2024 at 09:40:56AM +0900, Sutou Kouhei wrote:
> Thanks. It'll help us.
I have done a review of v10, see v11 attached which is still WIP, with
the patches for COPY TO and COPY FROM merged together.  Note that I'm
thinking of merging them into a single commit.
@@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
     bool        convert_selectively;    /* do selective binary conversion? */
     CopyOnErrorChoice on_error; /* what to do when error happened */
     List       *convert_select; /* list of column names (can be NIL) */
+    const        CopyToRoutine *to_routine;    /* callback routines for COPY TO */
 } CopyFormatOptions;
Adding the routines to the structure for the format options is in my
opinion incorrect.  The elements of this structure are first processed
in the option deparsing path, and then we need to use the options to
guess which routines we need.  A more natural location is cstate
itself, so that the pointer to the routines is isolated within copyto.c
and copyfrom_internal.h.  My point is: the routines are an
implementation detail that the centralized copy.c has no need to know
about.  This also led to a strange separation with
ProcessCopyOptionFormatFrom() and ProcessCopyOptionFormatTo() to fit
the hole in-between.
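To make the shape of that concrete, here is a simplified, standalone C sketch. Every Demo* name is hypothetical and these are not the structs from the patch. The options struct holds only parsed options, and the state picks its routine once, after the options are known.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct DemoCopyToState DemoCopyToState;

/* Per-format callbacks (hypothetical, reduced to a single one). */
typedef struct DemoCopyToRoutine
{
    void        (*one_row) (DemoCopyToState *state, const int *values, int nvalues);
} DemoCopyToRoutine;

/* Parsed options only: no callbacks live here. */
typedef struct DemoCopyOptions
{
    bool        csv_mode;
} DemoCopyOptions;

struct DemoCopyToState
{
    DemoCopyOptions opts;
    const DemoCopyToRoutine *routine;   /* chosen after option parsing */
    char        buf[128];
    size_t      len;
};

/* Append text to the state's buffer; silently drops it if it does not fit. */
static void
demo_append(DemoCopyToState *state, const char *text)
{
    size_t      n = strlen(text);

    if (state->len + n < sizeof(state->buf))
    {
        memcpy(state->buf + state->len, text, n + 1);
        state->len += n;
    }
}

/* Write one row of integers separated by `sep`, followed by a newline. */
static void
demo_row(DemoCopyToState *state, const int *values, int nvalues, const char *sep)
{
    char        field[16];
    int         i;

    for (i = 0; i < nvalues; i++)
    {
        if (i > 0)
            demo_append(state, sep);
        snprintf(field, sizeof(field), "%d", values[i]);
        demo_append(state, field);
    }
    demo_append(state, "\n");
}

static void
demo_text_one_row(DemoCopyToState *state, const int *values, int nvalues)
{
    demo_row(state, values, nvalues, "\t");
}

static void
demo_csv_one_row(DemoCopyToState *state, const int *values, int nvalues)
{
    demo_row(state, values, nvalues, ",");
}

static const DemoCopyToRoutine demo_text_routine = {demo_text_one_row};
static const DemoCopyToRoutine demo_csv_routine = {demo_csv_one_row};

/* Look at the already-parsed options and pick the matching routine. */
static const DemoCopyToRoutine *
demo_get_routine(const DemoCopyOptions *opts)
{
    return opts->csv_mode ? &demo_csv_routine : &demo_text_routine;
}

/* Stand-in for the state setup: "parse" options, then attach the routine. */
void
demo_begin_copy_to(DemoCopyToState *state, bool csv_mode)
{
    memset(state, 0, sizeof(*state));
    state->opts.csv_mode = csv_mode;
    state->routine = demo_get_routine(&state->opts);
    assert(state->routine != NULL);
}
```

In this layout the code that parses options never sees a callback, and the per-row path is a single indirect call through state->routine.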
The separation between cstate and the format-related fields could be
much better, though I am not sure if it is worth doing as it
introduces more duplication.  For example, max_fields and raw_fields
are specific to text and csv, while binary does not care much.
Perhaps this is only useful for custom formats.
copyapi.h needs more documentation, like what is expected for
extension developers when using these, what are the arguments, etc.  I
have added what I had in mind for now.
+typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, int m);
CopyReadAttributes and PostpareColumnValue are also callbacks specific
to text and csv, except that they are used within the per-row
callbacks.  The same can be said about CopyAttributeOutHeaderFunction.
It seems to me that it would be less confusing to store pointers to
them in the routine structures, where the final picture involves not
having multiple layers of APIs like CopyToCSVStart,
CopyAttributeOutTextValue, etc.  These *have* to be documented
properly in copyapi.h, and this is much easier now that cstate stores
the routine pointers.  That would also make simpler function stacks.
Note that I have not changed that in the v11 attached.
This business with the extra callbacks required for csv and text is my
main point of contention, but I'd be OK once the model of the APIs is
more linear, with everything in Copy{From,To}State.  The changes would
be rather simple, and I'd be OK to put my hands on it.  Just,
Sutou-san, would you agree with my last point about these extra
callbacks?
--
Michael
			
On Fri, Feb 2, 2024 at 2:21 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Feb 02, 2024 at 09:40:56AM +0900, Sutou Kouhei wrote:
> > Thanks. It'll help us.
>
> I have done a review of v10, see v11 attached which is still WIP, with
> the patches for COPY TO and COPY FROM merged together.  Note that I'm
> thinking to merge them into a single commit.
>
> @@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
>      bool        convert_selectively;    /* do selective binary conversion? */
>      CopyOnErrorChoice on_error; /* what to do when error happened */
>      List       *convert_select; /* list of column names (can be NIL) */
> +    const        CopyToRoutine *to_routine;    /* callback routines for COPY TO */
>  } CopyFormatOptions;
>
> Adding the routines to the structure for the format options is in my
> opinion incorrect.  The elements of this structure are first processed
> in the option deparsing path, and then we need to use the options to
> guess which routines we need.  A more natural location is cstate
> itself, so as the pointer to the routines is isolated within copyto.c
I agree CopyToRoutine should be placed into CopyToStateData, but
why set it after ProcessCopyOptions?  The implementation of
CopyToGetRoutine doesn't make sense if we want to support custom
formats in the future.
It seems the v11 refactoring only considered performance, not an
*extendable copy format*.
> and copyfrom_internal.h.  My point is: the routines are an
> implementation detail that the centralized copy.c has no need to know
> about.  This also led to a strange separation with
> ProcessCopyOptionFormatFrom() and ProcessCopyOptionFormatTo() to fit
> the hole in-between.
>
> The separation between cstate and the format-related fields could be
> much better, though I am not sure if it is worth doing as it
> introduces more duplication.  For example, max_fields and raw_fields
> are specific to text and csv, while binary does not care much.
> Perhaps this is just useful to be for custom formats.
I think those can be placed in format specific fields by utilizing the opaque
space, but yeah, this will introduce duplication.
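A rough C sketch of what I mean by the opaque space is below. All Demo* names, including the opaque field itself, are hypothetical and only borrow max_fields and raw_fields from the discussion above.

```c
#include <assert.h>
#include <stdlib.h>

/* Shared state: only what every format needs, plus an opaque slot. */
typedef struct DemoCopyFromState
{
    int         natts;          /* number of target columns */
    void       *opaque;         /* owned by the format implementation */
} DemoCopyFromState;

/* Fields that only text/csv parsing would need. */
typedef struct DemoTextParseState
{
    int         max_fields;
    char      **raw_fields;
} DemoTextParseState;

/* Start callback for text/csv: allocate the private fields. */
int
demo_text_from_start(DemoCopyFromState *state)
{
    DemoTextParseState *ts;

    assert(state->natts > 0);

    ts = malloc(sizeof(*ts));
    if (ts == NULL)
        return -1;
    ts->max_fields = state->natts;
    ts->raw_fields = calloc((size_t) ts->max_fields, sizeof(char *));
    if (ts->raw_fields == NULL)
    {
        free(ts);
        return -1;
    }
    state->opaque = ts;
    return 0;
}

/* End callback for text/csv: release the private fields. */
void
demo_text_from_end(DemoCopyFromState *state)
{
    DemoTextParseState *ts = state->opaque;

    if (ts != NULL)
    {
        free(ts->raw_fields);
        free(ts);
        state->opaque = NULL;
    }
}
```

A binary implementation would simply leave opaque unused. The duplication shows up when both text and csv want the same private struct, or when each format has to repeat the same allocation code.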
>
> copyapi.h needs more documentation, like what is expected for
> extension developers when using these, what are the arguments, etc.  I
> have added what I had in mind for now.
>
> +typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, int m);
>
> CopyReadAttributes and PostpareColumnValue are also callbacks specific
> to text and csv, except that they are used within the per-row
> callbacks.  The same can be said about CopyAttributeOutHeaderFunction.
> It seems to me that it would be less confusing to store pointers to
> them in the routine structures, where the final picture involves not
> having multiple layers of APIs like CopyToCSVStart,
> CopyAttributeOutTextValue, etc.  These *have* to be documented
> properly in copyapi.h, and this is much easier now that cstate stores
> the routine pointers.  That would also make simpler function stacks.
> Note that I have not changed that in the v11 attached.
>
> This business with the extra callbacks required for csv and text is my
> main point of contention, but I'd be OK once the model of the APIs is
> more linear, with everything in Copy{From,To}State.  The changes would
> be rather simple, and I'd be OK to put my hands on it.  Just,
> Sutou-san, would you agree with my last point about these extra
> callbacks?
> --
> Michael
If V7 and V10 show no performance regression, then I think V6 is also
fine performance-wise, since most of the time goes to CopyToOneRow
and CopyFromOneRow.
I just think we should take *extendability* into consideration from
the beginning.
--
Regards
Junwang Zhao
			
Hi,
In <ZbyJ60Fd7CHt4m0i@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 Feb 2024 15:21:31 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:
> I have done a review of v10, see v11 attached which is still WIP, with
> the patches for COPY TO and COPY FROM merged together.  Note that I'm
> thinking to merge them into a single commit.
OK. I don't have a strong opinion on the commit unit.
> @@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
>      bool        convert_selectively;    /* do selective binary conversion? */
>      CopyOnErrorChoice on_error; /* what to do when error happened */
>      List       *convert_select; /* list of column names (can be NIL) */
> +    const        CopyToRoutine *to_routine;    /* callback routines for COPY TO */
>  } CopyFormatOptions;
>
> Adding the routines to the structure for the format options is in my
> opinion incorrect.  The elements of this structure are first processed
> in the option deparsing path, and then we need to use the options to
> guess which routines we need.
This was discussed with Sawada-san a bit before. [1][2]
[1] https://www.postgresql.org/message-id/flat/CAD21AoBmNiWwrspuedgAPgbAqsn7e7NoZYF6gNnYBf%2BgXEk9Mg%40mail.gmail.com#bfd19262d261c67058fdb8d64e6a723c
[2] https://www.postgresql.org/message-id/flat/20240130.144531.1257430878438173740.kou%40clear-code.com#fc55392d77f400fc74e42686fe7e348a
I kept the routines in CopyFormatOptions for custom option
processing. But I should not have cared about it in this
patch set because this patch set doesn't include custom
option processing.
So I'm OK that we move the routines to
Copy{From,To}StateData.
> This also led to a strange separation with
> ProcessCopyOptionFormatFrom() and ProcessCopyOptionFormatTo() to fit
> the hole in-between.
They are also for custom option processing. We don't need to
care about them in this patch set either.
> copyapi.h needs more documentation, like what is expected for
> extension developers when using these, what are the arguments, etc.  I
> have added what I had in mind for now.
Thanks! I'm not good at writing documentation in English...
> +typedef char *(*PostpareColumnValue) (CopyFromState cstate, char *string, int m);
>
> CopyReadAttributes and PostpareColumnValue are also callbacks specific
> to text and csv, except that they are used within the per-row
> callbacks.  The same can be said about CopyAttributeOutHeaderFunction.
> It seems to me that it would be less confusing to store pointers to
> them in the routine structures, where the final picture involves not
> having multiple layers of APIs like CopyToCSVStart,
> CopyAttributeOutTextValue, etc.  These *have* to be documented
> properly in copyapi.h, and this is much easier now that cstate stores
> the routine pointers.  That would also make simpler function stacks.
> Note that I have not changed that in the v11 attached.
>
> This business with the extra callbacks required for csv and text is my
> main point of contention, but I'd be OK once the model of the APIs is
> more linear, with everything in Copy{From,To}State.  The changes would
> be rather simple, and I'd be OK to put my hands on it.  Just,
> Sutou-san, would you agree with my last point about these extra
> callbacks?
I'm OK with the approach. But how about adding the extra
callbacks to Copy{From,To}StateData, not
Copy{From,To}Routines, like CopyToStateData::data_dest_cb and
CopyFromStateData::data_source_cb? They are only needed for
"text" and "csv". So we don't need to add them to
Copy{From,To}Routines, which keeps the required callbacks to a minimum.
What is the better next action for us? Do you want to
complete the WIP v11 patch set by yourself (and commit it)?
Or should I take it over?
Thanks,
-- 
kou
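The proposal at the end of this message (keep the routine struct minimal and put the text/csv-only hooks in the state, set from the Start callback) could look roughly like the standalone C sketch below. All DemoTo* and demo_* names are hypothetical, and the CSV quoting is deliberately simplified. This is not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct DemoToState DemoToState;

/* Required callbacks: the only thing every format has to provide. */
typedef struct DemoToRoutine
{
    void        (*start) (DemoToState *state);
    void        (*one_row) (DemoToState *state, const char *const *values, int nvalues);
} DemoToRoutine;

struct DemoToState
{
    const DemoToRoutine *routine;
    /* Extra hook used by text and csv only; other formats leave it NULL. */
    void        (*attribute_out) (DemoToState *state, const char *value);
    char        sep;
    char        buf[128];
    size_t      len;
};

/* Append text to the buffer; silently drops it if it does not fit. */
static void
demo_put(DemoToState *state, const char *text)
{
    size_t      n = strlen(text);

    if (state->len + n < sizeof(state->buf))
    {
        memcpy(state->buf + state->len, text, n + 1);
        state->len += n;
    }
}

static void
demo_out_text(DemoToState *state, const char *value)
{
    demo_put(state, value);
}

/* Simplified: quote only when the value contains the delimiter. */
static void
demo_out_csv(DemoToState *state, const char *value)
{
    bool        quote = strchr(value, ',') != NULL;

    if (quote)
        demo_put(state, "\"");
    demo_put(state, value);
    if (quote)
        demo_put(state, "\"");
}

/* The Start callbacks are where the format-specific hook gets installed. */
static void
demo_text_start(DemoToState *state)
{
    state->sep = '\t';
    state->attribute_out = demo_out_text;
}

static void
demo_csv_start(DemoToState *state)
{
    state->sep = ',';
    state->attribute_out = demo_out_csv;
}

/* Shared by text and csv: the per-value work goes through the state hook. */
static void
demo_delimited_one_row(DemoToState *state, const char *const *values, int nvalues)
{
    char        sep[2] = {state->sep, '\0'};
    int         i;

    assert(state->attribute_out != NULL);
    for (i = 0; i < nvalues; i++)
    {
        if (i > 0)
            demo_put(state, sep);
        state->attribute_out(state, values[i]);
    }
    demo_put(state, "\n");
}

const DemoToRoutine demo_to_text = {demo_text_start, demo_delimited_one_row};
const DemoToRoutine demo_to_csv = {demo_csv_start, demo_delimited_one_row};

void
demo_begin(DemoToState *state, const DemoToRoutine *routine)
{
    memset(state, 0, sizeof(*state));
    state->routine = routine;
    state->routine->start(state);
}
```

The routine struct stays at two required callbacks, and a format that has no notion of delimited attributes never has to provide or know about attribute_out.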
Hi,
In <CAEG8a3LxnBwNRPRwvmimDvOkPvYL8pB1+rhLBnxjeddFt3MeNw@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 Feb 2024 15:27:15 +0800,
  Junwang Zhao <zhjwpku@gmail.com> wrote:
> I agree CopyToRoutine should be placed into CopyToStateData, but
> why set it after ProcessCopyOptions, the implementation of
> CopyToGetRoutine doesn't make sense if we want to support custom
> format in the future.
>
> Seems the refactor of v11 only considered performance but not
> *extendable copy format*.
Right. We focus on performance for now. And then we will
focus on extendability. [1]
[1] https://www.postgresql.org/message-id/flat/20240130.171511.2014195814665030502.kou%40clear-code.com#757a48c273f140081656ec8eb69f502b
> If V7 and V10 have no performance reduction, then I think V6 is also
> good with performance, since most of the time goes to CopyToOneRow
> and CopyFromOneRow.
Don't worry. I'll re-submit changes in the v6 patch set
again after the current patch set that focuses on
performance is merged.
> I just think we should take the *extendable* into consideration at
> the beginning.
Introducing Copy{To,From}Routine is also valuable for
extendability. We can improve extendability later. Let's
focus on only performance for now to introduce
Copy{To,From}Routine.
Thanks,
-- 
kou
On Fri, Feb 02, 2024 at 04:33:19PM +0900, Sutou Kouhei wrote:
> Hi,
>
> In <ZbyJ60Fd7CHt4m0i@paquier.xyz>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 Feb 2024 15:21:31 +0900,
>   Michael Paquier <michael@paquier.xyz> wrote:
>
> > I have done a review of v10, see v11 attached which is still WIP, with
> > the patches for COPY TO and COPY FROM merged together.  Note that I'm
> > thinking to merge them into a single commit.
>
> OK. I don't have a strong opinion for commit unit.
>
> > @@ -74,11 +75,11 @@ typedef struct CopyFormatOptions
> >      bool convert_selectively; /* do selective binary conversion? */
> >      CopyOnErrorChoice on_error; /* what to do when error happened */
> >      List *convert_select; /* list of column names (can be NIL) */
> > +    const CopyToRoutine *to_routine; /* callback routines for COPY TO */
> >  } CopyFormatOptions;
> >
> > Adding the routines to the structure for the format options is in my
> > opinion incorrect.  The elements of this structure are first processed
> > in the option deparsing path, and then we need to use the options to
> > guess which routines we need.
>
> This was discussed with Sawada-san a bit before. [1][2]
>
> [1] https://www.postgresql.org/message-id/flat/CAD21AoBmNiWwrspuedgAPgbAqsn7e7NoZYF6gNnYBf%2BgXEk9Mg%40mail.gmail.com#bfd19262d261c67058fdb8d64e6a723c
> [2] https://www.postgresql.org/message-id/flat/20240130.144531.1257430878438173740.kou%40clear-code.com#fc55392d77f400fc74e42686fe7e348a
>
> I kept the routines in CopyFormatOptions for custom option
> processing. But I should have not cared about it in this
> patch set because this patch set doesn't include custom
> option processing.
One idea I was considering is whether we should use a special value in
the "format" DefElem, say "custom:$my_custom_format" where it would be
possible to bypass the format check when processing options and find
the routines after processing all the options.  I'm not wedded to
that, but attaching the routines to the state data is IMO the correct
thing, because this has nothing to do with CopyFormatOptions.
> So I'm OK that we move the routines to
> Copy{From,To}StateData.
Okay.
>> copyapi.h needs more documentation, like what is expected for
>> extension developers when using these, what are the arguments, etc.  I
>> have added what I had in mind for now.
>
> Thanks! I'm not good at writing documentation in English...
No worries.
> I'm OK with the approach. But how about adding the extra
> callbacks to Copy{From,To}StateData not
> Copy{From,To}Routines like CopyToStateData::data_dest_cb and
> CopyFromStateData::data_source_cb? They are only needed for
> "text" and "csv". So we don't need to add them to
> Copy{From,To}Routines to keep required callback minimum.
And set them in cstate while we are in the Start routine, right?  Hmm.
Why not..  That would get rid of the multiple layers v11 has, which is
my pain point, and we have many fields in cstate that are already used
on a per-format basis.
> What is the better next action for us? Do you want to
> complete the WIP v11 patch set by yourself (and commit it)?
> Or should I take over it?
I was planning to work on that, but wanted to be sure how you felt
about the problem with text and csv first.
--
Michael
Attachments
Hi,
In <ZbyiDHIrxRgzYT99@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 Feb 2024 17:04:28 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:
> One idea I was considering is whether we should use a special value in
> the "format" DefElem, say "custom:$my_custom_format" where it would be
> possible to bypass the format check when processing options and find
> the routines after processing all the options.  I'm not wedded to
> that, but attaching the routines to the state data is IMO the correct
> thing, because this has nothing to do with CopyFormatOptions.
Thanks for sharing your idea.
Let's discuss how to support custom options after we
complete the current performance changes.
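For reference when that later discussion happens, the "custom:$my_custom_format" idea quoted above could be sketched roughly as follows. This is only an illustration under assumptions: the helper name custom_format_name() and the CUSTOM_FORMAT_PREFIX macro are hypothetical and are not part of any posted patch, and how the returned name would be mapped to routines is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical prefix; the thread only floats "custom:$my_custom_format". */
#define CUSTOM_FORMAT_PREFIX "custom:"

/*
 * Return the custom format name when `format` starts with "custom:" and a
 * non-empty name follows, else NULL.  NULL means the caller should apply
 * the usual built-in check ("text", "csv", "binary").
 */
static const char *
custom_format_name(const char *format)
{
    size_t      prefix_len = strlen(CUSTOM_FORMAT_PREFIX);

    assert(format != NULL);

    if (strncmp(format, CUSTOM_FORMAT_PREFIX, prefix_len) != 0)
        return NULL;            /* not a custom format */
    if (format[prefix_len] == '\0')
        return NULL;            /* "custom:" with no name is rejected */

    return format + prefix_len;
}
```

With such a helper, option processing could skip the built-in format check whenever a non-NULL name comes back, and the routines would be looked up only after all options are processed.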
>> I'm OK with the approach. But how about adding the extra
>> callbacks to Copy{From,To}StateData not
>> Copy{From,To}Routines like CopyToStateData::data_dest_cb and
>> CopyFromStateData::data_source_cb? They are only needed for
>> "text" and "csv". So we don't need to add them to
>> Copy{From,To}Routines to keep required callback minimum.
> 
> And set them in cstate while we are in the Start routine, right?
I imagined that it's done around the following part:
@@ -1418,6 +1579,9 @@ BeginCopyFrom(ParseState *pstate,
        /* Extract options from the statement node tree */
        ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
+       /* Set format routine */
+       cstate->routine = CopyFromGetRoutine(cstate->opts);
+
        /* Process the target relation */
        cstate->rel = rel;
 
Example1:
/* Set format routine */
cstate->routine = CopyFromGetRoutine(cstate->opts);
if (!cstate->opts.binary)
    if (cstate->opts.csv_mode)
        cstate->copy_read_attributes = CopyReadAttributesCSV;
    else
        cstate->copy_read_attributes = CopyReadAttributesText;
Example2:
static void
CopyFromSetRoutine(CopyFromState cstate)
{
    if (cstate->opts.csv_mode)
    {
        cstate->routine = &CopyFromRoutineCSV;
        cstate->copy_read_attributes = CopyReadAttributesCSV;
    }
    else if (cstate->opts.binary)
        cstate->routine = &CopyFromRoutineBinary;
    else
    {
        cstate->routine = &CopyFromRoutineText;
        cstate->copy_read_attributes = CopyReadAttributesText;
    }
}
BeginCopyFrom()
{
    /* Set format routine */
    CopyFromSetRoutine(cstate);
}
But I don't object to your original approach. If we have the
extra callbacks in Copy{From,To}Routines, I just don't use
them for my custom format extension.
>> What is the better next action for us? Do you want to
>> complete the WIP v11 patch set by yourself (and commit it)?
>> Or should I take over it?
> 
> I was planning to work on that, but wanted to be sure how you felt
> about the problem with text and csv first.
OK.
My opinion is the above. I have an idea how to implement it
but it's not a strong idea. You can choose whichever you like.
Thanks,
-- 
kou
			
On Fri, Feb 02, 2024 at 05:46:18PM +0900, Sutou Kouhei wrote:
> Hi,
>
> In <ZbyiDHIrxRgzYT99@paquier.xyz>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 Feb 2024 17:04:28 +0900,
>   Michael Paquier <michael@paquier.xyz> wrote:
>
> > One idea I was considering is whether we should use a special value in
> > the "format" DefElem, say "custom:$my_custom_format" where it would be
> > possible to bypass the format check when processing options and find
> > the routines after processing all the options.  I'm not wedded to
> > that, but attaching the routines to the state data is IMO the correct
> > thing, because this has nothing to do with CopyFormatOptions.
>
> Thanks for sharing your idea.
> Let's discuss how to support custom options after we
> complete the current performance changes.
>
> >> I'm OK with the approach. But how about adding the extra
> >> callbacks to Copy{From,To}StateData not
> >> Copy{From,To}Routines like CopyToStateData::data_dest_cb and
> >> CopyFromStateData::data_source_cb? They are only needed for
> >> "text" and "csv". So we don't need to add them to
> >> Copy{From,To}Routines to keep required callback minimum.
> >
> > And set them in cstate while we are in the Start routine, right?
>
> I imagined that it's done around the following part:
>
> @@ -1418,6 +1579,9 @@ BeginCopyFrom(ParseState *pstate,
>         /* Extract options from the statement node tree */
>         ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
>
> +       /* Set format routine */
> +       cstate->routine = CopyFromGetRoutine(cstate->opts);
> +
>         /* Process the target relation */
>         cstate->rel = rel;
>
>
> Example1:
>
> /* Set format routine */
> cstate->routine = CopyFromGetRoutine(cstate->opts);
> if (!cstate->opts.binary)
>     if (cstate->opts.csv_mode)
>         cstate->copy_read_attributes = CopyReadAttributesCSV;
>     else
>         cstate->copy_read_attributes = CopyReadAttributesText;
>
> Example2:
>
> static void
> CopyFromSetRoutine(CopyFromState cstate)
> {
>     if (cstate->opts.csv_mode)
>     {
>         cstate->routine = &CopyFromRoutineCSV;
>         cstate->copy_read_attributes = CopyReadAttributesCSV;
>     }
>     else if (cstate->opts.binary)
>         cstate->routine = &CopyFromRoutineBinary;
>     else
>     {
>         cstate->routine = &CopyFromRoutineText;
>         cstate->copy_read_attributes = CopyReadAttributesText;
>     }
> }
>
> BeginCopyFrom()
> {
>     /* Set format routine */
>     CopyFromSetRoutine(cstate);
> }
>
>
> But I don't object to your original approach. If we have the
> extra callbacks in Copy{From,To}Routines, I just don't use
> them for my custom format extension.
>
> >> What is the better next action for us? Do you want to
> >> complete the WIP v11 patch set by yourself (and commit it)?
> >> Or should I take over it?
> >
> > I was planning to work on that, but wanted to be sure how you felt
> > about the problem with text and csv first.
>
> OK.
> My opinion is the above. I have an idea how to implement it
> but it's not a strong idea. You can choose whichever you like.
So, I've looked at all that today, and finished by applying two
patches as of 2889fd23be56 and 95fb5b49024a to get some of the
weirdness with the workhorse routines out of the way.  Both have added
callbacks assigned in their respective cstate data for text and csv.
As this is called within the OneRow routine, I can live with that.  If
there is an opposition to that, we could just attach it within the
routines.  The CopyAttributeOut routines had a strange argument
layout, actually, the flag for the quotes is required as a header uses
no quotes, but there was little point in the "single arg" case, so
I've removed it.
I am attaching a v12 which is close to what I want it to be, with
much more documentation and comments.  There are two things that I've
changed compared to the previous versions though:
1) I have added a callback to set up the input and output functions
rather than attach that in the Start callback.  These routines are now
called once per argument, where we know that the argument is valid.
The callbacks are in charge of filling the FmgrInfos.  There are some
good reasons behind that:
- No need for plugins to think about how to allocate this data.  v11
and other versions were doing things the wrong way by allocating this
stuff in the wrong memory context as we switch to the COPY context
when we are in the Start routines.
- This avoids attisdropped problems, and we have a long history of
bugs regarding that.  I'm ready to bet that custom formats would get
that wrong.
2) I have backpedaled on the postpare callback, which did not bring
much in clarity IMO while being a CSV-only callback.  Note that we
have in copyfromparse.c more paths that are only for CSV but the past
versions of the patch never cared about that.  This makes the text and
CSV implementations much closer to each other, as a result.
I had mixed feelings about CopySendEndOfRow() being split to
CopyToTextSendEndOfRow() to send the line terminations when sending a
CSV/text row, but I'm OK with that at the end.  v12 is mostly about
moving code around at this point, making it kind of straight-forward
to follow as the code blocks are the same.  I'm still planning to do a
few more measurements, I just lacked the time.  Let me know if you have
comments about all that.
--
Michael
			
Attachments
Hi,
In <ZcCKwAeFrlOqPBuN@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 5 Feb 2024 16:14:08 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:
> So, I've looked at all that today, and finished by applying two
> patches as of 2889fd23be56 and 95fb5b49024a to get some of the
> weirdness with the workhorse routines out of the way.
Thanks!
> As this is called within the OneRow routine, I can live with that.  If
> there is an opposition to that, we could just attach it within the
> routines.
I don't object to the approach.
> I am attaching a v12 which is close to what I want it to be, with
> much more documentation and comments.  There are two things that I've
> changed compared to the previous versions though:
> 1) I have added a callback to set up the input and output functions
> rather than attach that in the Start callback.
I'm OK with this. I just don't use them in Apache Arrow COPY
FORMAT extension.
> - No need for plugins to think about how to allocate this data.  v11
> and other versions were doing things the wrong way by allocating this
> stuff in the wrong memory context as we switch to the COPY context
> when we are in the Start routines.
Oh, sorry. I missed it when I moved them.
> 2) I have backpedaled on the postpare callback, which did not bring
> much in clarity IMO while being a CSV-only callback.  Note that we
> have in copyfromparse.c more paths that are only for CSV but the past
> versions of the patch never cared about that.  This makes the text and
> CSV implementations much closer to each other, as a result.
Ah, sorry. I forgot to eliminate cstate->opts.csv_mode in
CopyReadLineText(). The postpare callback is for
optimization. If it doesn't improve performance, we don't
need to introduce it.
We may want to try eliminating cstate->opts.csv_mode in
CopyReadLineText() for performance. But we don't need to
do this in introducing CopyFromRoutine. We can defer it.
So I don't object to removing the postpare callback.
>                                              Let me know if you have
> comments about all that.
Here are some comments for the patch:
+    /*
+     * Called when COPY FROM is started to set up the input functions
+     * associated to the relation's attributes writing to.  `fmgr_info` can be
fmgr_info ->
finfo
+     * optionally filled to provide the catalog information of the input
+     * function.  `typioparam` can be optinally filled to define the OID of
optinally ->
optionally
+     * the type to pass to the input function.  `atttypid` is the OID of data
+     * type used by the relation's attribute.
+     */
+    void        (*CopyFromInFunc) (Oid atttypid, FmgrInfo *finfo,
+                                   Oid *typioparam);
How about passing CopyFromState cstate too like other
callbacks for consistency?
+    /*
+     * Copy one row to a set of `values` and `nulls` of size tupDesc->natts.
+     *
+     * 'econtext' is used to evaluate default expression for each column that
+     * is either not read from the file or is using the DEFAULT option of COPY
or is ->
or
(I'm not sure...)
+     * FROM.  It is NULL if no default values are used.
+     *
+     * Returns false if there are no more tuples to copy.
+     */
+    bool        (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
+                                   Datum *values, bool *nulls);
+typedef struct CopyToRoutine
+{
+    /*
+     * Called when COPY TO is started to set up the output functions
+     * associated to the relation's attributes reading from.  `fmgr_info` can
fmgr_info ->
finfo
+     * be optionally filled. `atttypid` is the OID of data type used by the
+     * relation's attribute.
+     */
+    void        (*CopyToOutFunc) (Oid atttypid, FmgrInfo *finfo);
How about passing CopyToState cstate too like other
callbacks for consistency?
@@ -200,4 +204,10 @@ extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 extern int    CopyReadAttributesCSV(CopyFromState cstate);
 extern int    CopyReadAttributesText(CopyFromState cstate);
 
+/* Callbacks for CopyFromRoutine->OneRow */
CopyFromRoutine->OneRow ->
CopyFromRoutine->CopyFromOneRow
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+                               Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+                                 Datum *values, bool *nulls);
+
 #endif                            /* COPYFROM_INTERNAL_H */
+/*
+ * CopyFromTextStart
CopyFromTextStart ->
CopyFromBinaryStart
+ *
+ * Start of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    /* Read and verify binary header */
+    ReceiveCopyBinaryHeader(cstate);
+}
+
+/*
+ * CopyFromTextEnd
CopyFromTextEnd ->
CopyFromBinaryEnd
+ *
+ * End of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 91433d439b..d02a7773e3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -473,6 +473,7 @@ ConvertRowtypeExpr
 CookedConstraint
 CopyDest
 CopyFormatOptions
+CopyFromRoutine
 CopyFromState
 CopyFromStateData
 CopyHeaderChoice
@@ -482,6 +483,7 @@ CopyMultiInsertInfo
 CopyOnErrorChoice
 CopySource
 CopyStmt
+CopyToRoutine
 CopyToState
 CopyToStateData
 Cost
Wow! I didn't know that we need to update typedefs.list when
I add a "typedef struct".
Thanks,
-- 
kou
			
Hi,
Have you benchmarked the performance effects of 2889fd23be5 ? I'd not at all
be surprised if it led to a measurable performance regression.
I think callbacks for individual attributes is the wrong approach - the
dispatch needs to happen at a higher level, otherwise there are too many
indirect function calls.
Greetings,
Andres Freund
On Mon, Feb 05, 2024 at 06:05:15PM +0900, Sutou Kouhei wrote:
> In <ZcCKwAeFrlOqPBuN@paquier.xyz>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 5 Feb 2024 16:14:08 +0900,
>   Michael Paquier <michael@paquier.xyz> wrote:
>> 2) I have backpedaled on the postpare callback, which did not bring
>> much in clarity IMO while being a CSV-only callback.  Note that we
>> have in copyfromparse.c more paths that are only for CSV but the past
>> versions of the patch never cared about that.  This makes the text and
>> CSV implementations much closer to each other, as a result.
>
> Ah, sorry. I forgot to eliminate cstate->opts.csv_mode in
> CopyReadLineText(). The postpare callback is for
> optimization. If it doesn't improve performance, we don't
> need to introduce it.
No worries.
> We may want to try eliminating cstate->opts.csv_mode in
> CopyReadLineText() for performance. But we don't need to
> do this in introducing CopyFromRoutine. We can defer it.
>
> So I don't object to removing the postpare callback.
Rather related, but there has been a comment from Andres about this
kind of splits a few hours ago, so perhaps this is for the best:
https://www.postgresql.org/message-id/20240205182118.h5rkbnjgujwzuxip%40awork3.anarazel.de
I'll reply to this one in a bit.
>> Let me know if you have
>> comments about all that.
>
> Here are some comments for the patch:
Thanks.  My head was spinning after reading the diffs more than 20
times :)
> fmgr_info ->
> finfo
> optinally ->
> optionally
> CopyFromRoutine->OneRow ->
> CopyFromRoutine->CopyFromOneRow
> CopyFromTextStart ->
> CopyFromBinaryStart
> CopyFromTextEnd ->
> CopyFromBinaryEnd
Fixed all these.
> How about passing CopyFromState cstate too like other
> callbacks for consistency?
Yes, I was wondering a bit if this can be useful for the custom
formats.
> +    /*
> +     * Copy one row to a set of `values` and `nulls` of size tupDesc->natts.
> +     *
> +     * 'econtext' is used to evaluate default expression for each column that
> +     * is either not read from the file or is using the DEFAULT option of COPY
>
> or is ->
> or
"or is" is correct here IMO.
> Wow! I didn't know that we need to update typedefs.list when
> I add a "typedef struct".
That's for the automated indentation.  This is a habit I have when it
comes to work on shaping up patches to avoid weird diffs with pgindent
and new structure names.  It's OK to forget about it :)
Attaching a v13 for now.
--
Michael
Attachments
On Mon, Feb 05, 2024 at 10:21:18AM -0800, Andres Freund wrote:
> Have you benchmarked the performance effects of 2889fd23be5 ? I'd not at all
> be surprised if it led to a measurable performance regression.
Yes, I was looking at runtimes and some profiles around CopyOneRowTo()
to see the effects that this has yesterday.  The principal point of
contention is CopyOneRowTo() where the callback is called once per
attribute, so more attributes stress it more.  The method I've used is
described in [1], where I've used up to 50 int attributes (fixed value
size to limit appendBinaryStringInfo) with 5 million rows, with
shared_buffers large enough that all the data fits in it, while
prewarming the whole.  Postgres runs on a tmpfs, and COPY TO is
redirected to /dev/null.
For reference, I still have some reports lying around (-g attached to
the backend process running the COPY TO queries with text format), so
here you go:
* At 95fb5b49024a:
-   83.04%    11.46%  postgres  postgres            [.] CopyOneRowTo
    - 71.58% CopyOneRowTo
       - 30.37% OutputFunctionCall
          + 27.77% int4out
       + 13.18% CopyAttributeOutText
       + 10.19% appendBinaryStringInfo
         3.76% 0xffffa7096234
         2.78% 0xffffa7096214
       + 2.49% CopySendEndOfRow
         1.21% int4out
         0.83% memcpy@plt
         0.76% 0xffffa7094ba8
         0.75% 0xffffa7094ba4
         0.69% pgstat_progress_update_param
         0.57% enlargeStringInfo
         0.52% 0xffffa7096204
         0.52% 0xffffa7094b8c
    + 11.46% _start
* At 2889fd23be56:
-   83.53%    14.24%  postgres  postgres            [.] CopyOneRowTo
    - 69.29% CopyOneRowTo
       - 29.89% OutputFunctionCall
          + 27.43% int4out
       - 12.89% CopyAttributeOutText
            pg_server_to_any
       + 9.31% appendBinaryStringInfo
         3.68% 0xffffa6940234
       + 2.74% CopySendEndOfRow
         2.43% 0xffffa6940214
         1.36% int4out
         0.74% 0xffffa693eba8
         0.73% pgstat_progress_update_param
         0.65% memcpy@plt
         0.53% MemoryContextReset
    + 14.24% _start
If you have concerns about that, I'm OK to revert, I'm not wedded to
this level of control.  Note that I've actually seen *better*
runtimes.
[1]: https://www.postgresql.org/message-id/Zbr6piWuVHDtFFOl@paquier.xyz
> I think callbacks for individual attributes is the wrong approach - the
> dispatch needs to happen at a higher level, otherwise there are too many
> indirect function calls.
Hmm.  Do you have concerns about v13 posted on [2] then?  If yes, then
I'd assume that this shuts down the whole thread or that it needs a
completely different approach, because we will multiply indirect
function calls that can control how data is generated for each row,
which is the original case that Sutou-san wanted to tackle.  There
could be many indirect calls with custom callbacks that control how
things should be processed at row-level, and COPY likes doing work
with loads of data.  The End, Start and In/OutFunc callbacks are
called only once per query, so these don't matter AFAIU.
[2]: https://www.postgresql.org/message-id/ZcFz59nJjQNjwgX0@paquier.xyz
--
Michael
			
Attachments
Hi,
On 2024-02-06 10:01:36 +0900, Michael Paquier wrote:
> On Mon, Feb 05, 2024 at 10:21:18AM -0800, Andres Freund wrote:
> > Have you benchmarked the performance effects of 2889fd23be5 ? I'd not at all
> > be surprised if it led to a measurable performance regression.
>
> Yes, I was looking at runtimes and some profiles around CopyOneRowTo()
> to see the effects that this has yesterday.  The principal point of
> contention is CopyOneRowTo() where the callback is called once per
> attribute, so more attributes stress it more.
Right.
> If you have concerns about that, I'm OK to revert, I'm not wedded to
> this level of control.  Note that I've actually seen *better*
> runtimes.
I'm somewhat worried that handling the different formats at that level will
make it harder to improve copy performance - it's quite atrociously slow
right now. The more we reduce the per-row/field overhead, the more the
dispatch overhead will matter.
> [1]: https://www.postgresql.org/message-id/Zbr6piWuVHDtFFOl@paquier.xyz
>
> > I think callbacks for individual attributes is the wrong approach - the
> > dispatch needs to happen at a higher level, otherwise there are too many
> > indirect function calls.
>
> Hmm.  Do you have concerns about v13 posted on [2] then?
As is I'm indeed not a fan. It imo doesn't make sense to have an indirect
dispatch for *both* ->copy_attribute_out *and* ->CopyToOneRow. After all, when
in ->CopyToOneRow for text, we could know that we need to call
CopyAttributeOutText etc.
> If yes, then I'd assume that this shuts down the whole thread or that it
> needs a completely different approach, because we will multiply indirect
> function calls that can control how data is generated for each row, which is
> the original case that Sutou-san wanted to tackle.
I think it could be rescued fairly easily - remove the dispatch via
->copy_attribute_out(). To avoid duplicating code you could use a static
inline function that's used with constant arguments by both csv and text mode.
I think it might also be worth ensuring that future patches can move branches
like
    if (cstate->encoding_embeds_ascii)
    if (cstate->need_transcoding)
into the choice of per-row callback.
> The End, Start and In/OutFunc callbacks are called only once per query, so
> these don't matter AFAIU.
Right.
Greetings,
Andres Freund
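The last suggestion in the message above, moving a per-field branch into the choice of per-row callback, could be sketched like this. It is a minimal illustration under assumptions: CopyStateSketch, choose_one_row() and the two row functions are hypothetical names, and the row functions only return a marker instead of producing output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct CopyStateSketch CopyStateSketch;

/* Per-row callback; returns a marker so the chosen variant is observable. */
typedef int (*OneRowFn) (CopyStateSketch *state);

struct CopyStateSketch
{
    bool        need_transcoding;   /* decided once, before the first row */
    OneRowFn    one_row;            /* selected per-row callback */
};

enum
{
    ROW_PLAIN = 0,
    ROW_TRANSCODED = 1
};

/* Variant with no transcoding branch anywhere in its per-field loop. */
static int
one_row_plain(CopyStateSketch *state)
{
    (void) state;
    return ROW_PLAIN;
}

/* Variant that would always transcode, again without testing the flag. */
static int
one_row_transcoded(CopyStateSketch *state)
{
    (void) state;
    return ROW_TRANSCODED;
}

/*
 * Evaluate the flag once per COPY, not once per field: the branch becomes
 * part of choosing the callback.
 */
static void
choose_one_row(CopyStateSketch *state)
{
    assert(state != NULL);
    state->one_row = state->need_transcoding ? one_row_transcoded
        : one_row_plain;
}
```

The cost is one extra function variant per flag combination, which is where shared static inline helpers would come in to limit duplication.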
On Mon, Feb 05, 2024 at 05:41:25PM -0800, Andres Freund wrote:
> On 2024-02-06 10:01:36 +0900, Michael Paquier wrote:
>> If you have concerns about that, I'm OK to revert, I'm not wedded to
>> this level of control.  Note that I've actually seen *better*
>> runtimes.
>
> I'm somewhat worried that handling the different formats at that level will
> make it harder to improve copy performance - it's quite atrociously slow
> right now. The more we reduce the per-row/field overhead, the more the
> dispatch overhead will matter.
Yep.  That's the hard part when it comes to design these callbacks.
We don't want something too high level because this leads to more code
duplication churns when someone wants to plug in its own routine set,
and we don't want to be at a too low level because of the indirect
calls as you said.  I'd like to think that the current CopyFromOneRow
offers a good balance here, avoiding the "if" branch with the binary
and non-binary paths.
>> Hmm.  Do you have concerns about v13 posted on [2] then?
>
> As is I'm indeed not a fan. It imo doesn't make sense to have an indirect
> dispatch for *both* ->copy_attribute_out *and* ->CopyToOneRow. After all, when
> in ->CopyToOneRow for text, we could know that we need to call
> CopyAttributeOutText etc.
Right.
>> If yes, then I'd assume that this shuts down the whole thread or that it
>> needs a completely different approach, because we will multiply indirect
>> function calls that can control how data is generated for each row, which is
>> the original case that Sutou-san wanted to tackle.
>
> I think it could be rescued fairly easily - remove the dispatch via
> ->copy_attribute_out(). To avoid duplicating code you could use a static
> inline function that's used with constant arguments by both csv and text mode.
Hmm.  So you basically mean to tweak the beginning of
CopyToTextOneRow() and CopyToTextStart() so as copy_attribute_out is
saved in a local variable outside of cstate and we'd save the "if"
checked for each attribute.  If I got that right, it would mean
something like the v13-0002 attached, on top of the v13-0001 of
upthread.  Is that what you meant?
> I think it might also be worth ensuring that future patches can move branches
> like
>     if (cstate->encoding_embeds_ascii)
>     if (cstate->need_transcoding)
> into the choice of per-row callback.
Yeah, I'm still not sure how much we should split CopyToStateData in
the initial patch set.  I'd like to think that the best result would
be to have in the state data an opaque (void *) that points to a
structure that can be set for each format, so as there is a clean
split between which variable gets set and used where (same remark
applies to COPY FROM with its raw_fields, raw_fields, for example).
--
Michael
Attachments
Hi,
On 2024-02-06 11:41:06 +0900, Michael Paquier wrote:
> On Mon, Feb 05, 2024 at 05:41:25PM -0800, Andres Freund wrote:
> > On 2024-02-06 10:01:36 +0900, Michael Paquier wrote:
> >> If you have concerns about that, I'm OK to revert, I'm not wedded to
> >> this level of control.  Note that I've actually seen *better*
> >> runtimes.
> >
> > I'm somewhat worried that handling the different formats at that level will
> > make it harder to improve copy performance - it's quite atrociously slow
> > right now. The more we reduce the per-row/field overhead, the more the
> > dispatch overhead will matter.
>
> Yep.  That's the hard part when it comes to design these callbacks.
> We don't want something too high level because this leads to more code
> duplication churns when someone wants to plug in its own routine set,
> and we don't want to be at a too low level because of the indirect
> calls as you said.  I'd like to think that the current CopyFromOneRow
> offers a good balance here, avoiding the "if" branch with the binary
> and non-binary paths.
One way to address code duplication is to use static inline helper functions
that do a lot of the work in a generic fashion, but where the compiler can
optimize the branches away, because it can do constant folding.
> >> If yes, then I'd assume that this shuts down the whole thread or that it
> >> needs a completely different approach, because we will multiply indirect
> >> function calls that can control how data is generated for each row, which is
> >> the original case that Sutou-san wanted to tackle.
> >
> > I think it could be rescued fairly easily - remove the dispatch via
> > ->copy_attribute_out(). To avoid duplicating code you could use a static
> > inline function that's used with constant arguments by both csv and text mode.
>
> Hmm.  So you basically mean to tweak the beginning of
> CopyToTextOneRow() and CopyToTextStart() so as copy_attribute_out is
> saved in a local variable outside of cstate and we'd save the "if"
> checked for each attribute.  If I got that right, it would mean
> something like the v13-0002 attached, on top of the v13-0001 of
> upthread.  Is that what you meant?
No - what I mean is that it doesn't make sense to have copy_attribute_out(),
as e.g. CopyToTextOneRow() already knows that it's dealing with text, so it
can directly call the right function. That does require splitting a bit more
between csv and text output, but I think that can be done without much
duplication.
Greetings,
Andres Freund
On Mon, Feb 05, 2024 at 09:46:42PM -0800, Andres Freund wrote:
> No - what I mean is that it doesn't make sense to have copy_attribute_out(),
> as e.g. CopyToTextOneRow() already knows that it's dealing with text, so it
> can directly call the right function. That does require splitting a bit more
> between csv and text output, but I think that can be done without much
> duplication.
I am not sure to understand here.  In what is that different from
reverting 2889fd23be56 then mark CopyAttributeOutCSV and
CopyAttributeOutText as static inline?  Or you mean to merge
CopyAttributeOutText and CopyAttributeOutCSV together into a single
inlined function, reducing a bit code readability?  Both routines have
their own roadmap for encoding_embeds_ascii with quoting and escaping,
so keeping them separated looks kinda cleaner here.
--
Michael
Attachments
Hi,
On 2024-02-06 15:11:05 +0900, Michael Paquier wrote:
> On Mon, Feb 05, 2024 at 09:46:42PM -0800, Andres Freund wrote:
> > No - what I mean is that it doesn't make sense to have copy_attribute_out(),
> > as e.g. CopyToTextOneRow() already knows that it's dealing with text, so it
> > can directly call the right function. That does require splitting a bit more
> > between csv and text output, but I think that can be done without much
> > duplication.
> 
> I am not sure to understand here.  In what is that different from
> reverting 2889fd23be56 then mark CopyAttributeOutCSV and
> CopyAttributeOutText as static inline?
Well, you can't just do that, because there's only one caller, namely
CopyToTextOneRow(). What I am trying to suggest is something like the
attached, just a quick hacky POC. Namely to split out CSV support from
CopyToTextOneRow() by introducing CopyToCSVOneRow(), and to avoid code
duplication by moving the code into a new CopyToTextLikeOneRow().
I named it CopyToTextLike* here, because it seems confusing that some Text*
are used for both CSV and text and others are actually just for text. But if
we were to go for that, we should go further.
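The split described here can be illustrated with a small standalone sketch. It is not the POC patch and not PostgreSQL code: `OutBuf`, `attr_out_text()`, `attr_out_csv()`, `out_row_textlike()`, `out_row_text()` and `out_row_csv()` are hypothetical names, and the escaping rules are deliberately simplified. The point is only the shape: one `static inline` row loop that takes a constant `csv_mode`, plus two thin per-format entry points, so that after inlining the compiler is free to drop the branch that cannot be taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output buffer; stands in for the real COPY state. */
typedef struct OutBuf
{
    char        data[256];
    size_t      len;
} OutBuf;

static void
out_char(OutBuf *buf, char c)
{
    assert(buf->len + 1 < sizeof(buf->data));
    buf->data[buf->len++] = c;
    buf->data[buf->len] = '\0';
}

/* Toy text-format attribute: backslash-escape tab and backslash. */
static void
attr_out_text(OutBuf *buf, const char *s)
{
    for (; *s; s++)
    {
        if (*s == '\t' || *s == '\\')
            out_char(buf, '\\');
        out_char(buf, *s == '\t' ? 't' : *s);
    }
}

/* Toy CSV attribute: always quote, double any embedded quote. */
static void
attr_out_csv(OutBuf *buf, const char *s)
{
    out_char(buf, '"');
    for (; *s; s++)
    {
        if (*s == '"')
            out_char(buf, '"');
        out_char(buf, *s);
    }
    out_char(buf, '"');
}

/*
 * Shared row loop.  csv_mode is a constant at both call sites below, so
 * once this is inlined the untaken branch can be optimized away.
 */
static inline void
out_row_textlike(OutBuf *buf, const char *const *attrs, int nattrs,
                 bool csv_mode)
{
    for (int i = 0; i < nattrs; i++)
    {
        if (i > 0)
            out_char(buf, csv_mode ? ',' : '\t');
        if (csv_mode)
            attr_out_csv(buf, attrs[i]);
        else
            attr_out_text(buf, attrs[i]);
    }
    out_char(buf, '\n');
}

/* Per-format entry points: what a per-row callback would point to. */
static void
out_row_text(OutBuf *buf, const char *const *attrs, int nattrs)
{
    out_row_textlike(buf, attrs, nattrs, false);
}

static void
out_row_csv(OutBuf *buf, const char *const *attrs, int nattrs)
{
    out_row_textlike(buf, attrs, nattrs, true);
}
```

With this shape there is one indirect call per row (to pick the format) and none per attribute; whether the compiler really folds the branch depends on inlining decisions, so it is worth checking with a profile.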
To test the performance effects I chose to remove the pointless encoding
"check" we're discussing in the other thread, as it makes it harder to see the
time differences due to the per-attribute code.  I did three runs of pgbench
-t of [1] and chose the fastest result for each.
With turbo mode and power saving disabled:
                          Avg Time
HEAD                       995.349
Remove Encoding Check      870.793
v13-0001                   869.678
Remove out callback        839.508
Greetings,
Andres Freund
[1] COPY (SELECT
1::int2,2::int2,3::int2,4::int2,5::int2,6::int2,7::int2,8::int2,9::int2,10::int2,11::int2,12::int2,13::int2,14::int2,15::int2,16::int2,17::int2,18::int2,19::int2,20::int2,
generate_series(1,1000000::int4)) TO '/dev/null'; 
			
Attachments
On Tue, Feb 06, 2024 at 03:33:36PM -0800, Andres Freund wrote:
> Well, you can't just do that, because there's only one caller, namely
> CopyToTextOneRow(). What I am trying to suggest is something like the
> attached, just a quick hacky POC. Namely to split out CSV support from
> CopyToTextOneRow() by introducing CopyToCSVOneRow(), and to avoid code
> duplication by moving the code into a new CopyToTextLikeOneRow().

Ah, OK.  Got it now.

> I named it CopyToTextLike* here, because it seems confusing that some Text*
> are used for both CSV and text and others are actually just for text. But if
> we were to go for that, we should go further.

This can always be argued later.

> To test the performance effects I chose to remove the pointless encoding
> "check" we're discussing in the other thread, as it makes it harder to see the
> time differences due to the per-attribute code.  I did three runs of pgbench
> -t of [1] and chose the fastest result for each.
>
> With turbo mode and power saving disabled:
>
>                           Avg Time
> HEAD                       995.349
> Remove Encoding Check      870.793
> v13-0001                   869.678
> Remove out callback        839.508

Hmm.  That explains why I was not seeing any differences with this
callback then.  It seems to me that the order of actions to take is
clear, like:
- Revert 2889fd23be56 to keep a clean state of the tree, now done with
1aa8324b81fa.
- Dive into the strlen() issue, as it really looks like this can
create more simplifications for the patch discussed on this thread
with COPY TO.
- Revisit what we have here, looking at more profiles to see how HEAD
and v13 compare.  It looks like we are on a good path, but let's tackle
things one step at a time.
--
Michael
Attachments
On Thu, Feb 01, 2024 at 10:57:58AM +0900, Michael Paquier wrote:
> CREATE EXTENSION blackhole_am;

One thing I have forgotten here is to provide a copy of this AM for
future references, so here you go with a blackhole_am.tar.gz attached.
--
Michael
Attachments
Hi,

In <ZcMIDgkdSrz5ibvf@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 7 Feb 2024 13:33:18 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:

> Hmm.  That explains why I was not seeing any differences with this
> callback then.  It seems to me that the order of actions to take is
> clear, like:
> - Revert 2889fd23be56 to keep a clean state of the tree, now done with
> 1aa8324b81fa.

Done.

> - Dive into the strlen() issue, as it really looks like this can
> create more simplifications for the patch discussed on this thread
> with COPY TO.

Done: b619852086ed2b5df76631f5678f60d3bebd3745

> - Revisit what we have here, looking at more profiles to see how HEAD
> and v13 compare.  It looks like we are on a good path, but let's tackle
> things one step at a time.

Are you already working on this? Do you want me to write the
next patch based on the current master?

Thanks,
--
kou
On Wed, Feb 07, 2024 at 01:33:18PM +0900, Michael Paquier wrote:
> Hmm.  That explains why I was not seeing any differences with this
> callback then.  It seems to me that the order of actions to take is
> clear, like:
> - Revert 2889fd23be56 to keep a clean state of the tree, now done with
> 1aa8324b81fa.
> - Dive into the strlen() issue, as it really looks like this can
> create more simplifications for the patch discussed on this thread
> with COPY TO.
This has been done this morning with b619852086ed.
> - Revisit what we have here, looking at more profiles to see how HEAD
> and v13 compare.  It looks like we are on a good path, but let's tackle
> things one step at a time.
And attached is a v14 that's rebased on HEAD.  While on it, I've
looked at more profiles and did more runtime checks.
Some runtimes in ms, averaged over 15 runs, with 30 int attributes on 5M
rows as mentioned above:
COPY FROM  text   binary
HEAD       6066   7110
v14        6087   7105
COPY TO    text   binary
HEAD       6591   10161
v14        6508   10189
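To read the two tables above as relative differences, the usual formula is (v14 - HEAD) / HEAD. The helper below is only an illustrative sketch; `rel_change_pct()` is a hypothetical name, not something from the patch or from PostgreSQL.

```c
#include <assert.h>

/*
 * Relative change of the patched runtime against the baseline, in percent.
 * A negative result means the patched build was faster.
 */
static double
rel_change_pct(double base_ms, double patched_ms)
{
    assert(base_ms > 0.0);
    return (patched_ms - base_ms) / base_ms * 100.0;
}
```

Applied to the numbers above, this gives roughly +0.35% (COPY FROM text), -0.07% (COPY FROM binary), -1.26% (COPY TO text) and +0.28% (COPY TO binary). The thread does not report run-to-run variance, so these small differences should be read with that in mind.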
And here are some profiles, where I'm not seeing an impact at
row-level with the addition of the callbacks:
COPY FROM, text, master:
-   66.59%    16.10%  postgres  postgres            [.] NextCopyFrom
    - 50.50% NextCopyFrom
       - 30.75% NextCopyFromRawFields
          + 15.93% CopyReadLine
            13.73% CopyReadAttributesText
       - 19.43% InputFunctionCallSafe
          + 13.49% int4in
            0.77% pg_strtoint32_safe
    + 16.10% _start
COPY FROM, text, v14:
-   66.42%     0.74%  postgres  postgres            [.] NextCopyFrom
    - 65.67% NextCopyFrom
       - 65.51% CopyFromTextOneRow
          - 30.25% NextCopyFromRawFields
             + 16.14% CopyReadLine
               13.40% CopyReadAttributesText
          - 18.96% InputFunctionCallSafe
             + 13.15% int4in
               0.70% pg_strtoint32_safe
    + 0.74% _start
COPY TO, binary, master
-   90.32%     7.14%  postgres  postgres            [.] CopyOneRowTo
    - 83.18% CopyOneRowTo
       + 60.30% SendFunctionCall
       + 10.99% appendBinaryStringInfo
       + 3.67% MemoryContextReset
       + 2.89% CopySendEndOfRow
         0.89% memcpy@plt
         0.66% 0xffffa052db5c
         0.62% enlargeStringInfo
         0.56% pgstat_progress_update_param
    + 7.14% _start
COPY TO, binary, v14
-   90.96%     0.21%  postgres  postgres            [.] CopyOneRowTo
    - 90.75% CopyOneRowTo
       - 81.86% CopyToBinaryOneRow
          + 59.17% SendFunctionCall
          + 10.56% appendBinaryStringInfo
            1.10% enlargeStringInfo
            0.59% int4send
            0.57% memcpy@plt
       + 3.68% MemoryContextReset
       + 2.83% CopySendEndOfRow
         1.13% appendBinaryStringInfo
         0.58% SendFunctionCall
         0.58% pgstat_progress_update_param
Are there any comments about this v14?  Sutou-san?
A next step I think we could take is to split the binary-only and the
text/csv-only data in each cstate into their own structure to make the
structure, with an opaque pointer that custom formats could use, but a
lot of fields are shared as well.  This patch is already complicated
enough IMO, so I'm OK to leave it out for the moment, and focus on
making this infra pluggable as a next step.
--
Michael
			
Attachments
On Fri, Feb 09, 2024 at 01:19:50PM +0900, Sutou Kouhei wrote:
> Are you already working on this? Do you want me to write the
> next patch based on the current master?

No need for a new patch, thanks.  I've spent some time today doing a
rebase and measuring the whole, without seeing a degradation with what
should be the worst cases for COPY TO and FROM:
https://www.postgresql.org/message-id/ZcWoTr1N0GELFA9E%40paquier.xyz
--
Michael
Attachments
Hi,
In <ZcWoTr1N0GELFA9E@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 9 Feb 2024 13:21:34 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:
>> - Revisit what we have here, looking at more profiles to see how HEAD
>> and v13 compare.  It looks like we are on a good path, but let's tackle
>> things one step at a time.
> 
> And attached is a v14 that's rebased on HEAD.
Thanks!
> A next step I think we could take is to split the binary-only and the
> text/csv-only data in each cstate into their own structure to make the
> structure, with an opaque pointer that custom formats could use, but a
> lot of fields are shared as well.
It'll make COPY code base cleaner but it may decrease
performance. How about just adding an opaque pointer to each
cstate as the next step and then try the split?
My suggestion:
1. Introduce Copy{To,From}Routine
   (We can do it based on the v14 patch.)
2. Add an opaque pointer to Copy{To,From}Routine
   (This must not have performance impact.)
3.a. Split format specific data to the opaque space
3.b. Add support for registering custom format handler by
     creating a function
4. ...
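Steps 1 and 2 of this list can be pictured with a toy sketch. Everything in it is hypothetical (`ToyCopyToState`, `ToyCopyToRoutine`, `SumFormatState` and the `sum_*` callbacks are invented for illustration and are not the structs from the v14 patch), and it assumes the opaque pointer lives in the per-COPY state, as suggested a few lines earlier. The idea shown is only that the core picks a routine table once and that a format keeps its private data behind a `void *` the core never inspects.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not PostgreSQL's definitions. */
typedef struct ToyCopyToState ToyCopyToState;

typedef struct ToyCopyToRoutine
{
    void        (*start) (ToyCopyToState *cstate);
    void        (*one_row) (ToyCopyToState *cstate,
                            const int *values, int nvalues);
    void        (*end) (ToyCopyToState *cstate);
} ToyCopyToRoutine;

struct ToyCopyToState
{
    const ToyCopyToRoutine *routine;    /* chosen once per COPY */
    void       *opaque;         /* owned by the format implementation */
};

/* Private state of one toy format; the core never looks inside. */
typedef struct SumFormatState
{
    long        rows;
    long        total;
} SumFormatState;

static void
sum_start(ToyCopyToState *cstate)
{
    cstate->opaque = calloc(1, sizeof(SumFormatState));
    assert(cstate->opaque != NULL);
}

static void
sum_one_row(ToyCopyToState *cstate, const int *values, int nvalues)
{
    SumFormatState *st = cstate->opaque;

    st->rows++;
    for (int i = 0; i < nvalues; i++)
        st->total += values[i];
}

static void
sum_end(ToyCopyToState *cstate)
{
    free(cstate->opaque);
    cstate->opaque = NULL;
}

static const ToyCopyToRoutine sum_routine = {
    .start = sum_start,
    .one_row = sum_one_row,
    .end = sum_end,
};
```

Adding the pointer by itself does not change any existing code path, which is why it is expected to be free; moving existing text/CSV/binary fields behind it (step 3.a) adds a pointer dereference per access and would need measuring.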
>                                    This patch is already complicated
> enough IMO, so I'm OK to leave it out for the moment, and focus on
> making this infra pluggable as a next step.
I agree with you.
> Are there any comments about this v14?  Sutou-san?
Here are my comments:
+    /* Set read attribute callback */
+    if (cstate->opts.csv_mode)
+        cstate->copy_read_attributes = CopyReadAttributesCSV;
+    else
+        cstate->copy_read_attributes = CopyReadAttributesText;
I think that we should not use this approach, for performance
reasons. We should use "static inline" and a constant
argument instead, something like the attached
remove-copy-read-attributes.diff.
We have similar code in
CopyReadLine()/CopyReadLineText(). The attached
remove-copy-read-attributes-and-optimize-copy-read-line.diff
also applies the same optimization to
CopyReadLine()/CopyReadLineText().
I hope that this improves the performance of COPY FROM.
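One detail of this approach is that an exported function can keep its existing signature while internal callers use a specialized inline variant. The toy sketch below shows that shape only; `ToyFromState`, `count_fields_internal()`, `count_fields()`, `count_fields_text()` and `count_fields_csv()` are hypothetical names and the parsing is far simpler than the real COPY FROM code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the COPY FROM state. */
typedef struct ToyFromState
{
    bool        csv_mode;       /* runtime option */
    char        delim;
} ToyFromState;

/*
 * Count the fields in one line.  In CSV mode, delimiters inside double
 * quotes are ignored; in text mode every delimiter separates fields.
 */
static inline int
count_fields_internal(const ToyFromState *st, const char *line,
                      bool csv_mode)
{
    int         nfields = 1;
    bool        in_quote = false;

    for (const char *p = line; *p; p++)
    {
        if (csv_mode && *p == '"')
            in_quote = !in_quote;
        else if (*p == st->delim && !in_quote)
            nfields++;
    }
    return nfields;
}

/* Exported entry point: same signature as before, reads the runtime flag. */
int
count_fields(const ToyFromState *st, const char *line)
{
    return count_fields_internal(st, line, st->csv_mode);
}

/* Format-specific callers pass a constant, so the branch can be folded. */
static int
count_fields_text(const ToyFromState *st, const char *line)
{
    return count_fields_internal(st, line, false);
}

static int
count_fields_csv(const ToyFromState *st, const char *line)
{
    return count_fields_internal(st, line, true);
}
```

External callers of the exported function pay for the runtime check as before; only the per-format paths get the specialized copies, at the cost of some extra generated code.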
+/*
+ * Routines assigned to each format.
++
Garbage "+"
+ * CSV and text share the same implementation, at the exception of the
+ * copy_read_attributes callback.
+ */
+/*
+ * CopyToTextOneRow
+ *
+ * Process one row for text/CSV format.
+ */
+static void
+CopyToTextOneRow(CopyToState cstate,
+                 TupleTableSlot *slot)
+{
...
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, string,
+                                    cstate->opts.force_quote_flags[attnum - 1]);
+            else
+                CopyAttributeOutText(cstate, string);
...
How about using the "static inline" and constant argument approach
here too?
static inline void
CopyToTextBasedOneRow(CopyToState cstate,
                      TupleTableSlot *slot,
                      bool csv_mode)
{
...
            if (csv_mode)
                CopyAttributeOutCSV(cstate, string,
                                    cstate->opts.force_quote_flags[attnum - 1]);
            else
                CopyAttributeOutText(cstate, string);
...
}
static void
CopyToTextOneRow(CopyToState cstate,
                 TupleTableSlot *slot)
{
    CopyToTextBasedOneRow(cstate, slot, false);
}
static void
CopyToCSVOneRow(CopyToState cstate,
                TupleTableSlot *slot)
{
    CopyToTextBasedOneRow(cstate, slot, true);
}
static const CopyToRoutine CopyCSVRoutineText = {
    ...
    .CopyToOneRow = CopyToCSVOneRow,
    ...
};
Thanks,
-- 
kou
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index a90b7189b5..6e244fb443 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -158,12 +158,6 @@ CopyFromTextStart(CopyFromState cstate, TupleDesc tupDesc)
     attr_count = list_length(cstate->attnumlist);
     cstate->max_fields = attr_count;
     cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-
-    /* Set read attribute callback */
-    if (cstate->opts.csv_mode)
-        cstate->copy_read_attributes = CopyReadAttributesCSV;
-    else
-        cstate->copy_read_attributes = CopyReadAttributesText;
 }
 
 /*
@@ -221,9 +215,8 @@ CopyFromBinaryEnd(CopyFromState cstate)
 
 /*
  * Routines assigned to each format.
-+
  * CSV and text share the same implementation, at the exception of the
- * copy_read_attributes callback.
+ * CopyFromOneRow callback.
  */
 static const CopyFromRoutine CopyFromRoutineText = {
     .CopyFromInFunc = CopyFromTextInFunc,
@@ -235,7 +228,7 @@ static const CopyFromRoutine CopyFromRoutineText = {
 static const CopyFromRoutine CopyFromRoutineCSV = {
     .CopyFromInFunc = CopyFromTextInFunc,
     .CopyFromStart = CopyFromTextStart,
-    .CopyFromOneRow = CopyFromTextOneRow,
+    .CopyFromOneRow = CopyFromCSVOneRow,
     .CopyFromEnd = CopyFromTextEnd,
 };
 
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index c45f9ae134..1f8b2ddc6e 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -25,10 +25,10 @@
  *    is copied into 'line_buf', with quotes and escape characters still
  *    intact.
  *
- * 4. CopyReadAttributesText/CSV() function (via copy_read_attribute) takes
- *    the input line from 'line_buf', and splits it into fields, unescaping
- *    the data as required.  The fields are stored in 'attribute_buf', and
- *    'raw_fields' array holds pointers to each field.
+ * 4. CopyReadAttributesText/CSV() function takes the input line from
+ *    'line_buf', and splits it into fields, unescaping the data as required.
+ *    The fields are stored in 'attribute_buf', and 'raw_fields' array holds
+ *    pointers to each field.
  *
  * If encoding conversion is not required, a shortcut is taken in step 2 to
  * avoid copying the data unnecessarily.  The 'input_buf' pointer is set to
@@ -152,6 +152,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 /* non-export function prototypes */
 static bool CopyReadLine(CopyFromState cstate);
 static bool CopyReadLineText(CopyFromState cstate);
+static int    CopyReadAttributesText(CopyFromState cstate);
+static int    CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
                                      Oid typioparam, int32 typmod,
                                      bool *isnull);
@@ -748,9 +750,14 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  * in the relation.
  *
  * NOTE: force_not_null option are not applied to the returned fields.
+ *
+ * Creating static inline NextCopyFromRawFieldsInternal() and call this with
+ * constant 'csv_mode' value from CopyFromTextOneRow()/CopyFromCSVOneRow()
+ * (via CopyFromTextBasedOneRow()) is for optimization. We can avoid indirect
+ * function call by this.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static inline bool
+NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields, bool csv_mode)
 {
     int            fldct;
     bool        done;
@@ -773,7 +780,10 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
         {
             int            fldnum;
 
-            fldct = cstate->copy_read_attributes(cstate);
+            if (csv_mode)
+                fldct = CopyReadAttributesCSV(cstate);
+            else
+                fldct = CopyReadAttributesText(cstate);
 
             if (fldct != list_length(cstate->attnumlist))
                 ereport(ERROR,
@@ -825,7 +835,10 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
         return false;
 
     /* Parse the line into de-escaped field values */
-    fldct = cstate->copy_read_attributes(cstate);
+    if (csv_mode)
+        fldct = CopyReadAttributesCSV(cstate);
+    else
+        fldct = CopyReadAttributesText(cstate);
 
     *fields = cstate->raw_fields;
     *nfields = fldct;
@@ -833,16 +846,26 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 }
 
 /*
- * CopyFromTextOneRow
+ * See NextCopyFromRawFieldsInternal() for details.
+ */
+bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+{
+    return NextCopyFromRawFieldsInternal(cstate, fields, nfields, cstate->opts.csv_mode);
+}
+
+/*
+ * CopyFromTextBasedOneRow
  *
  * Copy one row to a set of `values` and `nulls` for the text and CSV
  * formats.
  */
-bool
-CopyFromTextOneRow(CopyFromState cstate,
-                   ExprContext *econtext,
-                   Datum *values,
-                   bool *nulls)
+static inline bool
+CopyFromTextBasedOneRow(CopyFromState cstate,
+                        ExprContext *econtext,
+                        Datum *values,
+                        bool *nulls,
+                        bool csv_mode)
 {
     TupleDesc    tupDesc;
     AttrNumber    attr_count;
@@ -859,7 +882,7 @@ CopyFromTextOneRow(CopyFromState cstate,
     attr_count = list_length(cstate->attnumlist);
 
     /* read raw fields in the next line */
-    if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
+    if (!NextCopyFromRawFieldsInternal(cstate, &field_strings, &fldct, csv_mode))
         return false;
 
     /* check for overflowing fields */
@@ -956,6 +979,34 @@ CopyFromTextOneRow(CopyFromState cstate,
     return true;
 }
 
+/*
+ * CopyFromTextOneRow
+ *
+ * Copy one row to a set of `values` and `nulls` for the text format.
+ */
+bool
+CopyFromTextOneRow(CopyFromState cstate,
+                   ExprContext *econtext,
+                   Datum *values,
+                   bool *nulls)
+{
+    return CopyFromTextBasedOneRow(cstate, econtext, values, nulls, false);
+}
+
+/*
+ * CopyFromCSVOneRow
+ *
+ * Copy one row to a set of `values` and `nulls` for the CSV format.
+ */
+bool
+CopyFromCSVOneRow(CopyFromState cstate,
+                  ExprContext *econtext,
+                  Datum *values,
+                  bool *nulls)
+{
+    return CopyFromTextBasedOneRow(cstate, econtext, values, nulls, true);
+}
+
 /*
  * CopyFromBinaryOneRow
  *
@@ -1530,7 +1581,7 @@ GetDecimalFromHex(char hex)
  *
  * The return value is the number of fields actually read.
  */
-int
+static int
 CopyReadAttributesText(CopyFromState cstate)
 {
     char        delimc = cstate->opts.delim[0];
@@ -1784,7 +1835,7 @@ CopyReadAttributesText(CopyFromState cstate)
  * CopyReadAttributesText, except we parse the fields according to
  * "standard" (i.e. common) CSV usage.
  */
-int
+static int
 CopyReadAttributesCSV(CopyFromState cstate)
 {
     char        delimc = cstate->opts.delim[0];
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 5fb52dc629..5d597a3c8e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -141,12 +141,6 @@ typedef struct CopyFromStateData
     int            max_fields;
     char      **raw_fields;
 
-    /*
-     * Per-format callback to parse lines, then fill raw_fields and
-     * attribute_buf.
-     */
-    CopyReadAttributes copy_read_attributes;
-
     /*
      * Similarly, line_buf holds the whole input line being processed. The
      * input cycle is first to read the whole line into line_buf, and then
@@ -200,13 +194,11 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
-/* Callbacks for copy_read_attributes */
-extern int    CopyReadAttributesCSV(CopyFromState cstate);
-extern int    CopyReadAttributesText(CopyFromState cstate);
-
 /* Callbacks for CopyFromRoutine->CopyFromOneRow */
 extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
                                Datum *values, bool *nulls);
+extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
+                              Datum *values, bool *nulls);
 extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
                                  Datum *values, bool *nulls);
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index a90b7189b5..6e244fb443 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -158,12 +158,6 @@ CopyFromTextStart(CopyFromState cstate, TupleDesc tupDesc)
     attr_count = list_length(cstate->attnumlist);
     cstate->max_fields = attr_count;
     cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-
-    /* Set read attribute callback */
-    if (cstate->opts.csv_mode)
-        cstate->copy_read_attributes = CopyReadAttributesCSV;
-    else
-        cstate->copy_read_attributes = CopyReadAttributesText;
 }
 
 /*
@@ -221,9 +215,8 @@ CopyFromBinaryEnd(CopyFromState cstate)
 
 /*
  * Routines assigned to each format.
-+
  * CSV and text share the same implementation, at the exception of the
- * copy_read_attributes callback.
+ * CopyFromOneRow callback.
  */
 static const CopyFromRoutine CopyFromRoutineText = {
     .CopyFromInFunc = CopyFromTextInFunc,
@@ -235,7 +228,7 @@ static const CopyFromRoutine CopyFromRoutineText = {
 static const CopyFromRoutine CopyFromRoutineCSV = {
     .CopyFromInFunc = CopyFromTextInFunc,
     .CopyFromStart = CopyFromTextStart,
-    .CopyFromOneRow = CopyFromTextOneRow,
+    .CopyFromOneRow = CopyFromCSVOneRow,
     .CopyFromEnd = CopyFromTextEnd,
 };
 
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index c45f9ae134..ea2eb45491 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -25,10 +25,10 @@
  *    is copied into 'line_buf', with quotes and escape characters still
  *    intact.
  *
- * 4. CopyReadAttributesText/CSV() function (via copy_read_attribute) takes
- *    the input line from 'line_buf', and splits it into fields, unescaping
- *    the data as required.  The fields are stored in 'attribute_buf', and
- *    'raw_fields' array holds pointers to each field.
+ * 4. CopyReadAttributesText/CSV() function takes the input line from
+ *    'line_buf', and splits it into fields, unescaping the data as required.
+ *    The fields are stored in 'attribute_buf', and 'raw_fields' array holds
+ *    pointers to each field.
  *
  * If encoding conversion is not required, a shortcut is taken in step 2 to
  * avoid copying the data unnecessarily.  The 'input_buf' pointer is set to
@@ -150,8 +150,10 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
+static inline bool CopyReadLine(CopyFromState cstate, bool csv_mode);
+static inline bool CopyReadLineText(CopyFromState cstate, bool csv_mode);
+static int    CopyReadAttributesText(CopyFromState cstate);
+static int    CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
                                      Oid typioparam, int32 typmod,
                                      bool *isnull);
@@ -748,9 +750,14 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  * in the relation.
  *
  * NOTE: force_not_null option are not applied to the returned fields.
+ *
+ * Creating static inline NextCopyFromRawFieldsInternal() and call this with
+ * constant 'csv_mode' value from CopyFromTextOneRow()/CopyFromCSVOneRow()
+ * (via CopyFromTextBasedOneRow()) is for optimization. We can avoid indirect
+ * function call by this.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static inline bool
+NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields, bool csv_mode)
 {
     int            fldct;
     bool        done;
@@ -767,13 +774,16 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
         tupDesc = RelationGetDescr(cstate->rel);
 
         cstate->cur_lineno++;
-        done = CopyReadLine(cstate);
+        done = CopyReadLine(cstate, csv_mode);
 
         if (cstate->opts.header_line == COPY_HEADER_MATCH)
         {
             int            fldnum;
 
-            fldct = cstate->copy_read_attributes(cstate);
+            if (csv_mode)
+                fldct = CopyReadAttributesCSV(cstate);
+            else
+                fldct = CopyReadAttributesText(cstate);
 
             if (fldct != list_length(cstate->attnumlist))
                 ereport(ERROR,
@@ -814,7 +824,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     cstate->cur_lineno++;
 
     /* Actually read the line into memory here */
-    done = CopyReadLine(cstate);
+    done = CopyReadLine(cstate, csv_mode);
 
     /*
      * EOF at start of line means we're done.  If we see EOF after some
@@ -825,7 +835,10 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
         return false;
 
     /* Parse the line into de-escaped field values */
-    fldct = cstate->copy_read_attributes(cstate);
+    if (csv_mode)
+        fldct = CopyReadAttributesCSV(cstate);
+    else
+        fldct = CopyReadAttributesText(cstate);
 
     *fields = cstate->raw_fields;
     *nfields = fldct;
@@ -833,16 +846,26 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 }
 
 /*
- * CopyFromTextOneRow
+ * See NextCopyFromRawFieldsInternal() for details.
+ */
+bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+{
+    return NextCopyFromRawFieldsInternal(cstate, fields, nfields, cstate->opts.csv_mode);
+}
+
+/*
+ * CopyFromTextBasedOneRow
  *
  * Copy one row to a set of `values` and `nulls` for the text and CSV
  * formats.
  */
-bool
-CopyFromTextOneRow(CopyFromState cstate,
-                   ExprContext *econtext,
-                   Datum *values,
-                   bool *nulls)
+static inline bool
+CopyFromTextBasedOneRow(CopyFromState cstate,
+                        ExprContext *econtext,
+                        Datum *values,
+                        bool *nulls,
+                        bool csv_mode)
 {
     TupleDesc    tupDesc;
     AttrNumber    attr_count;
@@ -859,7 +882,7 @@ CopyFromTextOneRow(CopyFromState cstate,
     attr_count = list_length(cstate->attnumlist);
 
     /* read raw fields in the next line */
-    if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
+    if (!NextCopyFromRawFieldsInternal(cstate, &field_strings, &fldct, csv_mode))
         return false;
 
     /* check for overflowing fields */
@@ -894,7 +917,7 @@ CopyFromTextOneRow(CopyFromState cstate,
         cstate->cur_attname = NameStr(att->attname);
         cstate->cur_attval = string;
 
-        if (cstate->opts.csv_mode)
+        if (csv_mode)
         {
             if (string == NULL &&
                 cstate->opts.force_notnull_flags[m])
@@ -956,6 +979,34 @@ CopyFromTextOneRow(CopyFromState cstate,
     return true;
 }
 
+/*
+ * CopyFromTextOneRow
+ *
+ * Copy one row to a set of `values` and `nulls` for the text format.
+ */
+bool
+CopyFromTextOneRow(CopyFromState cstate,
+                   ExprContext *econtext,
+                   Datum *values,
+                   bool *nulls)
+{
+    return CopyFromTextBasedOneRow(cstate, econtext, values, nulls, false);
+}
+
+/*
+ * CopyFromCSVOneRow
+ *
+ * Copy one row to a set of `values` and `nulls` for the CSV format.
+ */
+bool
+CopyFromCSVOneRow(CopyFromState cstate,
+                  ExprContext *econtext,
+                  Datum *values,
+                  bool *nulls)
+{
+    return CopyFromTextBasedOneRow(cstate, econtext, values, nulls, true);
+}
+
 /*
  * CopyFromBinaryOneRow
  *
@@ -1089,8 +1140,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * by newline.  The terminating newline or EOF marker is not included
  * in the final value of line_buf.
  */
-static bool
-CopyReadLine(CopyFromState cstate)
+static inline bool
+CopyReadLine(CopyFromState cstate, bool csv_mode)
 {
     bool        result;
 
@@ -1098,7 +1149,7 @@ CopyReadLine(CopyFromState cstate)
     cstate->line_buf_valid = false;
 
     /* Parse data and transfer into line_buf */
-    result = CopyReadLineText(cstate);
+    result = CopyReadLineText(cstate, csv_mode);
 
     if (result)
     {
@@ -1165,8 +1216,8 @@ CopyReadLine(CopyFromState cstate)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate)
+static inline bool
+CopyReadLineText(CopyFromState cstate, bool csv_mode)
 {
     char       *copy_input_buf;
     int            input_buf_ptr;
@@ -1182,7 +1233,7 @@ CopyReadLineText(CopyFromState cstate)
     char        quotec = '\0';
     char        escapec = '\0';
 
-    if (cstate->opts.csv_mode)
+    if (csv_mode)
     {
         quotec = cstate->opts.quote[0];
         escapec = cstate->opts.escape[0];
@@ -1262,7 +1313,7 @@ CopyReadLineText(CopyFromState cstate)
         prev_raw_ptr = input_buf_ptr;
         c = copy_input_buf[input_buf_ptr++];
 
-        if (cstate->opts.csv_mode)
+        if (csv_mode)
         {
             /*
              * If character is '\\' or '\r', we may need to look ahead below.
@@ -1301,7 +1352,7 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \r */
-        if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\r' && (!csv_mode || !in_quote))
         {
             /* Check for \r\n on first line, _and_ handle \r\n. */
             if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1329,10 +1380,10 @@ CopyReadLineText(CopyFromState cstate)
                     if (cstate->eol_type == EOL_CRNL)
                         ereport(ERROR,
                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                                 !cstate->opts.csv_mode ?
+                                 !csv_mode ?
                                  errmsg("literal carriage return found in data") :
                                  errmsg("unquoted carriage return found in data"),
-                                 !cstate->opts.csv_mode ?
+                                 !csv_mode ?
                                  errhint("Use \"\\r\" to represent carriage return.") :
                                  errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1346,10 +1397,10 @@ CopyReadLineText(CopyFromState cstate)
             else if (cstate->eol_type == EOL_NL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !csv_mode ?
                          errmsg("literal carriage return found in data") :
                          errmsg("unquoted carriage return found in data"),
-                         !cstate->opts.csv_mode ?
+                         !csv_mode ?
                          errhint("Use \"\\r\" to represent carriage return.") :
                          errhint("Use quoted CSV field to represent carriage return.")));
             /* If reach here, we have found the line terminator */
@@ -1357,15 +1408,15 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \n */
-        if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\n' && (!csv_mode || !in_quote))
         {
             if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !csv_mode ?
                          errmsg("literal newline found in data") :
                          errmsg("unquoted newline found in data"),
-                         !cstate->opts.csv_mode ?
+                         !csv_mode ?
                          errhint("Use \"\\n\" to represent newline.") :
                          errhint("Use quoted CSV field to represent newline.")));
             cstate->eol_type = EOL_NL;    /* in case not set yet */
@@ -1377,7 +1428,7 @@ CopyReadLineText(CopyFromState cstate)
          * In CSV mode, we only recognize \. alone on a line.  This is because
          * \. is a valid CSV data value.
          */
-        if (c == '\\' && (!cstate->opts.csv_mode || first_char_in_line))
+        if (c == '\\' && (!csv_mode || first_char_in_line))
         {
             char        c2;
 
@@ -1410,7 +1461,7 @@ CopyReadLineText(CopyFromState cstate)
 
                     if (c2 == '\n')
                     {
-                        if (!cstate->opts.csv_mode)
+                        if (!csv_mode)
                             ereport(ERROR,
                                     (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                      errmsg("end-of-copy marker does not match previous newline style")));
@@ -1419,7 +1470,7 @@ CopyReadLineText(CopyFromState cstate)
                     }
                     else if (c2 != '\r')
                     {
-                        if (!cstate->opts.csv_mode)
+                        if (!csv_mode)
                             ereport(ERROR,
                                     (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                      errmsg("end-of-copy marker corrupt")));
@@ -1435,7 +1486,7 @@ CopyReadLineText(CopyFromState cstate)
 
                 if (c2 != '\r' && c2 != '\n')
                 {
-                    if (!cstate->opts.csv_mode)
+                    if (!csv_mode)
                         ereport(ERROR,
                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                  errmsg("end-of-copy marker corrupt")));
@@ -1464,7 +1515,7 @@ CopyReadLineText(CopyFromState cstate)
                 result = true;    /* report EOF */
                 break;
             }
-            else if (!cstate->opts.csv_mode)
+            else if (!csv_mode)
             {
                 /*
                  * If we are here, it means we found a backslash followed by
@@ -1530,7 +1581,7 @@ GetDecimalFromHex(char hex)
  *
  * The return value is the number of fields actually read.
  */
-int
+static int
 CopyReadAttributesText(CopyFromState cstate)
 {
     char        delimc = cstate->opts.delim[0];
@@ -1784,7 +1835,7 @@ CopyReadAttributesText(CopyFromState cstate)
  * CopyReadAttributesText, except we parse the fields according to
  * "standard" (i.e. common) CSV usage.
  */
-int
+static int
 CopyReadAttributesCSV(CopyFromState cstate)
 {
     char        delimc = cstate->opts.delim[0];
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 5fb52dc629..5d597a3c8e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -141,12 +141,6 @@ typedef struct CopyFromStateData
     int            max_fields;
     char      **raw_fields;
 
-    /*
-     * Per-format callback to parse lines, then fill raw_fields and
-     * attribute_buf.
-     */
-    CopyReadAttributes copy_read_attributes;
-
     /*
      * Similarly, line_buf holds the whole input line being processed. The
      * input cycle is first to read the whole line into line_buf, and then
@@ -200,13 +194,11 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
-/* Callbacks for copy_read_attributes */
-extern int    CopyReadAttributesCSV(CopyFromState cstate);
-extern int    CopyReadAttributesText(CopyFromState cstate);
-
 /* Callbacks for CopyFromRoutine->CopyFromOneRow */
 extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
                                Datum *values, bool *nulls);
+extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
+                              Datum *values, bool *nulls);
 extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
                                  Datum *values, bool *nulls);
			
		On Fri, Feb 09, 2024 at 04:32:05PM +0900, Sutou Kouhei wrote:
> In <ZcWoTr1N0GELFA9E@paquier.xyz>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 9 Feb 2024 13:21:34 +0900,
>   Michael Paquier <michael@paquier.xyz> wrote:
>> A next step I think we could take is to split the binary-only and the
>> text/csv-only data in each cstate into their own structure to make the
>> structure, with an opaque pointer that custom formats could use, but a
>> lot of fields are shared as well.
>
> It'll make COPY code base cleaner but it may decrease
> performance.
Perhaps, but I'm not sure, TBH.  But perhaps others can comment on
this point.  This surely needs to be studied closely.
> My suggestion:
> 1. Introduce Copy{To,From}Routine
>    (We can do it based on the v14 patch.)
> 2. Add an opaque pointer to Copy{To,From}Routine
>    (This must not have performance impact.)
> 3.a. Split format specific data to the opaque space
> 3.b. Add support for registering custom format handler by
>      creating a function
> 4. ...
4. is going to need 3.  At this point 3.b sounds like the main thing
to tackle first if we want to get something usable for the end-user
into this release, at least.  Still 2 is important for pluggability
as we pass the cstates across all the routines and custom formats want
to save their own data, so this split sounds OK.  I am not sure how
much of 3.a we really need to do for the in-core formats.
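As a rough illustration of steps 1 and 2 above (a routine table plus an opaque pointer for per-format data), here is a minimal standalone sketch. All names in it (DemoCopyState, DemoCopyFromRoutine, demo_lookup_format and so on) are hypothetical and are not taken from any patch in this thread; it only models the shape of the idea outside of PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the PostgreSQL definitions. */
typedef struct DemoCopyState DemoCopyState;

/* Per-format callbacks, selected once before any row is processed. */
typedef struct DemoCopyFromRoutine
{
    void        (*start) (DemoCopyState *state);
    int         (*one_row) (DemoCopyState *state, const char *line);
    void        (*end) (DemoCopyState *state);
} DemoCopyFromRoutine;

struct DemoCopyState
{
    const DemoCopyFromRoutine *routine; /* format handler */
    void       *opaque;         /* private data owned by the handler */
    int         rows;           /* field shared by all formats */
};

/* Private state of a toy comma-separated format. */
typedef struct DemoCsvPrivate
{
    int         fields_seen;
} DemoCsvPrivate;

static void
demo_csv_start(DemoCopyState *state)
{
    DemoCsvPrivate *priv = malloc(sizeof(DemoCsvPrivate));

    if (priv == NULL)
        abort();
    priv->fields_seen = 0;
    state->opaque = priv;
}

/* Counts comma-separated fields in one line and returns that count. */
static int
demo_csv_one_row(DemoCopyState *state, const char *line)
{
    DemoCsvPrivate *priv = state->opaque;
    int         nfields = 1;

    for (const char *p = line; *p != '\0'; p++)
    {
        if (*p == ',')
            nfields++;
    }
    priv->fields_seen += nfields;
    state->rows++;
    return nfields;
}

static void
demo_csv_end(DemoCopyState *state)
{
    free(state->opaque);
    state->opaque = NULL;
}

static const DemoCopyFromRoutine demo_csv_routine = {
    .start = demo_csv_start,
    .one_row = demo_csv_one_row,
    .end = demo_csv_end,
};

/* Toy lookup by format name; returns NULL for unknown formats. */
static const DemoCopyFromRoutine *
demo_lookup_format(const char *name)
{
    if (strcmp(name, "demo_csv") == 0)
        return &demo_csv_routine;
    return NULL;
}
```

The shared state never needs to know what DemoCsvPrivate contains, which is the property the opaque pointer is meant to give custom formats.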
> I think that we should not use this approach for
> performance. We need to use "static inline" and constant
> argument instead, something like the attached
> remove-copy-read-attributes.diff.
FWIW, using inlining did not show any performance change here.
Perhaps that's only because this is called in the COPY FROM path once
per row (even for the case of using 1 attribute with blackhole_am).
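For readers unfamiliar with the '"static inline" and constant argument' technique being discussed, a minimal standalone sketch follows. The function names (demo_count_fields and its two wrappers) are hypothetical and the parsing is deliberately simplistic; it is not the code from remove-copy-read-attributes.diff. It only shows how a shared worker takes a flag that each per-format entry point passes as a constant, so the compiler can specialize each copy instead of testing the flag at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Shared worker.  is_csv is a compile-time constant at both call sites
 * below, so a compiler is free to inline this function and drop the
 * branch that cannot be taken.
 */
static inline size_t
demo_count_fields(const char *line, char delim, bool is_csv)
{
    size_t      nfields = 1;
    bool        in_quote = false;

    for (const char *p = line; *p != '\0'; p++)
    {
        if (is_csv && *p == '"')
            in_quote = !in_quote;   /* toggle quoted state (CSV only) */
        else if (*p == delim && !in_quote)
            nfields++;
    }
    return nfields;
}

/* Text-like entry point: tab-delimited, quotes have no special meaning. */
static size_t
demo_count_fields_text(const char *line)
{
    return demo_count_fields(line, '\t', false);
}

/* CSV-like entry point: comma-delimited, delimiters inside quotes ignored. */
static size_t
demo_count_fields_csv(const char *line)
{
    return demo_count_fields(line, ',', true);
}
```

Whether this actually helps is a measurement question, as noted above; the sketch only shows the structure.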
--
Michael
			
		Attachments
Hi,
On 2024-02-09 13:21:34 +0900, Michael Paquier wrote:
> +static void
> +CopyFromTextInFunc(CopyFromState cstate, Oid atttypid,
> +                   FmgrInfo *finfo, Oid *typioparam)
> +{
> +    Oid            func_oid;
> +
> +    getTypeInputInfo(atttypid, &func_oid, typioparam);
> +    fmgr_info(func_oid, finfo);
> +}
FWIW, we should really change the copy code to initialize FunctionCallInfoData
instead of re-initializing that on every call, really makes a difference
performance wise.
> +/*
> + * CopyFromTextStart
> + *
> + * Start of COPY FROM for text/CSV format.
> + */
> +static void
> +CopyFromTextStart(CopyFromState cstate, TupleDesc tupDesc)
> +{
> +    AttrNumber    attr_count;
> +
> +    /*
> +     * If encoding conversion is needed, we need another buffer to hold the
> +     * converted input data.  Otherwise, we can just point input_buf to the
> +     * same buffer as raw_buf.
> +     */
> +    if (cstate->need_transcoding)
> +    {
> +        cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
> +        cstate->input_buf_index = cstate->input_buf_len = 0;
> +    }
> +    else
> +        cstate->input_buf = cstate->raw_buf;
> +    cstate->input_reached_eof = false;
> +
> +    initStringInfo(&cstate->line_buf);
Seems kinda odd that we have a supposedly extensible API that then stores all
this stuff in the non-extensible CopyFromState.
> +    /* create workspace for CopyReadAttributes results */
> +    attr_count = list_length(cstate->attnumlist);
> +    cstate->max_fields = attr_count;
Why is this here? This seems like generic code, not text format specific.
> +    cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
> +    /* Set read attribute callback */
> +    if (cstate->opts.csv_mode)
> +        cstate->copy_read_attributes = CopyReadAttributesCSV;
> +    else
> +        cstate->copy_read_attributes = CopyReadAttributesText;
> +}
Isn't this precisely repeating the mistake of 2889fd23be56?
And, why is this done here? Shouldn't this decision have been made prior to
even calling CopyFromTextStart()?
> +/*
> + * CopyFromTextOneRow
> + *
> + * Copy one row to a set of `values` and `nulls` for the text and CSV
> + * formats.
> + */
I'm very doubtful it's a good idea to combine text and CSV here. They have
basically no shared parsing code, so what's the point in sending them through
one input routine?
> +bool
> +CopyFromTextOneRow(CopyFromState cstate,
> +                   ExprContext *econtext,
> +                   Datum *values,
> +                   bool *nulls)
> +{
> +    TupleDesc    tupDesc;
> +    AttrNumber    attr_count;
> +    FmgrInfo   *in_functions = cstate->in_functions;
> +    Oid           *typioparams = cstate->typioparams;
> +    ExprState **defexprs = cstate->defexprs;
> +    char      **field_strings;
> +    ListCell   *cur;
> +    int            fldct;
> +    int            fieldno;
> +    char       *string;
> +
> +    tupDesc = RelationGetDescr(cstate->rel);
> +    attr_count = list_length(cstate->attnumlist);
> +
> +    /* read raw fields in the next line */
> +    if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
> +        return false;
> +
> +    /* check for overflowing fields */
> +    if (attr_count > 0 && fldct > attr_count)
> +        ereport(ERROR,
> +                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> +                 errmsg("extra data after last expected column")));
It bothers me that we look to be ending up with different error handling
across the various output formats, particularly if they're ending up in
extensions. That'll make it harder to evolve this code in the future.
> +    fieldno = 0;
> +
> +    /* Loop to read the user attributes on the line. */
> +    foreach(cur, cstate->attnumlist)
> +    {
> +        int            attnum = lfirst_int(cur);
> +        int            m = attnum - 1;
> +        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
> +
> +        if (fieldno >= fldct)
> +            ereport(ERROR,
> +                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> +                     errmsg("missing data for column \"%s\"",
> +                            NameStr(att->attname))));
> +        string = field_strings[fieldno++];
> +
> +        if (cstate->convert_select_flags &&
> +            !cstate->convert_select_flags[m])
> +        {
> +            /* ignore input field, leaving column as NULL */
> +            continue;
> +        }
> +
> +        cstate->cur_attname = NameStr(att->attname);
> +        cstate->cur_attval = string;
> +
> +        if (cstate->opts.csv_mode)
> +        {
More unfortunate intermingling of multiple formats in a single routine.
> +
> +        if (cstate->defaults[m])
> +        {
> +            /*
> +             * The caller must supply econtext and have switched into the
> +             * per-tuple memory context in it.
> +             */
> +            Assert(econtext != NULL);
> +            Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
> +
> +            values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
> +        }
I don't think it's good that we end up with this code in different copy
implementations.
Greetings,
Andres Freund
			
		On Fri, Feb 09, 2024 at 11:27:05AM -0800, Andres Freund wrote:
> On 2024-02-09 13:21:34 +0900, Michael Paquier wrote:
>> +static void
>> +CopyFromTextInFunc(CopyFromState cstate, Oid atttypid,
>> +                   FmgrInfo *finfo, Oid *typioparam)
>> +{
>> +    Oid            func_oid;
>> +
>> +    getTypeInputInfo(atttypid, &func_oid, typioparam);
>> +    fmgr_info(func_oid, finfo);
>> +}
>
> FWIW, we should really change the copy code to initialize FunctionCallInfoData
> instead of re-initializing that on every call, really makes a difference
> performance wise.
You mean to initialize its memory once and let the internal routines
call InitFunctionCallInfoData for each attribute.  Sounds like a good
idea, doing that for HEAD before the main patch.  More impact with
more attributes.
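The "set up once, fill in only what changes per call" idea can be sketched outside of PostgreSQL as follows. DemoCallInfo and the demo_* functions are hypothetical simplifications, not fmgr.c APIs; the real FunctionCallInfoBaseData has more fields and different types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, much-simplified analogue of a function call info struct. */
typedef struct DemoCallInfo
{
    long        (*fn) (struct DemoCallInfo *ci);
    bool        isnull;         /* result null flag, reset per call */
    int         nargs;
    struct
    {
        long        value;
        bool        isnull;
    }           args[3];
} DemoCallInfo;

/* Allocate the struct and fill the parts that never change, once. */
static DemoCallInfo *
demo_prepare_call_info(void)
{
    DemoCallInfo *ci = calloc(1, sizeof(DemoCallInfo));

    if (ci == NULL)
        abort();
    ci->nargs = 3;
    for (int i = 0; i < 3; i++)
        ci->args[i].isnull = false;
    return ci;
}

/* Per call: set only the function, the result flag and the argument values. */
static long
demo_call_with_info(DemoCallInfo *ci, long (*fn) (DemoCallInfo *),
                    long a0, long a1, long a2)
{
    ci->fn = fn;
    ci->isnull = false;
    ci->args[0].value = a0;
    ci->args[1].value = a1;
    ci->args[2].value = a2;
    return fn(ci);
}

/* Example callee: sums its three arguments. */
static long
demo_sum3(DemoCallInfo *ci)
{
    return ci->args[0].value + ci->args[1].value + ci->args[2].value;
}
```

The saving per call is small, so any benefit would show up mainly with many rows and many attributes, which matches the remark above.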
>> +/*
>> + * CopyFromTextStart
>> + *
>> + * Start of COPY FROM for text/CSV format.
>> + */
>> +static void
>> +CopyFromTextStart(CopyFromState cstate, TupleDesc tupDesc)
>> +{
>> +    AttrNumber    attr_count;
>> +
>> +    /*
>> +     * If encoding conversion is needed, we need another buffer to hold the
>> +     * converted input data.  Otherwise, we can just point input_buf to the
>> +     * same buffer as raw_buf.
>> +     */
>> +    if (cstate->need_transcoding)
>> +    {
>> +        cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
>> +        cstate->input_buf_index = cstate->input_buf_len = 0;
>> +    }
>> +    else
>> +        cstate->input_buf = cstate->raw_buf;
>> +    cstate->input_reached_eof = false;
>> +
>> +    initStringInfo(&cstate->line_buf);
>
> Seems kinda odd that we have a supposedly extensible API that then stores all
> this stuff in the non-extensible CopyFromState.
That relates to the introduction of the opaque pointer mentioned
upthread to point to a per-format structure, where we'd store data
specific to each format.
>> +    /* create workspace for CopyReadAttributes results */
>> +    attr_count = list_length(cstate->attnumlist);
>> +    cstate->max_fields = attr_count;
>
> Why is this here? This seems like generic code, not text format specific.
We don't care about that for binary.
>> +/*
>> + * CopyFromTextOneRow
>> + *
>> + * Copy one row to a set of `values` and `nulls` for the text and CSV
>> + * formats.
>> + */
>
> I'm very doubtful it's a good idea to combine text and CSV here. They have
> basically no shared parsing code, so what's the point in sending them through
> one input routine?
The code shared between text and csv involves a path called once per
attribute.  TBH, I am not sure how much of the NULL handling should be
put outside the per-row routine as these options are embedded in the
core options.  So I don't have a better idea on this one than what's
proposed here if we cannot dispatch the routine calls once per
attribute.
>> +    /* read raw fields in the next line */
>> +    if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
>> +        return false;
>> +
>> +    /* check for overflowing fields */
>> +    if (attr_count > 0 && fldct > attr_count)
>> +        ereport(ERROR,
>> +                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
>> +                 errmsg("extra data after last expected column")));
>
> It bothers me that we look to be ending up with different error handling
> across the various output formats, particularly if they're ending up in
> extensions. That'll make it harder to evolve this code in the future.
But different formats may have different requirements, including the
number of attributes detected vs expected.  That was not really
bothering me.
>> +        if (cstate->opts.csv_mode)
>> +        {
>
> More unfortunate intermingling of multiple formats in a single
> routine.
Similar answer as a few paragraphs above.  Sutou-san was suggesting to
use an internal routine with fixed arguments instead, which would be
enough at the end with some inline instructions?
>> +
>> +        if (cstate->defaults[m])
>> +        {
>> +            /*
>> +             * The caller must supply econtext and have switched into the
>> +             * per-tuple memory context in it.
>> +             */
>> +            Assert(econtext != NULL);
>> +            Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
>> +
>> +            values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
>> +        }
>
> I don't think it's good that we end up with this code in different copy
> implementations.
Yeah, still we don't care about that for binary.
--
Michael
			
		Attachments
Hi,
In <20240209192705.5qdilvviq3py2voq@awork3.anarazel.de>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 9 Feb 2024 11:27:05 -0800,
  Andres Freund <andres@anarazel.de> wrote:
>> +static void
>> +CopyFromTextInFunc(CopyFromState cstate, Oid atttypid,
>> +                   FmgrInfo *finfo, Oid *typioparam)
>> +{
>> +    Oid            func_oid;
>> +
>> +    getTypeInputInfo(atttypid, &func_oid, typioparam);
>> +    fmgr_info(func_oid, finfo);
>> +}
> 
> FWIW, we should really change the copy code to initialize FunctionCallInfoData
> instead of re-initializing that on every call, really makes a difference
> performance wise.
How about the attached patch approach? If it's a desired
approach, I can also write a separate patch for COPY TO.
>> +    cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
>> +    /* Set read attribute callback */
>> +    if (cstate->opts.csv_mode)
>> +        cstate->copy_read_attributes = CopyReadAttributesCSV;
>> +    else
>> +        cstate->copy_read_attributes = CopyReadAttributesText;
>> +}
> 
> Isn't this precisely repeating the mistake of 2889fd23be56?
What do you think about the approach in my previous mail's
attachments?
https://www.postgresql.org/message-id/flat/20240209.163205.704848659612151781.kou%40clear-code.com#dbb1f8d7f2f0e8fe3c7e37a757fcfc54
If it's a desired approach, I can prepare a v15 patch set
based on the v14 patch set and the approach.
I'll reply to the other comments later...
Thanks,
-- 
kou
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 41f6bc43e4..a43c853e99 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1691,6 +1691,10 @@ BeginCopyFrom(ParseState *pstate,
     /* We keep those variables in cstate. */
     cstate->in_functions = in_functions;
     cstate->typioparams = typioparams;
+    if (cstate->opts.binary)
+        cstate->fcinfo = PrepareInputFunctionCallInfo();
+    else
+        cstate->fcinfo = PrepareReceiveFunctionCallInfo();
     cstate->defmap = defmap;
     cstate->defexprs = defexprs;
     cstate->volatile_defexprs = volatile_defexprs;
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 906756362e..e372e5efb8 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -853,6 +853,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
                 num_defaults = cstate->num_defaults;
     FmgrInfo   *in_functions = cstate->in_functions;
     Oid           *typioparams = cstate->typioparams;
+    FunctionCallInfoBaseData *fcinfo = cstate->fcinfo;
     int            i;
     int           *defmap = cstate->defmap;
     ExprState **defexprs = cstate->defexprs;
@@ -953,12 +954,13 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
              * If ON_ERROR is specified with IGNORE, skip rows with soft
              * errors
              */
-            else if (!InputFunctionCallSafe(&in_functions[m],
-                                            string,
-                                            typioparams[m],
-                                            att->atttypmod,
-                                            (Node *) cstate->escontext,
-                                            &values[m]))
+            else if (!PreparedInputFunctionCallSafe(fcinfo,
+                                                    &in_functions[m],
+                                                    string,
+                                                    typioparams[m],
+                                                    att->atttypmod,
+                                                    (Node *) cstate->escontext,
+                                                    &values[m]))
             {
                 cstate->num_errors++;
                 return true;
@@ -1958,7 +1960,7 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
     if (fld_size == -1)
     {
         *isnull = true;
-        return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
+        return PreparedReceiveFunctionCall(cstate->fcinfo, flinfo, NULL, typioparam, typmod);
     }
     if (fld_size < 0)
         ereport(ERROR,
@@ -1979,8 +1981,8 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
     cstate->attribute_buf.data[fld_size] = '\0';
 
     /* Call the column type's binary input converter */
-    result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
-                                 typioparam, typmod);
+    result = PreparedReceiveFunctionCall(cstate->fcinfo, flinfo, &cstate->attribute_buf,
+                                         typioparam, typmod);
 
     /* Trouble if it didn't eat the whole buffer */
     if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index e48a86be54..b0b5310219 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/src/backend/utils/fmgr/fmgr.c
@@ -1672,6 +1672,73 @@ DirectInputFunctionCallSafe(PGFunction func, char *str,
     return true;
 }
 
+/*
+ * Prepare callinfo for PreparedInputFunctionCallSafe to reuse one callinfo
+ * instead of initializing it for each call. This is for performance.
+ */
+FunctionCallInfoBaseData *
+PrepareInputFunctionCallInfo(void)
+{
+    FunctionCallInfoBaseData *fcinfo;
+
+    fcinfo = (FunctionCallInfoBaseData *) palloc(SizeForFunctionCallInfo(3));
+    InitFunctionCallInfoData(*fcinfo, NULL, 3, InvalidOid, NULL, NULL);
+    fcinfo->args[0].isnull = false;
+    fcinfo->args[1].isnull = false;
+    fcinfo->args[2].isnull = false;
+    return fcinfo;
+}
+
+/*
+ * Call a previously-looked-up datatype input function, with prepared callinfo
+ * and non-exception handling of "soft" errors.
+ *
+ * This is basically like InputFunctionCallSafe, but it reuses prepared
+ * callinfo.
+ */
+bool
+PreparedInputFunctionCallSafe(FunctionCallInfoBaseData *fcinfo,
+                              FmgrInfo *flinfo, char *str,
+                              Oid typioparam, int32 typmod,
+                              fmNodePtr escontext,
+                              Datum *result)
+{
+    if (str == NULL && flinfo->fn_strict)
+    {
+        *result = (Datum) 0;    /* just return null result */
+        return true;
+    }
+
+    fcinfo->flinfo = flinfo;
+    fcinfo->context = escontext;
+    fcinfo->isnull = false;
+    fcinfo->args[0].value = CStringGetDatum(str);
+    fcinfo->args[1].value = ObjectIdGetDatum(typioparam);
+    fcinfo->args[2].value = Int32GetDatum(typmod);
+
+    *result = FunctionCallInvoke(fcinfo);
+
+    /* Result value is garbage, and could be null, if an error was reported */
+    if (SOFT_ERROR_OCCURRED(escontext))
+        return false;
+
+    /* Otherwise, should get null result if and only if str is NULL */
+    if (str == NULL)
+    {
+        if (!fcinfo->isnull)
+            elog(ERROR, "input function %u returned non-NULL",
+                 flinfo->fn_oid);
+    }
+    else
+    {
+        if (fcinfo->isnull)
+            elog(ERROR, "input function %u returned NULL",
+                 flinfo->fn_oid);
+    }
+
+    return true;
+}
+
 /*
  * Call a previously-looked-up datatype output function.
  *
@@ -1731,6 +1798,65 @@ ReceiveFunctionCall(FmgrInfo *flinfo, StringInfo buf,
     return result;
 }
 
+/*
+ * Prepare callinfo for PreparedReceiveFunctionCall to reuse one callinfo
+ * instead of initializing it for each call. This is for performance.
+ */
+FunctionCallInfoBaseData *
+PrepareReceiveFunctionCallInfo(void)
+{
+    FunctionCallInfoBaseData *fcinfo;
+
+    fcinfo = (FunctionCallInfoBaseData *) palloc(SizeForFunctionCallInfo(3));
+    InitFunctionCallInfoData(*fcinfo, NULL, 3, InvalidOid, NULL, NULL);
+    fcinfo->args[0].isnull = false;
+    fcinfo->args[1].isnull = false;
+    fcinfo->args[2].isnull = false;
+    return fcinfo;
+}
+
+/*
+ * Call a previously-looked-up datatype binary-input function, with prepared
+ * callinfo.
+ *
+ * This is basically like ReceiveFunctionCall, but it reuses prepared
+ * callinfo.
+ */
+Datum
+PreparedReceiveFunctionCall(FunctionCallInfoBaseData *fcinfo,
+                            FmgrInfo *flinfo, StringInfo buf,
+                            Oid typioparam, int32 typmod)
+{
+    Datum        result;
+
+    if (buf == NULL && flinfo->fn_strict)
+        return (Datum) 0;        /* just return null result */
+
+    fcinfo->flinfo = flinfo;
+    fcinfo->isnull = false;
+    fcinfo->args[0].value = PointerGetDatum(buf);
+    fcinfo->args[1].value = ObjectIdGetDatum(typioparam);
+    fcinfo->args[2].value = Int32GetDatum(typmod);
+
+    result = FunctionCallInvoke(fcinfo);
+
+    /* Should get null result if and only if buf is NULL */
+    if (buf == NULL)
+    {
+        if (!fcinfo->isnull)
+            elog(ERROR, "receive function %u returned non-NULL",
+                 flinfo->fn_oid);
+    }
+    else
+    {
+        if (fcinfo->isnull)
+            elog(ERROR, "receive function %u returned NULL",
+                 flinfo->fn_oid);
+    }
+
+    return result;
+}
+
 /*
  * Call a previously-looked-up datatype binary-output function.
  *
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 759f8e3d09..4d7928b3ac 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -104,6 +104,7 @@ typedef struct CopyFromStateData
     Oid           *typioparams;    /* array of element types for in_functions */
     ErrorSaveContext *escontext;    /* soft error trapper during in_functions
                                      * execution */
+    FunctionCallInfoBaseData *fcinfo;    /* reusable callinfo for in_functions */
     uint64        num_errors;        /* total number of rows which contained soft
                                  * errors */
     int           *defmap;            /* array of default att numbers related to
diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index ccb4070a25..994d8ce487 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -708,12 +708,24 @@ extern bool DirectInputFunctionCallSafe(PGFunction func, char *str,
                                         Oid typioparam, int32 typmod,
                                         fmNodePtr escontext,
                                         Datum *result);
+extern FunctionCallInfoBaseData *PrepareInputFunctionCallInfo(void);
+extern bool
+            PreparedInputFunctionCallSafe(FunctionCallInfoBaseData *fcinfo,
+                                          FmgrInfo *flinfo, char *str,
+                                          Oid typioparam, int32 typmod,
+                                          fmNodePtr escontext,
+                                          Datum *result);
 extern Datum OidInputFunctionCall(Oid functionId, char *str,
                                   Oid typioparam, int32 typmod);
 extern char *OutputFunctionCall(FmgrInfo *flinfo, Datum val);
 extern char *OidOutputFunctionCall(Oid functionId, Datum val);
 extern Datum ReceiveFunctionCall(FmgrInfo *flinfo, fmStringInfo buf,
                                  Oid typioparam, int32 typmod);
+extern FunctionCallInfoBaseData *PrepareReceiveFunctionCallInfo(void);
+extern Datum
+            PreparedReceiveFunctionCall(FunctionCallInfoBaseData *fcinfo,
+                                        FmgrInfo *flinfo, fmStringInfo buf,
+                                        Oid typioparam, int32 typmod);
 extern Datum OidReceiveFunctionCall(Oid functionId, fmStringInfo buf,
                                     Oid typioparam, int32 typmod);
 extern bytea *SendFunctionCall(FmgrInfo *flinfo, Datum val);
			
		On Tue, Feb 13, 2024 at 05:33:40PM +0900, Sutou Kouhei wrote:
> Hi,
>
> In <20240209192705.5qdilvviq3py2voq@awork3.anarazel.de>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 9 Feb 2024 11:27:05 -0800,
>   Andres Freund <andres@anarazel.de> wrote:
>
>>> +static void
>>> +CopyFromTextInFunc(CopyFromState cstate, Oid atttypid,
>>> +                   FmgrInfo *finfo, Oid *typioparam)
>>> +{
>>> +    Oid            func_oid;
>>> +
>>> +    getTypeInputInfo(atttypid, &func_oid, typioparam);
>>> +    fmgr_info(func_oid, finfo);
>>> +}
>>
>> FWIW, we should really change the copy code to initialize FunctionCallInfoData
>> instead of re-initializing that on every call, really makes a difference
>> performance wise.
>
> How about the attached patch approach? If it's a desired
> approach, I can also write a separate patch for COPY TO.
Hmm, I have not studied that much, but my first impression was that we
would not require any new facility in fmgr.c, but perhaps you're right
and it's more elegant to pass an InitFunctionCallInfoData this way.
PrepareInputFunctionCallInfo() looks OK as a name, but I'm less a fan
of PreparedInputFunctionCallSafe() and its "Prepared" part.  How about
something like ExecuteInputFunctionCallSafe()?
I may be able to look more at that next week, and I would surely check
the impact of that with a simple COPY query throttled by CPU (more
rows and more attributes the better).
>>> +    cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
>>> +    /* Set read attribute callback */
>>> +    if (cstate->opts.csv_mode)
>>> +        cstate->copy_read_attributes = CopyReadAttributesCSV;
>>> +    else
>>> +        cstate->copy_read_attributes = CopyReadAttributesText;
>>> +}
>>
>> Isn't this precisely repeating the mistake of 2889fd23be56?
>
> What do you think about the approach in my previous mail's
> attachments?
>
> https://www.postgresql.org/message-id/flat/20240209.163205.704848659612151781.kou%40clear-code.com#dbb1f8d7f2f0e8fe3c7e37a757fcfc54
>
> If it's a desired approach, I can prepare a v15 patch set
> based on the v14 patch set and the approach.
Yes, this one looks like it's using the right angle: we no longer rely
on cstate to decide which CopyReadAttributes to use, the
routines do that instead.  Note that I've reverted 06bd311bce24 for
the moment, as this is just getting in the way of the main patch, and
that was non-optimal once there is a per-row callback.
> diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
> index 41f6bc43e4..a43c853e99 100644
> --- a/src/backend/commands/copyfrom.c
> +++ b/src/backend/commands/copyfrom.c
> @@ -1691,6 +1691,10 @@ BeginCopyFrom(ParseState *pstate,
>      /* We keep those variables in cstate. */
>      cstate->in_functions = in_functions;
>      cstate->typioparams = typioparams;
> +    if (cstate->opts.binary)
> +        cstate->fcinfo = PrepareInputFunctionCallInfo();
> +    else
> +        cstate->fcinfo = PrepareReceiveFunctionCallInfo();
Perhaps we'd better avoid more callbacks like that, for now.
--
Michael
			
		Attachments
Hi,
In <ZcwzZrrsTEJ7oJyq@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 14 Feb 2024 12:28:38 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:
>> How about the attached patch approach? If it's a desired
>> approach, I can also write a separate patch for COPY TO.
> 
> Hmm, I have not studied that much, but my first impression was that we
> would not require any new facility in fmgr.c, but perhaps you're right
> and it's more elegant to pass an InitFunctionCallInfoData this way.
I'm not familiar with the fmgr.c related code base but it
seems that we abstract {,binary-}input function call by
fmgr.c. So I think that it's better that we follow the
design. (If there is a person who knows the fmgr.c related
code base, please help us.)
> PrepareInputFunctionCallInfo() looks OK as a name, but I'm less a fan
> of PreparedInputFunctionCallSafe() and its "Prepared" part.  How about
> something like ExecuteInputFunctionCallSafe()?
I understand the feeling. SQL uses "prepared" for "prepared
statement". There are similar function names such as
InputFunctionCall()/InputFunctionCallSafe()/DirectInputFunctionCallSafe(). They
execute (call) an input function but they use "call" not
"execute" for it... So "Execute...Call..." may be
redundant...
How about InputFunctionCallSafeWithInfo(),
InputFunctionCallSafeInfo() or
InputFunctionCallInfoCallSafe()?
> I may be able to look more at that next week, and I would surely check
> the impact of that with a simple COPY query throttled by CPU (more
> rows and more attributes the better).
Thanks!
>                            Note that I've reverted 06bd311bce24 for
> the moment, as this is just getting in the way of the main patch, and
> that was non-optimal once there is a per-row callback.
Thanks for sharing the information. I'll rebase on master
when I create the v15 patch.
>> diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
>> index 41f6bc43e4..a43c853e99 100644
>> --- a/src/backend/commands/copyfrom.c
>> +++ b/src/backend/commands/copyfrom.c
>> @@ -1691,6 +1691,10 @@ BeginCopyFrom(ParseState *pstate,
>>      /* We keep those variables in cstate. */
>>      cstate->in_functions = in_functions;
>>      cstate->typioparams = typioparams;
>> +    if (cstate->opts.binary)
>> +        cstate->fcinfo = PrepareInputFunctionCallInfo();
>> +    else
>> +        cstate->fcinfo = PrepareReceiveFunctionCallInfo();
> 
> Perhaps we'd better avoid more callbacks like that, for now.
I'll not use a callback for this. I'll not change this part
after we introduce Copy{To,From}Routine. Like
cstate->in_functions and cstate->typioparams, cstate->fcinfo
isn't used by some custom COPY format handlers such as an
Apache Arrow handler. But they will always be allocated. It's
a bit wasteful for those handlers but we may not care about
it. So we can always use the "if (cstate->opts.binary)"
condition here.
BTW... This part was wrong... Sorry... It should be:
    if (cstate->opts.binary)
        cstate->fcinfo = PrepareReceiveFunctionCallInfo();
    else
        cstate->fcinfo = PrepareInputFunctionCallInfo();
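The reuse idea itself can be sketched outside PostgreSQL. The following is only a minimal, self-contained illustration: `CallInfo`, `prepare_call_info()`, `call_with_info()` and `sum3()` are hypothetical stand-ins, not the fmgr.c API. The one-time setup writes the fields that never change, and each call writes only the fields that do.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a three-argument function call info struct. */
typedef struct CallInfo
{
    long        (*fn) (struct CallInfo *ci);    /* function to invoke */
    int         isnull;                         /* result-is-null flag */
    int         nargs;
    struct
    {
        long        value;
        int         isnull;
    }           args[3];
} CallInfo;

/* One-time setup: everything that stays the same for every call. */
static CallInfo *
prepare_call_info(void)
{
    CallInfo   *ci = malloc(sizeof(CallInfo));

    assert(ci != NULL);
    ci->fn = NULL;
    ci->isnull = 0;
    ci->nargs = 3;
    for (int i = 0; i < 3; i++)
    {
        ci->args[i].value = 0;
        ci->args[i].isnull = 0;
    }
    return ci;
}

/* Per-call work: only the fields that actually change are written. */
static long
call_with_info(CallInfo *ci, long (*fn) (CallInfo *), long a, long b, long c)
{
    ci->fn = fn;
    ci->isnull = 0;
    ci->args[0].value = a;
    ci->args[1].value = b;
    ci->args[2].value = c;
    return ci->fn(ci);
}

/* Toy "input function" for the sketch: sums its three arguments. */
static long
sum3(CallInfo *ci)
{
    return ci->args[0].value + ci->args[1].value + ci->args[2].value;
}
```

A caller would invoke `prepare_call_info()` once per COPY and then `call_with_info()` once per attribute, which is the shape the patches give to cstate->fcinfo.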
Thanks,
-- 
kou
			
On Wed, Feb 14, 2024 at 02:08:51PM +0900, Sutou Kouhei wrote:
> I understand the feeling. SQL uses "prepared" for "prepared
> statement". There are similar function names such as
> InputFunctionCall()/InputFunctionCallSafe()/DirectInputFunctionCallSafe(). They
> execute (call) an input function but they use "call" not
> "execute" for it... So "Execute...Call..." may be
> redundant...
>
> How about InputFunctionCallSafeWithInfo(),
> InputFunctionCallSafeInfo() or
> InputFunctionCallInfoCallSafe()?
WithInfo() would not be a new thing.  There are a couple of APIs named
like this when manipulating catalogs, so that sounds kind of a good
choice from here.
--
Michael
Hi,
In <ZcxjNDtqNLvdz0f5@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 14 Feb 2024 15:52:36 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:
>> How about InputFunctionCallSafeWithInfo(),
>> InputFunctionCallSafeInfo() or
>> InputFunctionCallInfoCallSafe()?
> 
> WithInfo() would not be a new thing.  There are a couple of APIs named
> like this when manipulating catalogs, so that sounds kind of a good
> choice from here.
Thanks for the info. Let's use InputFunctionCallSafeWithInfo().
See the attached patch:
v2-0001-Reuse-fcinfo-used-in-COPY-FROM.patch
I also attach a patch for COPY TO:
v1-0001-Reuse-fcinfo-used-in-COPY-TO.patch
I measured the COPY TO patch on my environment with:
COPY (SELECT
1::int2,2::int2,3::int2,4::int2,5::int2,6::int2,7::int2,8::int2,9::int2,10::int2,11::int2,12::int2,13::int2,14::int2,15::int2,16::int2,17::int2,18::int2,19::int2,20::int2,
generate_series(1,1000000::int4)) TO '/dev/null' \watch c=5
 
master:
740.066ms
734.884ms
738.579ms
734.170ms
727.953ms
patched:
730.714ms
741.483ms
714.149ms
715.436ms
713.578ms
It seems that it improves performance a bit but my
environment isn't suitable for benchmark. So they may not
be valid numbers.
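For reference, the five runs above can be averaged with a few lines of C. This is only a sketch over the numbers quoted in this mail; `mean()` and `improvement_percent()` are helper names made up for the illustration. It gives roughly 735.1 ms for master and 723.1 ms for patched, i.e. about a 1.6% difference, with no attempt at estimating noise.

```c
#include <assert.h>
#include <stddef.h>

/* Timings in milliseconds, copied from the runs listed above. */
static const double master_ms[] = {740.066, 734.884, 738.579, 734.170, 727.953};
static const double patched_ms[] = {730.714, 741.483, 714.149, 715.436, 713.578};

/* Arithmetic mean of n samples. */
static double
mean(const double *xs, size_t n)
{
    double      sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += xs[i];
    return sum / (double) n;
}

/* Relative reduction of `patched` compared with `base`, in percent. */
static double
improvement_percent(double base, double patched)
{
    assert(base > 0.0);
    return (base - patched) / base * 100.0;
}
```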
Thanks,
-- 
kou
From b677732f46f735a5601b8890000f79671e91be41 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Thu, 15 Feb 2024 15:01:08 +0900
Subject: [PATCH v2] Reuse fcinfo used in COPY FROM
Each NextCopyFrom() calls input functions or binary-input
functions. We can reuse fcinfo for them instead of creating a local
fcinfo for each call. This will improve performance.
---
 src/backend/commands/copyfrom.c          |   4 +
 src/backend/commands/copyfromparse.c     |  20 ++--
 src/backend/utils/fmgr/fmgr.c            | 126 +++++++++++++++++++++++
 src/include/commands/copyfrom_internal.h |   1 +
 src/include/fmgr.h                       |  12 +++
 5 files changed, 154 insertions(+), 9 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1fe70b9133..ed375c012e 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1691,6 +1691,10 @@ BeginCopyFrom(ParseState *pstate,
     /* We keep those variables in cstate. */
     cstate->in_functions = in_functions;
     cstate->typioparams = typioparams;
+    if (cstate->opts.binary)
+        cstate->fcinfo = PrepareReceiveFunctionCallInfo();
+    else
+        cstate->fcinfo = PrepareInputFunctionCallInfo();
     cstate->defmap = defmap;
     cstate->defexprs = defexprs;
     cstate->volatile_defexprs = volatile_defexprs;
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7cacd0b752..7907e16ea8 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -861,6 +861,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
                 num_defaults = cstate->num_defaults;
     FmgrInfo   *in_functions = cstate->in_functions;
     Oid           *typioparams = cstate->typioparams;
+    FunctionCallInfoBaseData *fcinfo = cstate->fcinfo;
     int            i;
     int           *defmap = cstate->defmap;
     ExprState **defexprs = cstate->defexprs;
@@ -961,12 +962,13 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
              * If ON_ERROR is specified with IGNORE, skip rows with soft
              * errors
              */
-            else if (!InputFunctionCallSafe(&in_functions[m],
-                                            string,
-                                            typioparams[m],
-                                            att->atttypmod,
-                                            (Node *) cstate->escontext,
-                                            &values[m]))
+            else if (!InputFunctionCallSafeWithInfo(fcinfo,
+                                                    &in_functions[m],
+                                                    string,
+                                                    typioparams[m],
+                                                    att->atttypmod,
+                                                    (Node *) cstate->escontext,
+                                                    &values[m]))
             {
                 cstate->num_errors++;
                 return true;
@@ -1966,7 +1968,7 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
     if (fld_size == -1)
     {
         *isnull = true;
-        return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
+        return ReceiveFunctionCallWithInfo(cstate->fcinfo, flinfo, NULL, typioparam, typmod);
     }
     if (fld_size < 0)
         ereport(ERROR,
@@ -1987,8 +1989,8 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
     cstate->attribute_buf.data[fld_size] = '\0';
 
     /* Call the column type's binary input converter */
-    result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
-                                 typioparam, typmod);
+    result = ReceiveFunctionCallWithInfo(cstate->fcinfo, flinfo, &cstate->attribute_buf,
+                                         typioparam, typmod);
 
     /* Trouble if it didn't eat the whole buffer */
     if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index e48a86be54..14c3ed2bdb 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/src/backend/utils/fmgr/fmgr.c
@@ -1672,6 +1672,73 @@ DirectInputFunctionCallSafe(PGFunction func, char *str,
     return true;
 }
 
+/*
+ * Prepare callinfo for InputFunctionCallSafeWithInfo to reuse one callinfo
+ * instead of initializing it for each call. This is for performance.
+ */
+FunctionCallInfoBaseData *
+PrepareInputFunctionCallInfo(void)
+{
+    FunctionCallInfoBaseData *fcinfo;
+
+    fcinfo = (FunctionCallInfoBaseData *) palloc(SizeForFunctionCallInfo(3));
+    InitFunctionCallInfoData(*fcinfo, NULL, 3, InvalidOid, NULL, NULL);
+    fcinfo->args[0].isnull = false;
+    fcinfo->args[1].isnull = false;
+    fcinfo->args[2].isnull = false;
+    return fcinfo;
+}
+
+/*
+ * Call a previously-looked-up datatype input function, with prepared callinfo
+ * and non-exception handling of "soft" errors.
+ *
+ * This is basically like InputFunctionCallSafe, but it reuses prepared
+ * callinfo.
+ */
+bool
+InputFunctionCallSafeWithInfo(FunctionCallInfoBaseData *fcinfo,
+                              FmgrInfo *flinfo, char *str,
+                              Oid typioparam, int32 typmod,
+                              fmNodePtr escontext,
+                              Datum *result)
+{
+    if (str == NULL && flinfo->fn_strict)
+    {
+        *result = (Datum) 0;    /* just return null result */
+        return true;
+    }
+
+    fcinfo->flinfo = flinfo;
+    fcinfo->context = escontext;
+    fcinfo->isnull = false;
+    fcinfo->args[0].value = CStringGetDatum(str);
+    fcinfo->args[1].value = ObjectIdGetDatum(typioparam);
+    fcinfo->args[2].value = Int32GetDatum(typmod);
+
+    *result = FunctionCallInvoke(fcinfo);
+
+    /* Result value is garbage, and could be null, if an error was reported */
+    if (SOFT_ERROR_OCCURRED(escontext))
+        return false;
+
+    /* Otherwise, should get null result if and only if str is NULL */
+    if (str == NULL)
+    {
+        if (!fcinfo->isnull)
+            elog(ERROR, "input function %u returned non-NULL",
+                 flinfo->fn_oid);
+    }
+    else
+    {
+        if (fcinfo->isnull)
+            elog(ERROR, "input function %u returned NULL",
+                 flinfo->fn_oid);
+    }
+
+    return true;
+}
+
 /*
  * Call a previously-looked-up datatype output function.
  *
@@ -1731,6 +1798,65 @@ ReceiveFunctionCall(FmgrInfo *flinfo, StringInfo buf,
     return result;
 }
 
+/*
+ * Prepare callinfo for ReceiveFunctionCallWithInfo to reuse one callinfo
+ * instead of initializing it for each call. This is for performance.
+ */
+FunctionCallInfoBaseData *
+PrepareReceiveFunctionCallInfo(void)
+{
+    FunctionCallInfoBaseData *fcinfo;
+
+    fcinfo = (FunctionCallInfoBaseData *) palloc(SizeForFunctionCallInfo(3));
+    InitFunctionCallInfoData(*fcinfo, NULL, 3, InvalidOid, NULL, NULL);
+    fcinfo->args[0].isnull = false;
+    fcinfo->args[1].isnull = false;
+    fcinfo->args[2].isnull = false;
+    return fcinfo;
+}
+
+/*
+ * Call a previously-looked-up datatype binary-input function, with prepared
+ * callinfo.
+ *
+ * This is basically like ReceiveFunctionCall, but it reuses prepared
+ * callinfo.
+ */
+Datum
+ReceiveFunctionCallWithInfo(FunctionCallInfoBaseData *fcinfo,
+                            FmgrInfo *flinfo, StringInfo buf,
+                            Oid typioparam, int32 typmod)
+{
+    Datum        result;
+
+    if (buf == NULL && flinfo->fn_strict)
+        return (Datum) 0;        /* just return null result */
+
+    fcinfo->flinfo = flinfo;
+    fcinfo->isnull = false;
+    fcinfo->args[0].value = PointerGetDatum(buf);
+    fcinfo->args[1].value = ObjectIdGetDatum(typioparam);
+    fcinfo->args[2].value = Int32GetDatum(typmod);
+
+    result = FunctionCallInvoke(fcinfo);
+
+    /* Should get null result if and only if buf is NULL */
+    if (buf == NULL)
+    {
+        if (!fcinfo->isnull)
+            elog(ERROR, "receive function %u returned non-NULL",
+                 flinfo->fn_oid);
+    }
+    else
+    {
+        if (fcinfo->isnull)
+            elog(ERROR, "receive function %u returned NULL",
+                 flinfo->fn_oid);
+    }
+
+    return result;
+}
+
 /*
  * Call a previously-looked-up datatype binary-output function.
  *
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..8c1a227c02 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -97,6 +97,7 @@ typedef struct CopyFromStateData
     Oid           *typioparams;    /* array of element types for in_functions */
     ErrorSaveContext *escontext;    /* soft error trapper during in_functions
                                      * execution */
+    FunctionCallInfoBaseData *fcinfo;    /* reusable callinfo for in_functions */
     uint64        num_errors;        /* total number of rows which contained soft
                                  * errors */
     int           *defmap;            /* array of default att numbers related to
diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index ccb4070a25..3d3a12205b 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -708,12 +708,24 @@ extern bool DirectInputFunctionCallSafe(PGFunction func, char *str,
                                         Oid typioparam, int32 typmod,
                                         fmNodePtr escontext,
                                         Datum *result);
+extern FunctionCallInfoBaseData *PrepareInputFunctionCallInfo(void);
+extern bool
+            InputFunctionCallSafeWithInfo(FunctionCallInfoBaseData *fcinfo,
+                                          FmgrInfo *flinfo, char *str,
+                                          Oid typioparam, int32 typmod,
+                                          fmNodePtr escontext,
+                                          Datum *result);
 extern Datum OidInputFunctionCall(Oid functionId, char *str,
                                   Oid typioparam, int32 typmod);
 extern char *OutputFunctionCall(FmgrInfo *flinfo, Datum val);
 extern char *OidOutputFunctionCall(Oid functionId, Datum val);
 extern Datum ReceiveFunctionCall(FmgrInfo *flinfo, fmStringInfo buf,
                                  Oid typioparam, int32 typmod);
+extern FunctionCallInfoBaseData *PrepareReceiveFunctionCallInfo(void);
+extern Datum
+            ReceiveFunctionCallWithInfo(FunctionCallInfoBaseData *fcinfo,
+                                        FmgrInfo *flinfo, fmStringInfo buf,
+                                        Oid typioparam, int32 typmod);
 extern Datum OidReceiveFunctionCall(Oid functionId, fmStringInfo buf,
                                     Oid typioparam, int32 typmod);
 extern bytea *SendFunctionCall(FmgrInfo *flinfo, Datum val);
-- 
2.43.0
From dbf04dec457ad2c61d00538514cc5356e94074e1 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Thu, 15 Feb 2024 15:26:31 +0900
Subject: [PATCH v1] Reuse fcinfo used in COPY TO
Each CopyOneRowTo() calls output functions or binary-output
functions. We can reuse fcinfo for them instead of creating a local
fcinfo for each call. This will improve performance.
---
 src/backend/commands/copyto.c | 14 +++++--
 src/backend/utils/fmgr/fmgr.c | 79 +++++++++++++++++++++++++++++++++++
 src/include/fmgr.h            |  6 +++
 3 files changed, 95 insertions(+), 4 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 20ffc90363..21442861f3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -97,6 +97,7 @@ typedef struct CopyToStateData
     MemoryContext copycontext;    /* per-copy execution context */
 
     FmgrInfo   *out_functions;    /* lookup info for output functions */
+    FunctionCallInfoBaseData *fcinfo;    /* reusable callinfo for out_functions */
     MemoryContext rowcontext;    /* per-row evaluation context */
     uint64        bytes_processed;    /* number of bytes processed so far */
 } CopyToStateData;
@@ -786,6 +787,10 @@ DoCopyTo(CopyToState cstate)
                               &isvarlena);
         fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
     }
+    if (cstate->opts.binary)
+        cstate->fcinfo = PrepareSendFunctionCallInfo();
+    else
+        cstate->fcinfo = PrepareOutputFunctionCallInfo();
 
     /*
      * Create a temporary memory context that we can reset once per row to
@@ -909,6 +914,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
     bool        need_delim = false;
     FmgrInfo   *out_functions = cstate->out_functions;
+    FunctionCallInfoBaseData *fcinfo = cstate->fcinfo;
     MemoryContext oldcontext;
     ListCell   *cur;
     char       *string;
@@ -949,8 +955,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
         {
             if (!cstate->opts.binary)
             {
-                string = OutputFunctionCall(&out_functions[attnum - 1],
-                                            value);
+                string = OutputFunctionCallWithInfo(fcinfo, &out_functions[attnum - 1],
+                                                    value);
                 if (cstate->opts.csv_mode)
                     CopyAttributeOutCSV(cstate, string,
                                         cstate->opts.force_quote_flags[attnum - 1]);
@@ -961,8 +967,8 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
             {
                 bytea       *outputbytes;
 
-                outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-                                               value);
+                outputbytes = SendFunctionCallWithInfo(fcinfo, &out_functions[attnum - 1],
+                                                       value);
                 CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
                 CopySendData(cstate, VARDATA(outputbytes),
                              VARSIZE(outputbytes) - VARHDRSZ);
diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index e48a86be54..ab74a643f2 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/src/backend/utils/fmgr/fmgr.c
@@ -1685,6 +1685,45 @@ OutputFunctionCall(FmgrInfo *flinfo, Datum val)
     return DatumGetCString(FunctionCall1(flinfo, val));
 }
 
+/*
+ * Prepare callinfo for OutputFunctionCallWithInfo to reuse one callinfo
+ * instead of initializing it for each call. This is for performance.
+ */
+FunctionCallInfoBaseData *
+PrepareOutputFunctionCallInfo(void)
+{
+    FunctionCallInfoBaseData *fcinfo;
+
+    fcinfo = (FunctionCallInfoBaseData *) palloc(SizeForFunctionCallInfo(1));
+    InitFunctionCallInfoData(*fcinfo, NULL, 1, InvalidOid, NULL, NULL);
+    fcinfo->args[0].isnull = false;
+    return fcinfo;
+}
+
+/*
+ * Call a previously-looked-up datatype output function, with prepared callinfo.
+ *
+ * This is basically like OutputFunctionCall, but it reuses prepared callinfo.
+ */
+char *
+OutputFunctionCallWithInfo(FunctionCallInfoBaseData *fcinfo,
+                           FmgrInfo *flinfo, Datum val)
+{
+    Datum        result;
+
+    fcinfo->flinfo = flinfo;
+    fcinfo->isnull = false;
+    fcinfo->args[0].value = val;
+
+    result = FunctionCallInvoke(fcinfo);
+
+    /* Check for null result, since caller is clearly not expecting one */
+    if (fcinfo->isnull)
+        elog(ERROR, "function %u returned NULL", flinfo->fn_oid);
+
+    return DatumGetCString(result);
+}
+
 /*
  * Call a previously-looked-up datatype binary-input function.
  *
@@ -1746,6 +1785,46 @@ SendFunctionCall(FmgrInfo *flinfo, Datum val)
     return DatumGetByteaP(FunctionCall1(flinfo, val));
 }
 
+/*
+ * Prepare callinfo for SendFunctionCallWithInfo to reuse one callinfo
+ * instead of initializing it for each call. This is for performance.
+ */
+FunctionCallInfoBaseData *
+PrepareSendFunctionCallInfo(void)
+{
+    FunctionCallInfoBaseData *fcinfo;
+
+    fcinfo = (FunctionCallInfoBaseData *) palloc(SizeForFunctionCallInfo(1));
+    InitFunctionCallInfoData(*fcinfo, NULL, 1, InvalidOid, NULL, NULL);
+    fcinfo->args[0].isnull = false;
+    return fcinfo;
+}
+
+/*
+ * Call a previously-looked-up datatype binary-output function, with prepared
+ * callinfo.
+ *
+ * This is basically like SendFunctionCall, but it reuses prepared callinfo.
+ */
+bytea *
+SendFunctionCallWithInfo(FunctionCallInfoBaseData *fcinfo,
+                         FmgrInfo *flinfo, Datum val)
+{
+    Datum        result;
+
+    fcinfo->flinfo = flinfo;
+    fcinfo->isnull = false;
+    fcinfo->args[0].value = val;
+
+    result = FunctionCallInvoke(fcinfo);
+
+    /* Check for null result, since caller is clearly not expecting one */
+    if (fcinfo->isnull)
+        elog(ERROR, "function %u returned NULL", flinfo->fn_oid);
+
+    return DatumGetByteaP(result);
+}
+
 /*
  * As above, for I/O functions identified by OID.  These are only to be used
  * in seldom-executed code paths.  They are not only slow but leak memory.
diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index ccb4070a25..816ed31b05 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -711,12 +711,18 @@ extern bool DirectInputFunctionCallSafe(PGFunction func, char *str,
 extern Datum OidInputFunctionCall(Oid functionId, char *str,
                                   Oid typioparam, int32 typmod);
 extern char *OutputFunctionCall(FmgrInfo *flinfo, Datum val);
+extern FunctionCallInfoBaseData *PrepareOutputFunctionCallInfo(void);
+extern char *OutputFunctionCallWithInfo(FunctionCallInfoBaseData *fcinfo,
+                                        FmgrInfo *flinfo, Datum val);
 extern char *OidOutputFunctionCall(Oid functionId, Datum val);
 extern Datum ReceiveFunctionCall(FmgrInfo *flinfo, fmStringInfo buf,
                                  Oid typioparam, int32 typmod);
 extern Datum OidReceiveFunctionCall(Oid functionId, fmStringInfo buf,
                                     Oid typioparam, int32 typmod);
 extern bytea *SendFunctionCall(FmgrInfo *flinfo, Datum val);
+extern FunctionCallInfoBaseData *PrepareSendFunctionCallInfo(void);
+extern bytea *SendFunctionCallWithInfo(FunctionCallInfoBaseData *fcinfo,
+                                       FmgrInfo *flinfo, Datum val);
 extern bytea *OidSendFunctionCall(Oid functionId, Datum val);
 
 
-- 
2.43.0
			
Hi,
In <20240213.173340.1518143507526518973.kou@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 13 Feb 2024 17:33:40 +0900 (JST),
  Sutou Kouhei <kou@clear-code.com> wrote:
> I'll reply other comments later...
I've read the other comments and my answers to them are the
same as Michael's.
I'll prepare the v15 patch with static inline functions and
fixed arguments after the fcinfo cache patches are merged. I
think that the v15 patch will conflict with the fcinfo cache
patches.
Thanks,
-- 
kou
On Thu, Feb 15, 2024 at 2:34 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
>
> Thanks for the info. Let's use InputFunctionCallSafeWithInfo().
> See that attached patch:
> v2-0001-Reuse-fcinfo-used-in-COPY-FROM.patch
>
> I also attach a patch for COPY TO:
> v1-0001-Reuse-fcinfo-used-in-COPY-TO.patch
>
> I measured the COPY TO patch on my environment with:
> COPY (SELECT
1::int2,2::int2,3::int2,4::int2,5::int2,6::int2,7::int2,8::int2,9::int2,10::int2,11::int2,12::int2,13::int2,14::int2,15::int2,16::int2,17::int2,18::int2,19::int2,20::int2,
generate_series(1,1000000::int4)) TO '/dev/null' \watch c=5 
>
> master:
> 740.066ms
> 734.884ms
> 738.579ms
> 734.170ms
> 727.953ms
>
> patched:
> 730.714ms
> 741.483ms
> 714.149ms
> 715.436ms
> 713.578ms
>
> It seems that it improves performance a bit but my
> environment isn't suitable for benchmark. So they may not
> be valid numbers.
My environment is slow (around 10x) but consistent.
I see around 2-3 percent increase consistently.
(with patch 7369.068 ms, without patch 7574.802 ms)
The patchset looks good to me; I can understand it.
However, I cannot apply it cleanly against HEAD.
+/*
+ * Prepare callinfo for InputFunctionCallSafeWithInfo to reuse one callinfo
+ * instead of initializing it for each call. This is for performance.
+ */
+FunctionCallInfoBaseData *
+PrepareInputFunctionCallInfo(void)
+{
+ FunctionCallInfoBaseData *fcinfo;
+
+ fcinfo = (FunctionCallInfoBaseData *) palloc(SizeForFunctionCallInfo(3));
just wondering, I saw other similar places using palloc0,
do we need to use palloc0?
			
		Hi,
In <CACJufxE=m8kMC92JpaqNMg02P_Pi1sZJ1w=xNec0=j_W6d9GDw@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 15 Feb 2024 17:09:20 +0800,
  jian he <jian.universality@gmail.com> wrote:
> My environment is slow (around 10x) but consistent.
> I see around 2-3 percent increase consistently.
> (with patch 7369.068 ms, without patch 7574.802 ms)
Thanks for sharing your numbers! It will help us to
determine whether these changes improve performance or not.
> the patchset looks good in my eyes, i can understand it.
> however I cannot apply it cleanly against the HEAD.
Hmm, I used 9bc1eee988c31e66a27e007d41020664df490214 as the
base version. But both patches are based on the same
revision, so we may not be able to apply both patches at
once cleanly.
> +/*
> + * Prepare callinfo for InputFunctionCallSafeWithInfo to reuse one callinfo
> + * instead of initializing it for each call. This is for performance.
> + */
> +FunctionCallInfoBaseData *
> +PrepareInputFunctionCallInfo(void)
> +{
> + FunctionCallInfoBaseData *fcinfo;
> +
> + fcinfo = (FunctionCallInfoBaseData *) palloc(SizeForFunctionCallInfo(3));
> 
> just wondering, I saw other similar places using palloc0,
> do we need to use palloc0?
I think that we don't need to use palloc0() here because the
following InitFunctionCallInfoData() call initializes all
members explicitly.
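A small mock shows the reasoning. This is a sketch only: `Info`, `init_info()` and `make_info()` are hypothetical names, and the struct merely imitates the layout of a call-info struct with a flexible argument array. The buffer is deliberately filled with garbage to imitate non-zeroing allocation, then every fixed member is assigned and the per-argument isnull flags are set, as the Prepare*() functions in the patch do. Argument values stay unset because they are assigned before each call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct Arg
{
    long        value;
    int         isnull;
} Arg;

/* Hypothetical stand-in for a call-info struct with a flexible array. */
typedef struct Info
{
    const void *flinfo;
    const void *context;
    int         isnull;
    short       nargs;
    Arg         args[];
} Info;

#define INFO_SIZE(n) (offsetof(Info, args) + sizeof(Arg) * (size_t) (n))

/* Assigns every fixed member, so prior contents do not matter. */
static void
init_info(Info *info, short nargs)
{
    info->flinfo = NULL;
    info->context = NULL;
    info->isnull = 0;
    info->nargs = nargs;
}

static Info *
make_info(short nargs)
{
    Info       *info;

    assert(nargs >= 0);
    info = malloc(INFO_SIZE(nargs));
    assert(info != NULL);
    /* Simulate the undefined contents of a non-zeroing allocator. */
    memset(info, 0xAB, INFO_SIZE(nargs));

    init_info(info, nargs);
    /* args[i].value is left alone: it is assigned before every call. */
    for (short i = 0; i < nargs; i++)
        info->args[i].isnull = 0;
    return info;
}
```

The one thing the initializer in this mock does not touch is the argument array, which is why the isnull flags are set separately.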
Thanks,
-- 
kou
			
		On Thu, Feb 15, 2024 at 03:34:21PM +0900, Sutou Kouhei wrote:
> It seems that it improves performance a bit but my
> environment isn't suitable for benchmark. So they may not
> be valid numbers.
I was comparing what you have here, and what's been attached by Andres
at [1] and the top of the changes on my development branch at [2]
(v3-0008, mostly).  And, it strikes me that there is no need to do any
major changes in any of the callbacks proposed up to v13 and v14 in
this thread, as all the changes proposed want to plug in more data
into each StateData for COPY FROM and COPY TO, the best part being
that v3-0008 can just reuse the proposed callbacks as-is.  v1-0001
from Sutou-san would need one slight tweak in the per-row callback,
still that's minor.
I have been spending more time on the patch to introduce the COPY
APIs, leading me to the v15 attached, where I have replaced the
previous attribute callbacks for the output representation and the
reads with hardcoded routines that should be optimized by compilers,
and I have done more profiling with -O2.  I'm aware of the disparities
in the per-row and start callbacks for the text/csv cases as well as
the default expressions, but these are really format-dependent with
their own assumptions so splitting them is something that makes
limited sense to me.  I've also looked at externalizing some of the
error handling, though the result was not that beautiful, so what I
got here is what makes the callbacks leaner and easier to work with.
First, some results for COPY FROM using the previous tests (30 int
attributes, running on scissors, data sent to blackhole_am, etc.) in
NextCopyFrom() which becomes the hot-spot:
* Using v15:
  Children      Self  Command   Shared Object       Symbol
-   66.42%     0.71%  postgres  postgres            [.] NextCopyFrom
    - 65.70% NextCopyFrom
       - 65.49% CopyFromTextLikeOneRow
          + 19.29% InputFunctionCallSafe
          + 15.81% CopyReadLine
            13.89% CopyReadAttributesText
    + 0.71% _start
* Using HEAD (today's 011d60c4352c):
  Children      Self  Command   Shared Object       Symbol
-   67.09%    16.64%  postgres  postgres            [.] NextCopyFrom
    - 50.45% NextCopyFrom
       - 30.89% NextCopyFromRawFields
          + 16.26% CopyReadLine
            13.59% CopyReadAttributesText
       + 19.24% InputFunctionCallSafe
    + 16.64% _start
In this case, I have been able to limit the effects of the per-row
callback by making NextCopyFromRawFields() local to copyfromparse.c
while applying some inlining to it.  This brings me to a different
point, why don't we do this change independently on HEAD?  It's not
really complicated to make NextCopyFromRawFields show high in the
profiles.  I was looking at external projects, and noticed that
there's nothing calling NextCopyFromRawFields() directly.
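The inlining approach can be illustrated with a toy example. It is a sketch under assumptions: `count_fields()` and its two wrappers are invented for the illustration and are far simpler than the real parsing code. A file-local static inline worker takes the format as a bool, and each format-specific entry point passes a constant, so an optimizing compiler is free to drop the branch that entry point never takes. Whether it actually does so depends on the compiler and flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Shared worker: counts the fields in one line.  In CSV mode a delimiter
 * inside double quotes does not split a field.
 */
static inline int
count_fields(const char *line, char delim, bool is_csv)
{
    int         nfields = 1;
    bool        in_quote = false;

    assert(line != NULL);
    for (const char *p = line; *p != '\0'; p++)
    {
        if (is_csv && *p == '"')
            in_quote = !in_quote;
        else if (*p == delim && !in_quote)
            nfields++;
    }
    return nfields;
}

/* Text entry point: is_csv is the constant false at this call site. */
static int
count_fields_text(const char *line)
{
    return count_fields(line, ',', false);
}

/* CSV entry point: is_csv is the constant true at this call site. */
static int
count_fields_csv(const char *line)
{
    return count_fields(line, ',', true);
}
```

For the line `a,"b,c",d` the text variant counts four fields and the CSV variant three.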
Second, some profiles with COPY TO (30 int attributes, running on
scissors) where data is sent to /dev/null:
* Using v15:
  Children      Self  Command   Shared Object       Symbol
-   85.61%     0.34%  postgres  postgres            [.] CopyOneRowTo
    - 85.26% CopyOneRowTo
       - 75.86% CopyToTextOneRow
          + 36.49% OutputFunctionCall
          + 10.53% appendBinaryStringInfo
            9.66% CopyAttributeOutText
            1.34% int4out
            0.92% 0xffffa9803be8
            0.79% enlargeStringInfo
            0.77% memcpy@plt
            0.69% 0xffffa9803be4
       + 3.12% CopySendEndOfRow
         2.81% CopySendChar
         0.95% pgstat_progress_update_param
         0.95% appendBinaryStringInfo
         0.55% MemoryContextReset
* Using HEAD (today's 011d60c4352c):
  Children      Self  Command   Shared Object       Symbol
-   80.35%    14.23%  postgres  postgres            [.] CopyOneRowTo
    - 66.12% CopyOneRowTo
       + 35.40% OutputFunctionCall
       + 11.00% appendBinaryStringInfo
         8.38% CopyAttributeOutText
       + 2.98% CopySendEndOfRow
         1.52% int4out
         0.88% pgstat_progress_update_param
         0.87% 0xffff8ab32be8
         0.74% memcpy@plt
         0.68% enlargeStringInfo
         0.61% 0xffff8ab32be4
         0.51% MemoryContextReset
    + 14.23% _start
The increase in CopyOneRowTo from 80% to 85% worries me but I am not
quite sure how to optimize that with the current structure of the
code, so the dispatch caused by per-row callback is noticeable in
what's my worst test case.  I am not quite sure how to avoid that,
TBH.  A result that has been puzzling me is that I am getting faster
runtimes with v15 (6232ms on average) vs HEAD (6550ms) at 5M rows with
COPY TO for what led to these profiles (for tests without perf
attached to the backends).
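To make the per-row dispatch concrete, here is a minimal sketch of a format routine table with one indirect call per row. All names (`RowSink`, `FormatRoutine`, `text_one_row()`, `copy_rows()`) are hypothetical and are not the structs or callbacks from the patch. It only shows where the extra function-pointer call sits in the row loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Output buffer for the sketch. */
typedef struct RowSink
{
    char        buf[256];
    size_t      len;
} RowSink;

/* Hypothetical routine table: the format supplies its per-row function. */
typedef struct FormatRoutine
{
    void        (*one_row) (RowSink *sink, const int *values, size_t nvalues);
} FormatRoutine;

/* Text-like output: tab-separated values, newline-terminated. */
static void
text_one_row(RowSink *sink, const int *values, size_t nvalues)
{
    for (size_t i = 0; i < nvalues; i++)
    {
        size_t      room = sizeof(sink->buf) - sink->len;
        int         n = snprintf(sink->buf + sink->len, room, "%s%d",
                                 i == 0 ? "" : "\t", values[i]);

        assert(n >= 0 && (size_t) n < room);
        sink->len += (size_t) n;
    }
    assert(sink->len + 1 < sizeof(sink->buf));
    sink->buf[sink->len++] = '\n';
    sink->buf[sink->len] = '\0';
}

static const FormatRoutine text_routine = {text_one_row};

/* Row loop: the indirect call below is the per-row dispatch. */
static void
copy_rows(const FormatRoutine *routine, RowSink *sink,
          const int rows[][3], size_t nrows)
{
    for (size_t r = 0; r < nrows; r++)
        routine->one_row(sink, rows[r], 3);
}
```

With hardcoded per-format routines the cost is one indirect call per row rather than one per attribute, which is the trade-off being measured in the profiles above.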
Any thoughts overall?
[1]: https://www.postgresql.org/message-id/20240218015955.rmw5mcmobt5hbene%40awork3.anarazel.de
[2]: https://www.postgresql.org/message-id/ZcWoTr1N0GELFA9E@paquier.xyz
--
Michael
			
Hi,
In <ZdbtQJ-p5H1_EDwE@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 22 Feb 2024 15:44:16 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:
> I was comparing what you have here, and what's been attached by Andres
> at [1] and the top of the changes on my development branch at [2]
> (v3-0008, mostly).  And, it strikes me that there is no need to do any
> major changes in any of the callbacks proposed up to v13 and v14 in
> this thread, as all the changes proposed want to plug in more data
> into each StateData for COPY FROM and COPY TO, the best part being
> that v3-0008 can just reuse the proposed callbacks as-is.  v1-0001
> from Sutou-san would need one slight tweak in the per-row callback,
> still that's minor.
I think so too. But I thought that some minor conflicts
would happen between this and the v15. So I worked on this
before the v15.
We agreed that this optimization doesn't block v15: [1]
So we can work on the v15 without this optimization for now.
[1] https://www.postgresql.org/message-id/flat/20240219195351.5vy7cdl3wxia66kg%40awork3.anarazel.de#20f9677e074fb0f8c5bb3994ef059a15
> I have been spending more time on the patch to introduce the COPY
> APIs, leading me to the v15 attached, where I have replaced the
> previous attribute callbacks for the output representation and the
> reads with hardcoded routines that should be optimized by compilers,
> and I have done more profiling with -O2.
Thanks! I wanted to work on it but I didn't have enough time
for it for the past few days...
I've reviewed the v15.
----
> @@ -751,8 +751,9 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
>   *
>   * NOTE: force_not_null option are not applied to the returned fields.
>   */
> -bool
> -NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
> +static bool
"inline" is missing here.
> +NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields,
> +                      bool is_csv)
>  {
>      int            fldct;
----
How about adding "is_csv" to CopyReadLine() and
CopyReadLineText() too?
----
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 25b8d4bc52..79fabecc69 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -150,8 +150,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
+static inline bool CopyReadLine(CopyFromState cstate, bool is_csv);
+static inline bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static inline int CopyReadAttributesText(CopyFromState cstate);
 static inline int CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -770,7 +770,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields,
         tupDesc = RelationGetDescr(cstate->rel);
 
         cstate->cur_lineno++;
-        done = CopyReadLine(cstate);
+        done = CopyReadLine(cstate, is_csv);
 
         if (cstate->opts.header_line == COPY_HEADER_MATCH)
         {
@@ -823,7 +823,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields,
     cstate->cur_lineno++;
 
     /* Actually read the line into memory here */
-    done = CopyReadLine(cstate);
+    done = CopyReadLine(cstate, is_csv);
 
     /*
      * EOF at start of line means we're done.  If we see EOF after some
@@ -1133,8 +1133,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * by newline.  The terminating newline or EOF marker is not included
  * in the final value of line_buf.
  */
-static bool
-CopyReadLine(CopyFromState cstate)
+static inline bool
+CopyReadLine(CopyFromState cstate, bool is_csv)
 {
     bool        result;
 
@@ -1142,7 +1142,7 @@ CopyReadLine(CopyFromState cstate)
     cstate->line_buf_valid = false;
 
     /* Parse data and transfer into line_buf */
-    result = CopyReadLineText(cstate);
+    result = CopyReadLineText(cstate, is_csv);
 
     if (result)
     {
@@ -1209,8 +1209,8 @@ CopyReadLine(CopyFromState cstate)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate)
+static inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
     char       *copy_input_buf;
     int         input_buf_ptr;
@@ -1226,7 +1226,7 @@ CopyReadLineText(CopyFromState cstate)
     char        quotec = '\0';
     char        escapec = '\0';
 
-    if (cstate->opts.csv_mode)
+    if (is_csv)
     {
         quotec = cstate->opts.quote[0];
         escapec = cstate->opts.escape[0];
@@ -1306,7 +1306,7 @@ CopyReadLineText(CopyFromState cstate)
         prev_raw_ptr = input_buf_ptr;
         c = copy_input_buf[input_buf_ptr++];
 
-        if (cstate->opts.csv_mode)
+        if (is_csv)
         {
             /*
              * If character is '\\' or '\r', we may need to look ahead below.
@@ -1345,7 +1345,7 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \r */
-        if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\r' && (!is_csv || !in_quote))
         {
             /* Check for \r\n on first line, _and_ handle \r\n. */
             if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1373,10 +1373,10 @@ CopyReadLineText(CopyFromState cstate)
                 if (cstate->eol_type == EOL_CRNL)
                     ereport(ERROR,
                             (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                             !cstate->opts.csv_mode ?
+                             !is_csv ?
                              errmsg("literal carriage return found in data") :
                              errmsg("unquoted carriage return found in data"),
-                             !cstate->opts.csv_mode ?
+                             !is_csv ?
errhint("Use \"\\r\" to represent carriage return.") : errhint("Use quoted CSV field to represent carriage return."))); @@ -1390,10 +1390,10 @@ CopyReadLineText(CopyFromState cstate) else if (cstate->eol_type == EOL_NL) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - !cstate->opts.csv_mode ? + !is_csv ? errmsg("literal carriage return found in data") : errmsg("unquoted carriage return found in data"), - !cstate->opts.csv_mode ? + !is_csv ? errhint("Use \"\\r\" to represent carriage return.") : errhint("Use quoted CSV field to represent carriage return."))); /* If reach here, we have found the line terminator */ @@ -1401,15 +1401,15 @@ CopyReadLineText(CopyFromState cstate) } /* Process \n */ - if (c == '\n' && (!cstate->opts.csv_mode || !in_quote)) + if (c == '\n' && (!is_csv || !in_quote)) { if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - !cstate->opts.csv_mode ? + !is_csv ? errmsg("literal newline found in data") : errmsg("unquoted newline found in data"), - !cstate->opts.csv_mode ? + !is_csv ? errhint("Use \"\\n\" to represent newline.") : errhint("Use quoted CSV field to represent newline."))); cstate->eol_type = EOL_NL; /* in case not set yet */ @@ -1421,7 +1421,7 @@ CopyReadLineText(CopyFromState cstate) * In CSV mode, we only recognize \. alone on a line. This is because * \. is a valid CSV data value. 
*/ - if (c == '\\' && (!cstate->opts.csv_mode || first_char_in_line)) + if (c == '\\' && (!is_csv || first_char_in_line)) { char c2; @@ -1454,7 +1454,7 @@ CopyReadLineText(CopyFromState cstate) if (c2 == '\n') { - if (!cstate->opts.csv_mode) + if (!is_csv) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), errmsg("end-of-copy marker does not match previous newline style"))); @@ -1463,7 +1463,7 @@ CopyReadLineText(CopyFromState cstate) } else if (c2 != '\r') { - if (!cstate->opts.csv_mode) + if (!is_csv) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), errmsg("end-of-copy marker corrupt"))); @@ -1479,7 +1479,7 @@ CopyReadLineText(CopyFromState cstate) if (c2 != '\r' && c2 != '\n') { - if (!cstate->opts.csv_mode) + if (!is_csv) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), errmsg("end-of-copy marker corrupt"))); @@ -1508,7 +1508,7 @@ CopyReadLineText(CopyFromState cstate) result = true; /* report EOF */ break; } - else if (!cstate->opts.csv_mode) + else if (!is_csv) { /* * If we are here, it means we found a backslash followed by ---- > In this case, I have been able to limit the effects of the per-row > callback by making NextCopyFromRawFields() local to copyfromparse.c > while applying some inlining to it. This brings me to a different > point, why don't we do this change independently on HEAD? Does this mean that changing bool NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) to (adding "static") static bool NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) not (adding "static" and "bool is_csv") static bool NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv) improves performance? If so, adding the change independently on HEAD makes sense. But I don't know why that improves performance... Inlining? > It's not > really complicated to make NextCopyFromRawFields show high in the > profiles. 
> I was looking at external projects, and noticed that
> there's nothing calling NextCopyFromRawFields() directly.
It means that we can hide NextCopyFromRawFields() without
breaking compatibility (because nobody uses it), right?
If so, I also think that we can change
NextCopyFromRawFields() directly.
If we assume that someone (not public code) may use it, we
can create a new internal function and use it something
like:
----
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7cacd0b752..b1515ead82 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -751,8 +751,8 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  *
  * NOTE: force_not_null option are not applied to the returned fields.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static bool
+NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields)
 {
 	int			fldct;
 	bool		done;
@@ -840,6 +840,12 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	return true;
 }
 
+bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+{
+	return NextCopyFromRawFieldsInternal(cstate, fields, nfields);
+}
+
 /*
  * Read next tuple from file for COPY FROM. Return false if no more tuples.
  *
----
Thanks,
--
kou
On Thu, Feb 22, 2024 at 06:39:48PM +0900, Sutou Kouhei wrote:
> If so, adding the change independently on HEAD makes
> sense. But I don't know why that improves
> performance... Inlining?
I guess so. It does not make much of a difference, though. The thing
is that the dispatch caused by the custom callbacks called for each
row is noticeable in any profiles I'm taking (not that much in the
worst-case scenarios, still a few percents), meaning that this impacts
the performance for all the in-core formats (text, csv, binary) as
long as we refactor text/csv/binary to use the routines of copyapi.h.
I don't really see a way forward, except if we don't dispatch the
in-core formats to not impact the default cases. That makes the code
a bit less elegant, but equally efficient for the existing formats.
--
Michael
Hi,
In <20240222.183948.518018047578925034.kou@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 22 Feb 2024 18:39:48 +0900 (JST),
  Sutou Kouhei <kou@clear-code.com> wrote:
> How about adding "is_csv" to CopyReadline() and
> CopyReadLineText() too?
I tried this on my environment. This is a change for COPY
FROM, not COPY TO, but it still decreases COPY TO
performance with [1]... Hmm...
master:   697.693 msec (the best case)
v15:      576.374 msec (the best case)
v15+this: 593.559 msec (the best case)
[1] COPY (SELECT
1::int2,2::int2,3::int2,4::int2,5::int2,6::int2,7::int2,8::int2,9::int2,10::int2,11::int2,12::int2,13::int2,14::int2,15::int2,16::int2,17::int2,18::int2,19::int2,20::int2,
generate_series(1,1000000::int4)) TO '/dev/null' \watch c=15
 
So I think that v15 is good.
perf result of master:
# Children      Self  Command   Shared Object      Symbol                                   
# ........  ........  ........  .................  .........................................
#
    31.39%    14.54%  postgres  postgres           [.] CopyOneRowTo
            |--17.00%--CopyOneRowTo
            |          |--10.61%--FunctionCall1Coll
            |          |           --8.40%--int2out
            |          |                     |--2.58%--pg_ltoa
            |          |                     |           --0.68%--pg_ultoa_n
            |          |                     |--1.11%--pg_ultoa_n
            |          |                     |--0.83%--AllocSetAlloc
            |          |                     |--0.69%--__memcpy_avx_unaligned_erms (inlined)
            |          |                     |--0.58%--FunctionCall1Coll
            |          |                      --0.55%--memcpy@plt
            |          |--3.25%--appendBinaryStringInfo
            |          |           --0.56%--pg_ultoa_n
            |           --0.69%--CopyAttributeOutText
perf result of v15:
# Children      Self  Command   Shared Object      Symbol                                   
# ........  ........  ........  .................  .........................................
#
    25.60%    10.47%  postgres  postgres           [.] CopyToTextOneRow
            |--15.39%--CopyToTextOneRow
            |          |--10.44%--FunctionCall1Coll
            |          |          |--7.25%--int2out
            |          |          |          |--2.60%--pg_ltoa
            |          |          |          |           --0.71%--pg_ultoa_n
            |          |          |          |--0.90%--FunctionCall1Coll
            |          |          |          |--0.84%--pg_ultoa_n
            |          |          |           --0.66%--AllocSetAlloc
            |          |          |--0.79%--ExecProjectSet
            |          |           --0.68%--int4out
            |          |--2.50%--appendBinaryStringInfo
            |           --0.53%--CopyAttributeOutText
The profiles on Michael's environment [2] showed that
the CopyOneRow() % was increased by v15. But on my environment
it (the CopyToTextOneRow() %, not the CopyOneRow() %) wasn't
increased by v15. It decreased instead.
[2] https://www.postgresql.org/message-id/flat/ZdbtQJ-p5H1_EDwE%40paquier.xyz#6439e6ad574f2d47cd7220e9bfed3889
So I think that v15 doesn't have a performance regression, but
my environment isn't suitable for benchmarking...
Thanks,
-- 
kou
			
Hi,
In <ZeFoOprWyKU6gpkP@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 1 Mar 2024 14:31:38 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:
> I guess so. It does not make much of a difference, though. The thing
> is that the dispatch caused by the custom callbacks called for each
> row is noticeable in any profiles I'm taking (not that much in the
> worst-case scenarios, still a few percents), meaning that this impacts
> the performance for all the in-core formats (text, csv, binary) as
> long as we refactor text/csv/binary to use the routines of copyapi.h.
> I don't really see a way forward, except if we don't dispatch the
> in-core formats to not impact the default cases. That makes the code
> a bit less elegant, but equally efficient for the existing formats.
It's an option based on your profile result, but your
execution result also shows that v15 is faster than HEAD [1]:
> I am getting faster runtimes with v15 (6232ms in average)
> vs HEAD (6550ms) at 5M rows with COPY TO
[1] https://www.postgresql.org/message-id/flat/ZdbtQJ-p5H1_EDwE%40paquier.xyz#6439e6ad574f2d47cd7220e9bfed3889
I think that a faster runtime is more beneficial to users
than a mysterious profile. So I think that we can merge v15
to master.
Thanks,
--
kou
Hi,
In <20240301.154443.618034282613922707.kou@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 01 Mar 2024 15:44:43 +0900 (JST),
  Sutou Kouhei <kou@clear-code.com> wrote:
>> I guess so. It does not make much of a difference, though. The thing
>> is that the dispatch caused by the custom callbacks called for each
>> row is noticeable in any profiles I'm taking (not that much in the
>> worst-case scenarios, still a few percents), meaning that this impacts
>> the performance for all the in-core formats (text, csv, binary) as
>> long as we refactor text/csv/binary to use the routines of copyapi.h.
>> I don't really see a way forward, except if we don't dispatch the
>> in-core formats to not impact the default cases. That makes the code
>> a bit less elegant, but equally efficient for the existing formats.
>
> It's an option based on your profile result but your
> execution result also shows that v15 is faster than HEAD [1]:
>
>> I am getting faster runtimes with v15 (6232ms in average)
>> vs HEAD (6550ms) at 5M rows with COPY TO
>
> [1] https://www.postgresql.org/message-id/flat/ZdbtQJ-p5H1_EDwE%40paquier.xyz#6439e6ad574f2d47cd7220e9bfed3889
>
> I think that faster runtime is beneficial than mysterious
> profile for users. So I think that we can merge v15 to
> master.
If this is a blocker for making COPY format extendable, can
we defer moving the existing text/csv/binary format
implementations to Copy{From,To}Routine for now, as Michael
suggested, so that we can proceed with making COPY format
extendable? (Can we add Copy{From,To}Routine without
changing the existing text/csv/binary format
implementations?)
I attach a patch for it.
There is a large hunk for CopyOneRowTo() that is caused only
by an indentation change. I also attach "...-w.patch", which
uses git's "-w" option to drop whitespace-only changes.
"...-w.patch" is only for review. We should use the .patch
without -w for push.
Thanks, -- kou From 6a5bfc8e104f0a339b421028e9fec69a4d092671 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 4 Mar 2024 13:52:34 +0900 Subject: [PATCH v16] Add CopyFromRoutine/CopyToRountine They are for implementing custom COPY FROM/TO format. But this is not enough to implement custom COPY FROM/TO format yet. We'll export some APIs to receive/send data and add "format" option to COPY FROM/TO later. Existing text/csv/binary format implementations don't use CopyFromRoutine/CopyToRoutine for now. We have a patch for it but we defer it. Because there are some mysterious profile results in spite of we get faster runtimes. See [1] for details. [1] https://www.postgresql.org/message-id/ZdbtQJ-p5H1_EDwE%40paquier.xyz Note that this doesn't change existing text/csv/binary format implementations. There are many diffs for CopyOneRowTo() but they're caused by indentation. They don't change implementations. --- src/backend/commands/copyfrom.c | 24 +++++- src/backend/commands/copyfromparse.c | 5 ++ src/backend/commands/copyto.c | 103 ++++++++++++++--------- src/include/commands/copyapi.h | 100 ++++++++++++++++++++++ src/include/commands/copyfrom_internal.h | 4 + src/tools/pgindent/typedefs.list | 2 + 6 files changed, 193 insertions(+), 45 deletions(-) create mode 100644 src/include/commands/copyapi.h diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index c3bc897028..9bf2f6497e 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1623,12 +1623,22 @@ BeginCopyFrom(ParseState *pstate, /* Fetch the input function and typioparam info */ if (cstate->opts.binary) + { getTypeBinaryInputInfo(att->atttypid, &in_func_oid, &typioparams[attnum - 1]); + fmgr_info(in_func_oid, &in_functions[attnum - 1]); + } + else if (cstate->routine) + cstate->routine->CopyFromInFunc(cstate, att->atttypid, + &in_functions[attnum - 1], + &typioparams[attnum - 1]); + else + { getTypeInputInfo(att->atttypid, 
&in_func_oid, &typioparams[attnum - 1]); - fmgr_info(in_func_oid, &in_functions[attnum - 1]); + fmgr_info(in_func_oid, &in_functions[attnum - 1]); + } /* Get default info if available */ defexprs[attnum - 1] = NULL; @@ -1768,10 +1778,13 @@ BeginCopyFrom(ParseState *pstate, /* Read and verify binary header */ ReceiveCopyBinaryHeader(cstate); } - - /* create workspace for CopyReadAttributes results */ - if (!cstate->opts.binary) + else if (cstate->routine) { + cstate->routine->CopyFromStart(cstate, tupDesc); + } + else + { + /* create workspace for CopyReadAttributes results */ AttrNumber attr_count = list_length(cstate->attnumlist); cstate->max_fields = attr_count; @@ -1789,6 +1802,9 @@ BeginCopyFrom(ParseState *pstate, void EndCopyFrom(CopyFromState cstate) { + if (cstate->routine) + cstate->routine->CopyFromEnd(cstate); + /* No COPY FROM related resources except memory. */ if (cstate->is_program) { diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 7cacd0b752..8b15080585 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -978,6 +978,11 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, Assert(fieldno == attr_count); } + else if (cstate->routine) + { + if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls)) + return false; + } else { /* binary */ diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 20ffc90363..6080627c83 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -24,6 +24,7 @@ #include "access/xact.h" #include "access/xlog.h" #include "commands/copy.h" +#include "commands/copyapi.h" #include "commands/progress.h" #include "executor/execdesc.h" #include "executor/executor.h" @@ -71,6 +72,9 @@ typedef enum CopyDest */ typedef struct CopyToStateData { + /* format routine */ + const CopyToRoutine *routine; + /* low-level state data */ CopyDest copy_dest; /* type of copy source/destination 
*/ FILE *copy_file; /* used if copy_dest == COPY_FILE */ @@ -777,14 +781,22 @@ DoCopyTo(CopyToState cstate) Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1); if (cstate->opts.binary) + { getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena); + fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); + } + else if (cstate->routine) + cstate->routine->CopyToOutFunc(cstate, attr->atttypid, + &cstate->out_functions[attnum - 1]); else + { getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena); - fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); + fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); + } } /* @@ -811,6 +823,8 @@ DoCopyTo(CopyToState cstate) tmp = 0; CopySendInt32(cstate, tmp); } + else if (cstate->routine) + cstate->routine->CopyToStart(cstate, tupDesc); else { /* @@ -892,6 +906,8 @@ DoCopyTo(CopyToState cstate) /* Need to flush out the trailer */ CopySendEndOfRow(cstate); } + else if (cstate->routine) + cstate->routine->CopyToEnd(cstate); MemoryContextDelete(cstate->rowcontext); @@ -916,61 +932,66 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) MemoryContextReset(cstate->rowcontext); oldcontext = MemoryContextSwitchTo(cstate->rowcontext); - if (cstate->opts.binary) + if (cstate->routine) + cstate->routine->CopyToOneRow(cstate, slot); + else { - /* Binary per-tuple header */ - CopySendInt16(cstate, list_length(cstate->attnumlist)); - } - - /* Make sure the tuple is fully deconstructed */ - slot_getallattrs(slot); - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - Datum value = slot->tts_values[attnum - 1]; - bool isnull = slot->tts_isnull[attnum - 1]; - - if (!cstate->opts.binary) + if (cstate->opts.binary) { - if (need_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - need_delim = true; + /* Binary per-tuple header */ + CopySendInt16(cstate, list_length(cstate->attnumlist)); } - if (isnull) - { - if (!cstate->opts.binary) - CopySendString(cstate, 
cstate->opts.null_print_client); - else - CopySendInt32(cstate, -1); - } - else + /* Make sure the tuple is fully deconstructed */ + slot_getallattrs(slot); + + foreach(cur, cstate->attnumlist) { + int attnum = lfirst_int(cur); + Datum value = slot->tts_values[attnum - 1]; + bool isnull = slot->tts_isnull[attnum - 1]; + if (!cstate->opts.binary) { - string = OutputFunctionCall(&out_functions[attnum - 1], - value); - if (cstate->opts.csv_mode) - CopyAttributeOutCSV(cstate, string, - cstate->opts.force_quote_flags[attnum - 1]); + if (need_delim) + CopySendChar(cstate, cstate->opts.delim[0]); + need_delim = true; + } + + if (isnull) + { + if (!cstate->opts.binary) + CopySendString(cstate, cstate->opts.null_print_client); else - CopyAttributeOutText(cstate, string); + CopySendInt32(cstate, -1); } else { - bytea *outputbytes; + if (!cstate->opts.binary) + { + string = OutputFunctionCall(&out_functions[attnum - 1], + value); + if (cstate->opts.csv_mode) + CopyAttributeOutCSV(cstate, string, + cstate->opts.force_quote_flags[attnum - 1]); + else + CopyAttributeOutText(cstate, string); + } + else + { + bytea *outputbytes; - outputbytes = SendFunctionCall(&out_functions[attnum - 1], - value); - CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); - CopySendData(cstate, VARDATA(outputbytes), - VARSIZE(outputbytes) - VARHDRSZ); + outputbytes = SendFunctionCall(&out_functions[attnum - 1], + value); + CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); + CopySendData(cstate, VARDATA(outputbytes), + VARSIZE(outputbytes) - VARHDRSZ); + } } } - } - CopySendEndOfRow(cstate); + CopySendEndOfRow(cstate); + } MemoryContextSwitchTo(oldcontext); } diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h new file mode 100644 index 0000000000..635c4cbff2 --- /dev/null +++ b/src/include/commands/copyapi.h @@ -0,0 +1,100 @@ +/*------------------------------------------------------------------------- + * + * copyapi.h + * API for COPY TO/FROM handlers + * + * + * 
Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/commands/copyapi.h + * + *------------------------------------------------------------------------- + */ +#ifndef COPYAPI_H +#define COPYAPI_H + +#include "executor/tuptable.h" +#include "nodes/execnodes.h" + +/* These are private in commands/copy[from|to].c */ +typedef struct CopyFromStateData *CopyFromState; +typedef struct CopyToStateData *CopyToState; + +/* + * API structure for a COPY FROM format implementation. Note this must be + * allocated in a server-lifetime manner, typically as a static const struct. + */ +typedef struct CopyFromRoutine +{ + /* + * Called when COPY FROM is started to set up the input functions + * associated to the relation's attributes writing to. `finfo` can be + * optionally filled to provide the catalog information of the input + * function. `typioparam` can be optionally filled to define the OID of + * the type to pass to the input function. `atttypid` is the OID of data + * type used by the relation's attribute. + */ + void (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam); + + /* + * Called when COPY FROM is started. + * + * `tupDesc` is the tuple descriptor of the relation where the data needs + * to be copied. This can be used for any initialization steps required + * by a format. + */ + void (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc); + + /* + * Copy one row to a set of `values` and `nulls` of size tupDesc->natts. + * + * 'econtext' is used to evaluate default expression for each column that + * is either not read from the file or is using the DEFAULT option of COPY + * FROM. It is NULL if no default values are used. + * + * Returns false if there are no more tuples to copy. 
+ */ + bool (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); + + /* Called when COPY FROM has ended. */ + void (*CopyFromEnd) (CopyFromState cstate); +} CopyFromRoutine; + +/* + * API structure for a COPY TO format implementation. Note this must be + * allocated in a server-lifetime manner, typically as a static const struct. + */ +typedef struct CopyToRoutine +{ + /* + * Called when COPY TO is started to set up the output functions + * associated to the relation's attributes reading from. `finfo` can be + * optionally filled. `atttypid` is the OID of data type used by the + * relation's attribute. + */ + void (*CopyToOutFunc) (CopyToState cstate, Oid atttypid, + FmgrInfo *finfo); + + /* + * Called when COPY TO is started. + * + * `tupDesc` is the tuple descriptor of the relation from where the data + * is read. + */ + void (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc); + + /* + * Copy one row for COPY TO. + * + * `slot` is the tuple slot where the data is emitted. 
+ */ + void (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot); + + /* Called when COPY TO has ended */ + void (*CopyToEnd) (CopyToState cstate); +} CopyToRoutine; + +#endif /* COPYAPI_H */ diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index cad52fcc78..509b9e92a1 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -15,6 +15,7 @@ #define COPYFROM_INTERNAL_H #include "commands/copy.h" +#include "commands/copyapi.h" #include "commands/trigger.h" #include "nodes/miscnodes.h" @@ -58,6 +59,9 @@ typedef enum CopyInsertMethod */ typedef struct CopyFromStateData { + /* format routine */ + const CopyFromRoutine *routine; + /* low-level state data */ CopySource copy_src; /* type of copy source */ FILE *copy_file; /* used if copy_src == COPY_FILE */ diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index ee40a341d3..a5ae161ca5 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -475,6 +475,7 @@ ConvertRowtypeExpr CookedConstraint CopyDest CopyFormatOptions +CopyFromRoutine CopyFromState CopyFromStateData CopyHeaderChoice @@ -484,6 +485,7 @@ CopyMultiInsertInfo CopyOnErrorChoice CopySource CopyStmt +CopyToRoutine CopyToState CopyToStateData Cost -- 2.43.0 From 6a5bfc8e104f0a339b421028e9fec69a4d092671 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 4 Mar 2024 13:52:34 +0900 Subject: [PATCH v16] Add CopyFromRoutine/CopyToRountine They are for implementing custom COPY FROM/TO format. But this is not enough to implement custom COPY FROM/TO format yet. We'll export some APIs to receive/send data and add "format" option to COPY FROM/TO later. Existing text/csv/binary format implementations don't use CopyFromRoutine/CopyToRoutine for now. We have a patch for it but we defer it. Because there are some mysterious profile results in spite of we get faster runtimes. 
See [1] for details. [1] https://www.postgresql.org/message-id/ZdbtQJ-p5H1_EDwE%40paquier.xyz Note that this doesn't change existing text/csv/binary format implementations. There are many diffs for CopyOneRowTo() but they're caused by indentation. They don't change implementations. --- src/backend/commands/copyfrom.c | 22 ++++- src/backend/commands/copyfromparse.c | 5 ++ src/backend/commands/copyto.c | 21 +++++ src/include/commands/copyapi.h | 100 +++++++++++++++++++++++ src/include/commands/copyfrom_internal.h | 4 + src/tools/pgindent/typedefs.list | 2 + 6 files changed, 151 insertions(+), 3 deletions(-) create mode 100644 src/include/commands/copyapi.h diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index c3bc897028..9bf2f6497e 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1623,12 +1623,22 @@ BeginCopyFrom(ParseState *pstate, /* Fetch the input function and typioparam info */ if (cstate->opts.binary) + { getTypeBinaryInputInfo(att->atttypid, &in_func_oid, &typioparams[attnum - 1]); + fmgr_info(in_func_oid, &in_functions[attnum - 1]); + } + else if (cstate->routine) + cstate->routine->CopyFromInFunc(cstate, att->atttypid, + &in_functions[attnum - 1], + &typioparams[attnum - 1]); + else + { getTypeInputInfo(att->atttypid, &in_func_oid, &typioparams[attnum - 1]); fmgr_info(in_func_oid, &in_functions[attnum - 1]); + } /* Get default info if available */ defexprs[attnum - 1] = NULL; @@ -1768,10 +1778,13 @@ BeginCopyFrom(ParseState *pstate, /* Read and verify binary header */ ReceiveCopyBinaryHeader(cstate); } - - /* create workspace for CopyReadAttributes results */ - if (!cstate->opts.binary) + else if (cstate->routine) { + cstate->routine->CopyFromStart(cstate, tupDesc); + } + else + { + /* create workspace for CopyReadAttributes results */ AttrNumber attr_count = list_length(cstate->attnumlist); cstate->max_fields = attr_count; @@ -1789,6 +1802,9 @@ BeginCopyFrom(ParseState *pstate, void 
EndCopyFrom(CopyFromState cstate) { + if (cstate->routine) + cstate->routine->CopyFromEnd(cstate); + /* No COPY FROM related resources except memory. */ if (cstate->is_program) { diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 7cacd0b752..8b15080585 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -978,6 +978,11 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, Assert(fieldno == attr_count); } + else if (cstate->routine) + { + if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls)) + return false; + } else { /* binary */ diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 20ffc90363..6080627c83 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -24,6 +24,7 @@ #include "access/xact.h" #include "access/xlog.h" #include "commands/copy.h" +#include "commands/copyapi.h" #include "commands/progress.h" #include "executor/execdesc.h" #include "executor/executor.h" @@ -71,6 +72,9 @@ typedef enum CopyDest */ typedef struct CopyToStateData { + /* format routine */ + const CopyToRoutine *routine; + /* low-level state data */ CopyDest copy_dest; /* type of copy source/destination */ FILE *copy_file; /* used if copy_dest == COPY_FILE */ @@ -777,15 +781,23 @@ DoCopyTo(CopyToState cstate) Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1); if (cstate->opts.binary) + { getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena); + fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); + } + else if (cstate->routine) + cstate->routine->CopyToOutFunc(cstate, attr->atttypid, + &cstate->out_functions[attnum - 1]); else + { getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena); fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); } + } /* * Create a temporary memory context that we can reset once per row to @@ -811,6 +823,8 @@ DoCopyTo(CopyToState cstate) tmp = 0; 
CopySendInt32(cstate, tmp); } + else if (cstate->routine) + cstate->routine->CopyToStart(cstate, tupDesc); else { /* @@ -892,6 +906,8 @@ DoCopyTo(CopyToState cstate) /* Need to flush out the trailer */ CopySendEndOfRow(cstate); } + else if (cstate->routine) + cstate->routine->CopyToEnd(cstate); MemoryContextDelete(cstate->rowcontext); @@ -916,6 +932,10 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) MemoryContextReset(cstate->rowcontext); oldcontext = MemoryContextSwitchTo(cstate->rowcontext); + if (cstate->routine) + cstate->routine->CopyToOneRow(cstate, slot); + else + { if (cstate->opts.binary) { /* Binary per-tuple header */ @@ -971,6 +991,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) } CopySendEndOfRow(cstate); + } MemoryContextSwitchTo(oldcontext); } diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h new file mode 100644 index 0000000000..635c4cbff2 --- /dev/null +++ b/src/include/commands/copyapi.h @@ -0,0 +1,100 @@ +/*------------------------------------------------------------------------- + * + * copyapi.h + * API for COPY TO/FROM handlers + * + * + * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/commands/copyapi.h + * + *------------------------------------------------------------------------- + */ +#ifndef COPYAPI_H +#define COPYAPI_H + +#include "executor/tuptable.h" +#include "nodes/execnodes.h" + +/* These are private in commands/copy[from|to].c */ +typedef struct CopyFromStateData *CopyFromState; +typedef struct CopyToStateData *CopyToState; + +/* + * API structure for a COPY FROM format implementation. Note this must be + * allocated in a server-lifetime manner, typically as a static const struct. + */ +typedef struct CopyFromRoutine +{ + /* + * Called when COPY FROM is started to set up the input functions + * associated to the relation's attributes writing to. 
`finfo` can be + * optionally filled to provide the catalog information of the input + * function. `typioparam` can be optionally filled to define the OID of + * the type to pass to the input function. `atttypid` is the OID of data + * type used by the relation's attribute. + */ + void (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam); + + /* + * Called when COPY FROM is started. + * + * `tupDesc` is the tuple descriptor of the relation where the data needs + * to be copied. This can be used for any initialization steps required + * by a format. + */ + void (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc); + + /* + * Copy one row to a set of `values` and `nulls` of size tupDesc->natts. + * + * 'econtext' is used to evaluate default expression for each column that + * is either not read from the file or is using the DEFAULT option of COPY + * FROM. It is NULL if no default values are used. + * + * Returns false if there are no more tuples to copy. + */ + bool (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); + + /* Called when COPY FROM has ended. */ + void (*CopyFromEnd) (CopyFromState cstate); +} CopyFromRoutine; + +/* + * API structure for a COPY TO format implementation. Note this must be + * allocated in a server-lifetime manner, typically as a static const struct. + */ +typedef struct CopyToRoutine +{ + /* + * Called when COPY TO is started to set up the output functions + * associated to the relation's attributes reading from. `finfo` can be + * optionally filled. `atttypid` is the OID of data type used by the + * relation's attribute. + */ + void (*CopyToOutFunc) (CopyToState cstate, Oid atttypid, + FmgrInfo *finfo); + + /* + * Called when COPY TO is started. + * + * `tupDesc` is the tuple descriptor of the relation from where the data + * is read. + */ + void (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc); + + /* + * Copy one row for COPY TO. 
+ * + * `slot` is the tuple slot where the data is emitted. + */ + void (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot); + + /* Called when COPY TO has ended */ + void (*CopyToEnd) (CopyToState cstate); +} CopyToRoutine; + +#endif /* COPYAPI_H */ diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index cad52fcc78..509b9e92a1 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -15,6 +15,7 @@ #define COPYFROM_INTERNAL_H #include "commands/copy.h" +#include "commands/copyapi.h" #include "commands/trigger.h" #include "nodes/miscnodes.h" @@ -58,6 +59,9 @@ typedef enum CopyInsertMethod */ typedef struct CopyFromStateData { + /* format routine */ + const CopyFromRoutine *routine; + /* low-level state data */ CopySource copy_src; /* type of copy source */ FILE *copy_file; /* used if copy_src == COPY_FILE */ diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index ee40a341d3..a5ae161ca5 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -475,6 +475,7 @@ ConvertRowtypeExpr CookedConstraint CopyDest CopyFormatOptions +CopyFromRoutine CopyFromState CopyFromStateData CopyHeaderChoice @@ -484,6 +485,7 @@ CopyMultiInsertInfo CopyOnErrorChoice CopySource CopyStmt +CopyToRoutine CopyToState CopyToStateData Cost -- 2.43.0
On Mon, Mar 04, 2024 at 02:11:08PM +0900, Sutou Kouhei wrote:
> If this is a blocker of making COPY format extendable, can
> we defer moving the existing text/csv/binary format
> implementations to Copy{From,To}Routine for now as Michael
> suggested to proceed making COPY format extendable? (Can we
> add Copy{From,To}Routine without changing the existing
> text/csv/binary format implementations?)
Yeah, I assume that it would be the way to go so that we don't do any
dispatching in the default cases.  A different approach would be to
hide some parts of binary and text/csv in static inline functions
that are equivalent to the routine callbacks.  That's similar to the
previous versions of the patch set, but if we come back to the
argument that this risks blocking optimizations of the local areas of
the per-row processing in NextCopyFrom() and CopyOneRowTo(), what you
have sounds like a good balance.
CopyOneRowTo() could do something like this to avoid the extra
indentation:
if (cstate->routine)
{
    cstate->routine->CopyToOneRow(cstate, slot);
    MemoryContextSwitchTo(oldcontext);
    return;
}
NextCopyFrom() does not need to be concerned by that.
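The early-return shape suggested above can be tried outside the server with a small stand-alone sketch. Everything prefixed with Demo/demo_ is a hypothetical stand-in invented for illustration (the real CopyToStateData, memory contexts and slots are not used here); the integer context_depth only mimics the MemoryContextSwitchTo() pairing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the PostgreSQL definitions. */
typedef struct DemoCopyToState DemoCopyToState;

typedef struct DemoCopyToRoutine
{
    void        (*one_row) (DemoCopyToState *cstate, int row);
} DemoCopyToRoutine;

struct DemoCopyToState
{
    const DemoCopyToRoutine *routine;   /* NULL for built-in formats */
    int         context_depth;  /* mimics entering/leaving the row context */
    int         builtin_rows;   /* rows handled by the built-in path */
    int         custom_rows;    /* rows handled by the custom routine */
};

/* A toy custom per-row callback: it only counts the row. */
static void
demo_custom_one_row(DemoCopyToState *cstate, int row)
{
    (void) row;
    cstate->custom_rows++;
}

static const DemoCopyToRoutine demo_custom_routine = {demo_custom_one_row};

void
demo_copy_one_row_to(DemoCopyToState *cstate, int row)
{
    cstate->context_depth++;    /* like switching to the per-row context */

    if (cstate->routine)
    {
        cstate->routine->one_row(cstate, row);
        cstate->context_depth--;    /* restore before the early return */
        return;
    }

    /* Built-in path: stays at its original indentation level. */
    (void) row;
    cstate->builtin_rows++;
    cstate->context_depth--;
}
```

The point of the design is only that the custom branch restores the context itself and returns, so the built-in text/csv/binary code below it does not have to be wrapped in an else block and re-indented.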
> I attach a patch for it.
> There is a large hunk for CopyOneRowTo() that is caused by
> indent change. I also attach "...-w.patch" that uses "git
> -w" to remove space only changes. "...-w.patch" is only for
> review. We should use .patch without -w for push.
I didn't know this trick.  That's indeed nice.  I may use that for
other stuff to make patches more presentable to the eyes.  And that's
available as well with `git diff`.
If we basically agree about this part, how would the rest work out
with this set of APIs and the possibility to plug in a custom value
for FORMAT to do a pg_proc lookup, including an example of how these
APIs can be used?
--
Michael
			
Hi,
In <Zea4wXxpYaX64e_p@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 5 Mar 2024 15:16:33 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:
> CopyOneRowTo() could do something like that to avoid the extra
> indentation:
> if (cstate->routine)
> {
>     cstate->routine->CopyToOneRow(cstate, slot);
>     MemoryContextSwitchTo(oldcontext);
>     return;
> }
OK. The v17 patch uses this style. Everything else is the
same as in v16.
> I didn't know this trick.  That's indeed nice..  I may use that for
> other stuff to make patches more presentable to the eyes.  And that's
> available as well with `git diff`.
:-)
> If we basically agree about this part, how would the rest work out
> with this set of APIs and the possibility to plug in a custom value
> for FORMAT to do a pg_proc lookup, including an example of how these
> APIs can be used?
I'll send the following patches after this patch is
merged. They are based on the v6 patch[1]:
1. Add copy_handler
   * This also adds a pg_proc lookup for custom FORMAT
   * This also adds a test for copy_handler
2. Export CopyToStateData
   * We need it to implement a custom COPY TO handler
3. Add the APIs needed to implement a custom COPY TO handler
   * Add CopyToStateData::opaque
   * Export CopySendEndOfRow()
4. Export CopyFromStateData
   * We need it to implement a custom COPY FROM handler
5. Add the APIs needed to implement a custom COPY FROM handler
   * Add CopyFromStateData::opaque
   * Export CopyReadBinaryData()
[1]
https://www.postgresql.org/message-id/flat/20240124.144936.67229716500876806.kou%40clear-code.com#f1ad092fc5e81fe38d3c376559efd52c
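To make the plan above more concrete, here is a rough stand-alone sketch of how a custom COPY TO handler might use a per-format opaque pointer together with an exported send helper. It is only an illustration under stated assumptions: DemoCopyToState, DemoCopyToRoutine, demo_send() and the toy output format are all invented here, and only loosely mirror the planned CopyToStateData::opaque and CopySend*() exports.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the PostgreSQL definitions. */
typedef struct DemoCopyToState DemoCopyToState;

typedef struct DemoCopyToRoutine
{
    void        (*start) (DemoCopyToState *cstate);
    void        (*one_row) (DemoCopyToState *cstate, int value);
    void        (*end) (DemoCopyToState *cstate);
} DemoCopyToRoutine;

struct DemoCopyToState
{
    const DemoCopyToRoutine *routine;
    void       *opaque;         /* private state owned by the format */
    char        out[256];       /* stands in for the COPY destination */
    size_t      out_len;
};

/* Stand-in for an exported send helper: append text to the output. */
void
demo_send(DemoCopyToState *cstate, const char *text)
{
    size_t      len = strlen(text);

    assert(cstate->out_len + len < sizeof(cstate->out));
    memcpy(cstate->out + cstate->out_len, text, len + 1);
    cstate->out_len += len;
}

/* Private state of the toy format: how many rows were written. */
typedef struct DemoFormatState
{
    int         rows;
} DemoFormatState;

static void
demo_start(DemoCopyToState *cstate)
{
    DemoFormatState *state = calloc(1, sizeof(*state));

    assert(state != NULL);
    cstate->opaque = state;
    demo_send(cstate, "BEGIN\n");
}

static void
demo_one_row(DemoCopyToState *cstate, int value)
{
    DemoFormatState *state = cstate->opaque;
    char        line[32];

    snprintf(line, sizeof(line), "row=%d\n", value);
    demo_send(cstate, line);
    state->rows++;
}

static void
demo_end(DemoCopyToState *cstate)
{
    DemoFormatState *state = cstate->opaque;
    char        line[32];

    snprintf(line, sizeof(line), "END rows=%d\n", state->rows);
    demo_send(cstate, line);
    free(state);
    cstate->opaque = NULL;
}

static const DemoCopyToRoutine demo_routine = {demo_start, demo_one_row, demo_end};

/* Driver: start once, one_row per value, end once. */
void
demo_do_copy_to(DemoCopyToState *cstate, const int *values, int nvalues)
{
    cstate->routine->start(cstate);
    for (int i = 0; i < nvalues; i++)
        cstate->routine->one_row(cstate, values[i]);
    cstate->routine->end(cstate);
}
```

The sketch shows why both pieces are on the list: without an opaque slot the format has nowhere to keep its row counter between callbacks, and without a send helper it cannot reach the destination.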
Thanks,
-- 
kou
From a78b8ee88575e2c2873afc3acf3c8c4e535becf0 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 4 Mar 2024 13:52:34 +0900
Subject: [PATCH v17] Add CopyFromRoutine/CopyToRoutine
They are for implementing custom COPY FROM/TO format. But this is not
enough to implement custom COPY FROM/TO format yet. We'll export some
APIs to receive/send data and add "format" option to COPY FROM/TO
later.
Existing text/csv/binary format implementations don't use
CopyFromRoutine/CopyToRoutine for now. We have a patch for that but
defer it because there are some mysterious profile results even
though we get faster runtimes. See [1] for details.
[1] https://www.postgresql.org/message-id/ZdbtQJ-p5H1_EDwE%40paquier.xyz
Note that this doesn't change existing text/csv/binary format
implementations.
---
 src/backend/commands/copyfrom.c          |  24 +++++-
 src/backend/commands/copyfromparse.c     |   5 ++
 src/backend/commands/copyto.c            |  25 +++++-
 src/include/commands/copyapi.h           | 100 +++++++++++++++++++++++
 src/include/commands/copyfrom_internal.h |   4 +
 src/tools/pgindent/typedefs.list         |   2 +
 6 files changed, 155 insertions(+), 5 deletions(-)
 create mode 100644 src/include/commands/copyapi.h
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index c3bc897028..9bf2f6497e 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1623,12 +1623,22 @@ BeginCopyFrom(ParseState *pstate,
 
         /* Fetch the input function and typioparam info */
         if (cstate->opts.binary)
+        {
             getTypeBinaryInputInfo(att->atttypid,
                                    &in_func_oid, &typioparams[attnum - 1]);
+            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+        }
+        else if (cstate->routine)
+            cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+                                            &in_functions[attnum - 1],
+                                            &typioparams[attnum - 1]);
+
         else
+        {
             getTypeInputInfo(att->atttypid,
                              &in_func_oid, &typioparams[attnum - 1]);
-        fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+        }
 
         /* Get default info if available */
         defexprs[attnum - 1] = NULL;
@@ -1768,10 +1778,13 @@ BeginCopyFrom(ParseState *pstate,
         /* Read and verify binary header */
         ReceiveCopyBinaryHeader(cstate);
     }
-
-    /* create workspace for CopyReadAttributes results */
-    if (!cstate->opts.binary)
+    else if (cstate->routine)
     {
+        cstate->routine->CopyFromStart(cstate, tupDesc);
+    }
+    else
+    {
+        /* create workspace for CopyReadAttributes results */
         AttrNumber    attr_count = list_length(cstate->attnumlist);
 
         cstate->max_fields = attr_count;
@@ -1789,6 +1802,9 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+    if (cstate->routine)
+        cstate->routine->CopyFromEnd(cstate);
+
     /* No COPY FROM related resources except memory. */
     if (cstate->is_program)
     {
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7cacd0b752..8b15080585 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -978,6 +978,11 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 
         Assert(fieldno == attr_count);
     }
+    else if (cstate->routine)
+    {
+        if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+            return false;
+    }
     else
     {
         /* binary */
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 20ffc90363..b4a7c9c8b9 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -24,6 +24,7 @@
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -71,6 +72,9 @@ typedef enum CopyDest
  */
 typedef struct CopyToStateData
 {
+    /* format routine */
+    const CopyToRoutine *routine;
+
     /* low-level state data */
     CopyDest    copy_dest;        /* type of copy source/destination */
     FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
@@ -777,14 +781,22 @@ DoCopyTo(CopyToState cstate)
         Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
         if (cstate->opts.binary)
+        {
             getTypeBinaryOutputInfo(attr->atttypid,
                                     &out_func_oid,
                                     &isvarlena);
+            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+        }
+        else if (cstate->routine)
+            cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
+                                           &cstate->out_functions[attnum - 1]);
         else
+        {
             getTypeOutputInfo(attr->atttypid,
                               &out_func_oid,
                               &isvarlena);
-        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+        }
     }
 
     /*
@@ -811,6 +823,8 @@ DoCopyTo(CopyToState cstate)
         tmp = 0;
         CopySendInt32(cstate, tmp);
     }
+    else if (cstate->routine)
+        cstate->routine->CopyToStart(cstate, tupDesc);
     else
     {
         /*
@@ -892,6 +906,8 @@ DoCopyTo(CopyToState cstate)
         /* Need to flush out the trailer */
         CopySendEndOfRow(cstate);
     }
+    else if (cstate->routine)
+        cstate->routine->CopyToEnd(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -916,6 +932,13 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
     MemoryContextReset(cstate->rowcontext);
     oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
+    if (cstate->routine)
+    {
+        cstate->routine->CopyToOneRow(cstate, slot);
+        MemoryContextSwitchTo(oldcontext);
+        return;
+    }
+
     if (cstate->opts.binary)
     {
         /* Binary per-tuple header */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 0000000000..635c4cbff2
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,100 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *      API for COPY TO/FROM handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/* These are private in commands/copy[from|to].c */
+typedef struct CopyFromStateData *CopyFromState;
+typedef struct CopyToStateData *CopyToState;
+
+/*
+ * API structure for a COPY FROM format implementation.  Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyFromRoutine
+{
+    /*
+     * Called when COPY FROM is started to set up the input functions
+     * associated to the relation's attributes writing to.  `finfo` can be
+     * optionally filled to provide the catalog information of the input
+     * function.  `typioparam` can be optionally filled to define the OID of
+     * the type to pass to the input function.  `atttypid` is the OID of data
+     * type used by the relation's attribute.
+     */
+    void        (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
+                                   FmgrInfo *finfo, Oid *typioparam);
+
+    /*
+     * Called when COPY FROM is started.
+     *
+     * `tupDesc` is the tuple descriptor of the relation where the data needs
+     * to be copied.  This can be used for any initialization steps required
+     * by a format.
+     */
+    void        (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
+
+    /*
+     * Copy one row to a set of `values` and `nulls` of size tupDesc->natts.
+     *
+     * 'econtext' is used to evaluate default expression for each column that
+     * is either not read from the file or is using the DEFAULT option of COPY
+     * FROM.  It is NULL if no default values are used.
+     *
+     * Returns false if there are no more tuples to copy.
+     */
+    bool        (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
+                                   Datum *values, bool *nulls);
+
+    /* Called when COPY FROM has ended. */
+    void        (*CopyFromEnd) (CopyFromState cstate);
+} CopyFromRoutine;
+
+/*
+ * API structure for a COPY TO format implementation.   Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyToRoutine
+{
+    /*
+     * Called when COPY TO is started to set up the output functions
+     * associated to the relation's attributes reading from.  `finfo` can be
+     * optionally filled.  `atttypid` is the OID of data type used by the
+     * relation's attribute.
+     */
+    void        (*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
+                                  FmgrInfo *finfo);
+
+    /*
+     * Called when COPY TO is started.
+     *
+     * `tupDesc` is the tuple descriptor of the relation from where the data
+     * is read.
+     */
+    void        (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
+
+    /*
+     * Copy one row for COPY TO.
+     *
+     * `slot` is the tuple slot where the data is emitted.
+     */
+    void        (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
+
+    /* Called when COPY TO has ended */
+    void        (*CopyToEnd) (CopyToState cstate);
+} CopyToRoutine;
+
+#endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..509b9e92a1 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -15,6 +15,7 @@
 #define COPYFROM_INTERNAL_H
 
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
@@ -58,6 +59,9 @@ typedef enum CopyInsertMethod
  */
 typedef struct CopyFromStateData
 {
+    /* format routine */
+    const CopyFromRoutine *routine;
+
     /* low-level state data */
     CopySource    copy_src;        /* type of copy source */
     FILE       *copy_file;        /* used if copy_src == COPY_FILE */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ee40a341d3..a5ae161ca5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -475,6 +475,7 @@ ConvertRowtypeExpr
 CookedConstraint
 CopyDest
 CopyFormatOptions
+CopyFromRoutine
 CopyFromState
 CopyFromStateData
 CopyHeaderChoice
@@ -484,6 +485,7 @@ CopyMultiInsertInfo
 CopyOnErrorChoice
 CopySource
 CopyStmt
+CopyToRoutine
 CopyToState
 CopyToStateData
 Cost
-- 
2.43.0
			
On Tue, Mar 05, 2024 at 05:18:08PM +0900, Sutou Kouhei wrote:
> I'll send the following patches after this patch is
> merged.
I am not sure that my schedule is on track to allow that for this
release, unfortunately, especially with all the other items to review
and discuss to make this thread feature-complete.  There should be
a bit more than four weeks until the feature freeze (date not set in
stone, should be around the 8th of April AoE), but I have less than
the half due to personal issues.  Perhaps if somebody jumps on this
thread, that will be possible..
> They are based on the v6 patch[1]:
>
> 1. Add copy_handler
>    * This also adds a pg_proc lookup for custom FORMAT
>    * This also adds a test for copy_handler
> 2. Export CopyToStateData
>    * We need it to implement custom copy TO handler
> 3. Add needed APIs to implement custom copy TO handler
>    * Add CopyToStateData::opaque
>    * Export CopySendEndOfRow()
> 4. Export CopyFromStateData
>    * We need it to implement custom copy FROM handler
> 5. Add needed APIs to implement custom copy FROM handler
>    * Add CopyFromStateData::opaque
>    * Export CopyReadBinaryData()
Hmm.  Sounds like a good plan for a split.
--
Michael
On Wed, Mar 06, 2024 at 03:34:04PM +0900, Michael Paquier wrote:
> I am not sure that my schedule is on track to allow that for this
> release, unfortunately, especially with all the other items to review
> and discuss to make this thread feature-complete.  There should be
> a bit more than four weeks until the feature freeze (date not set in
> stone, should be around the 8th of April AoE), but I have less than
> the half due to personal issues.  Perhaps if somebody jumps on this
> thread, that will be possible..
While on it, here are some profiles based on HEAD and v17 with the
previous tests (COPY TO /dev/null, COPY FROM data sent to the void).
COPY FROM, text format with 30 attributes and HEAD:
-   66.53%    16.33%  postgres  postgres            [.] NextCopyFrom
    - 50.20% NextCopyFrom
       - 30.83% NextCopyFromRawFields
          + 16.09% CopyReadLine
            13.72% CopyReadAttributesText
       + 19.11% InputFunctionCallSafe
    + 16.33% _start
COPY FROM, text format with 30 attributes and v17:
-   66.60%    16.10%  postgres  postgres            [.] NextCopyFrom
    - 50.50% NextCopyFrom
       - 30.44% NextCopyFromRawFields
          + 15.71% CopyReadLine
            13.73% CopyReadAttributesText
       + 19.81% InputFunctionCallSafe
    + 16.10% _start
COPY TO, text format with 30 attributes and HEAD:
-   79.55%    15.54%  postgres  postgres            [.] CopyOneRowTo
    - 64.01% CopyOneRowTo
       + 30.01% OutputFunctionCall
       + 11.71% appendBinaryStringInfo
         9.36% CopyAttributeOutText
       + 3.03% CopySendEndOfRow
         1.65% int4out
         1.01% 0xffff83e46be4
         0.93% 0xffff83e46be8
         0.93% memcpy@plt
         0.87% pgstat_progress_update_param
         0.78% enlargeStringInfo
         0.67% 0xffff83e46bb4
         0.66% 0xffff83e46bcc
         0.57% MemoryContextReset
    + 15.54% _start
COPY TO, text format with 30 attributes and v17:
-   79.35%    16.08%  postgres  postgres            [.] CopyOneRowTo
    - 62.27% CopyOneRowTo
       + 28.92% OutputFunctionCall
       + 10.88% appendBinaryStringInfo
         9.54% CopyAttributeOutText
       + 3.03% CopySendEndOfRow
         1.60% int4out
         0.97% pgstat_progress_update_param
         0.95% 0xffff8c46cbe8
         0.89% memcpy@plt
         0.87% 0xffff8c46cbe4
         0.79% enlargeStringInfo
         0.64% 0xffff8c46cbcc
         0.61% 0xffff8c46cbb4
         0.58% MemoryContextReset
    + 16.08% _start
So, in short, and that's not really a surprise, there is no effect on
the built-in formats when the routines are dispatched to only for
formats that plug in through the APIs, but a custom format would still
pay a penalty of a few percent in both directions if bottlenecked on
CPU.
--
Michael
			
Hi,
In <ZelfYatRdVZN3FbE@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 7 Mar 2024 15:32:01 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:
> While on it, here are some profiles based on HEAD and v17 with the
> previous tests (COPY TO /dev/null, COPY FROM data sent to the void).
> 
...
> 
> So, in short, and that's not really a surprise, there is no effect
> once we use the dispatching with the routines only when a format would
> want to plug-in with the APIs, but a custom format would still have a
> penalty of a few percents for both if bottlenecked on CPU.
Thanks for sharing these profiles!
I agree with you.
This shows that the v17 approach doesn't affect the current
text/csv/binary implementations. (The v17 approach just adds
2 new structs, Copy{From,To}Routine, without changing the
current text/csv/binary implementations.)
Can we push the v17 patch and proceed with the following
implementations? Could someone (especially a PostgreSQL
committer) take a look at this to double-check?
Thanks,
-- 
kou
			
		On Fri, Mar 8, 2024 at 8:23 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
>
> This shows that the v17 approach doesn't affect the current
> text/csv/binary implementations. (The v17 approach just adds
> 2 new structs, Copy{From,To}Rountine, without changing the
> current text/csv/binary implementations.)
>
> Can we push the v17 patch and proceed following
> implementations? Could someone (especially a PostgreSQL
> committer) take a look at this for double-check?
>
Hi, here are my two cents:
Currently in v17, we have 3 extra functions within DoCopyTo:
CopyToStart, called once at the start to do some preliminary work.
CopyToOneRow, doing the repetitive work, called many times, row by row.
CopyToEnd, called once to do the closing work.
It seems we also need a function pointer for processing the format and other options.
Or maybe the reason is that we need a one-time function call before DoCopyTo,
like a one-time initialization.
We can place the function pointer after:
`
cstate = BeginCopyTo(pstate, rel, query, relid,
                     stmt->filename, stmt->is_program,
                     NULL, stmt->attlist, stmt->options);
`
Generally in v17, the code pattern looks like this:
if (cstate->opts.binary)
{
    /* handle binary format */
}
else if (cstate->routine)
{
    /* custom code, make the copy format extensible */
}
else
{
    /* handle non-binary, (csv or text) format */
}
Maybe we need another bool flag like `bool buildin_format`.
If the copy format is {csv|text|binary}, then buildin_format is true, else false.
So the code pattern would be:
if (cstate->opts.binary)
{
    /* handle binary format */
}
else if (cstate->routine && !buildin_format)
{
    /* custom code, make the copy format extensible */
}
else
{
    /* handle non-binary, (csv or text) format */
}
Otherwise, the {CopyToRoutine|CopyFromRoutine} needs a function pointer
to distinguish the native copy formats from the extension-supported formats,
like I mentioned above?
			
		Hi,
In <CACJufxEgn3=j-UWg-f2-DbLO+uVSKGcofpkX5trx+=YX6icSFg@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 11 Mar 2024 08:00:00 +0800,
  jian he <jian.universality@gmail.com> wrote:
> Hi, here are my cents:
> Currently in v17, we have 3 extra functions within DoCopyTo
> CopyToStart, one time, start, doing some preliminary work.
> CopyToOneRow, doing the repetitive work, called many times, row by row.
> CopyToEnd, one time doing the closing work.
> 
> seems to need a function pointer for processing the format and other options.
> or maybe the reason is we need a one time function call before doing DoCopyTo,
> like one time initialization.
I know that the JSON format wants it, but can we defer it? We
can add more options later. I want to proceed with this
improvement step by step.
More use cases will help us decide which callbacks are
needed. We will be able to collect more use cases by
providing basic callbacks.
> generally in v17, the code pattern looks like this.
> if (cstate->opts.binary)
> {
> /* handle binary format */
> }
> else if (cstate->routine)
> {
> /* custom code, make the copy format extensible */
> }
> else
> {
> /* handle non-binary, (csv or text) format */
> }
> maybe we need another bool flag like `bool buildin_format`.
> if the copy format is {csv|text|binary}  then buildin_format is true else false.
> 
> so the code pattern would be:
> if (cstate->opts.binary)
> {
> /* handle binary format */
> }
> else if (cstate->routine && !buildin_format)
> {
> /* custom code, make the copy format extensible */
> }
> else
> {
> /* handle non-binary, (csv or text) format */
> }
> 
> otherwise the {CopyToRoutine| CopyFromRoutine} needs a function pointer
> to distinguish native copy format and extensible supported format,
> like I mentioned above?
Hmm. I may be missing something, but I think that we don't
need the bool flag, because we don't set cstate->routine for
native copy formats. So we can distinguish the native copy
formats from the extension-provided formats by checking only
cstate->routine.
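As a rough illustration of why no extra flag is needed, the sketch below resolves a format name to a routine pointer and then dispatches on that pointer alone. All names are hypothetical and invented for this example: demo_resolve_format() merely stands in for the pg_proc lookup planned for a later patch, and "demo_format" is a made-up custom format name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in; this is NOT the PostgreSQL CopyToRoutine. */
typedef struct DemoCopyToRoutine
{
    const char *name;
} DemoCopyToRoutine;

static const DemoCopyToRoutine demo_custom_routine = {"demo_format"};

/*
 * Resolve a FORMAT name.  Built-in formats succeed but leave *routine
 * as NULL; the custom format sets it.  Returns false if the name is
 * unknown.
 */
bool
demo_resolve_format(const char *format, const DemoCopyToRoutine **routine)
{
    *routine = NULL;
    if (strcmp(format, "text") == 0 ||
        strcmp(format, "csv") == 0 ||
        strcmp(format, "binary") == 0)
        return true;
    if (strcmp(format, demo_custom_routine.name) == 0)
    {
        *routine = &demo_custom_routine;
        return true;
    }
    return false;
}

typedef enum DemoPath
{
    DEMO_PATH_BINARY,
    DEMO_PATH_CUSTOM,
    DEMO_PATH_TEXT_CSV
} DemoPath;

/* Same three-way shape as the v17 branches; no built-in flag required. */
DemoPath
demo_choose_path(bool binary, const DemoCopyToRoutine *routine)
{
    if (binary)
        return DEMO_PATH_BINARY;
    else if (routine)
        return DEMO_PATH_CUSTOM;
    else
        return DEMO_PATH_TEXT_CSV;
}
```

Because the pointer is never set for text, csv or binary, a NULL check already carries the information the proposed flag would have carried.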
Thanks,
-- 
kou
			
On Mon, Mar 11, 2024 at 8:56 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CACJufxEgn3=j-UWg-f2-DbLO+uVSKGcofpkX5trx+=YX6icSFg@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 11 Mar 2024 08:00:00 +0800,
>   jian he <jian.universality@gmail.com> wrote:
>
> > Hi, here are my cents:
> > Currently in v17, we have 3 extra functions within DoCopyTo
> > CopyToStart, one time, start, doing some preliminary work.
> > CopyToOneRow, doing the repetitive work, called many times, row by row.
> > CopyToEnd, one time doing the closing work.
> >
> > seems to need a function pointer for processing the format and other options.
> > or maybe the reason is we need a one time function call before doing DoCopyTo,
> > like one time initialization.
>
> I know that JSON format wants it but can we defer it? We can
> add more options later. I want to proceed this improvement
> step by step.
>
> More use cases will help us which callbacks are needed. We
> will be able to collect more use cases by providing basic
> callbacks.
I guess one of the ultimate goals would be that COPY can export data
to a customized format.
Let's say the customized format is "csv1", but it is just analogous to
the csv format.
People should be able to create an extension with several C functions,
and then run `copy (select 1 ) to stdout (format 'csv1');`
but the output will be exactly the same as `copy (select 1 ) to stdout
(format 'csv');`
In such a scenario, we require a function akin to ProcessCopyOptions
to handle situations where CopyFormatOptions->csv_mode is true while
the format is "csv1".
But CopyToStart is already within the DoCopyTo function,
so you do need an extra function pointer?
I do agree with the incremental improvement method.
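To make the "csv1" request easier to discuss, here is a stand-alone sketch of what an extra option-processing hook could look like. Nothing like this exists in v17: the process_option callback, DemoCopyOptions and demo_process_options() are all invented for illustration, and the real ProcessCopyOptions() and CopyFormatOptions are not used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of COPY options; NOT the real CopyFormatOptions. */
typedef struct DemoCopyOptions
{
    bool        csv_mode;
    char        delim;
} DemoCopyOptions;

/* Hypothetical extra callback: returns true if the option was consumed. */
typedef struct DemoCopyToRoutine
{
    bool        (*process_option) (DemoCopyOptions *opts,
                                   const char *name, const char *value);
} DemoCopyToRoutine;

/* Option handling for the imaginary "csv1" format. */
static bool
csv1_process_option(DemoCopyOptions *opts, const char *name, const char *value)
{
    if (strcmp(name, "format") == 0 && strcmp(value, "csv1") == 0)
    {
        /* behave like csv even though the format name differs */
        opts->csv_mode = true;
        opts->delim = ',';
        return true;
    }
    if (strcmp(name, "delimiter") == 0 && strlen(value) == 1)
    {
        opts->delim = value[0];
        return true;
    }
    return false;
}

static const DemoCopyToRoutine csv1_routine = {csv1_process_option};

/*
 * Offer every option to the handler, once, before any start callback
 * would run.  Returns the number of options the handler rejected.
 */
int
demo_process_options(const DemoCopyToRoutine *routine, DemoCopyOptions *opts,
                     const char *names[], const char *values[], int noptions)
{
    int         unrecognized = 0;

    for (int i = 0; i < noptions; i++)
    {
        if (!routine->process_option(opts, names[i], values[i]))
            unrecognized++;
    }
    return unrecognized;
}
```

Whether such a hook belongs in the first set of callbacks is exactly the open question in this exchange; the sketch only shows the shape being asked for.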
Hi,
In <CACJufxFbffGaxW1LiTNEQAPcuvP1s7GL1Ghi--kbSqsjwh7XeA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 13 Mar 2024 16:00:46 +0800,
  jian he <jian.universality@gmail.com> wrote:
>> More use cases will help us which callbacks are needed. We
>> will be able to collect more use cases by providing basic
>> callbacks.
> Let's say the customized format is "csv1", but it is just analogous to
> the csv format.
> people should be able to create an extension, with serval C functions,
> then they can do `copy (select 1 ) to stdout (format 'csv1');`
> but the output will be exact same as `copy (select 1 ) to stdout
> (format 'csv');`
Thanks for sharing a use case, but I think that we need
real-world use cases to design our APIs. For example, the
JSON support that is currently being discussed in another
thread is a real-world use case. My Apache Arrow support is
another real-world use case.
Thanks,
-- 
kou
Hi,
Could someone review the v17 patch so that we can proceed with this?
The v17 patch:
https://www.postgresql.org/message-id/flat/20240305.171808.667980402249336456.kou%40clear-code.com#d2ee079b75ebcf00c410300ecc4a357a
Some profiles by Michael:
https://www.postgresql.org/message-id/flat/ZelfYatRdVZN3FbE%40paquier.xyz#eccfd1a0131af93c48026d691cc247f4
Thanks,
-- 
kou
Hi Andres,

Could you take a look at this? I think that you don't want
to touch the current text/csv/binary implementations. The
v17 patch approach doesn't touch the current text/csv/binary
implementations. What do you think about this approach?

Thanks,
-- 
kou

In <20240320.232732.488684985873786799.kou@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 20 Mar 2024 23:27:32 +0900 (JST),
  Sutou Kouhei <kou@clear-code.com> wrote:

> Hi,
>
> Could someone review the v17 patch to proceed this?
>
> The v17 patch:
> https://www.postgresql.org/message-id/flat/20240305.171808.667980402249336456.kou%40clear-code.com#d2ee079b75ebcf00c410300ecc4a357a
>
> Some profiles by Michael:
> https://www.postgresql.org/message-id/flat/ZelfYatRdVZN3FbE%40paquier.xyz#eccfd1a0131af93c48026d691cc247f4
>
> Thanks,
> --
> kou
>
> In <20240308.092254.359611633589181574.kou@clear-code.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 08 Mar 2024 09:22:54 +0900 (JST),
>   Sutou Kouhei <kou@clear-code.com> wrote:
>
>> Hi,
>>
>> In <ZelfYatRdVZN3FbE@paquier.xyz>
>>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 7 Mar 2024 15:32:01 +0900,
>>   Michael Paquier <michael@paquier.xyz> wrote:
>>
>>> While on it, here are some profiles based on HEAD and v17 with the
>>> previous tests (COPY TO /dev/null, COPY FROM data sent to the void).
>>>
>> ...
>>>
>>> So, in short, and that's not really a surprise, there is no effect
>>> once we use the dispatching with the routines only when a format would
>>> want to plug-in with the APIs, but a custom format would still have a
>>> penalty of a few percents for both if bottlenecked on CPU.
>>
>> Thanks for sharing these profiles!
>> I agree with you.
>>
>> This shows that the v17 approach doesn't affect the current
>> text/csv/binary implementations. (The v17 approach just adds
>> 2 new structs, Copy{From,To}Routine, without changing the
>> current text/csv/binary implementations.)
>>
>> Can we push the v17 patch and proceed following
>> implementations? Could someone (especially a PostgreSQL
>> committer) take a look at this for double-check?
>>
>> Thanks,
>> --
>> kou
Hello Kouhei-san,

I think it'd be helpful if you could post a patch status, i.e. a message
re-explaining what it aims to achieve, a summary of the discussion so
far, and what you think are the open questions. Otherwise every reviewer
has to read the whole thread to learn this.

FWIW I realize there are other related patches, and maybe some of the
discussion is happening on those threads. But that's just another reason
to post the summary here - as a reviewer I'm not going to read random
other patches that "might" have relevant info.

-----

The way I understand it, the ultimate goal is to allow extensions to
define formats using CREATE XYZ. And I agree that would be a very
valuable feature. But the proposed patch does not do that, right? It
only does some basic things at the C level, there's no DDL etc.

Per the commit message, none of the existing formats (text/csv/binary)
is implemented as "copy routine". IMHO that's a bit strange, because
that's exactly what I'd expect this patch to do - to define all the
infrastructure (catalogs, ...) and switch the existing formats to it.

Yes, the patch will be larger, but it'll also simplify some of the code
(right now there's a bunch of branches to handle these "old" formats).
How would you even know the new code is correct, when there's nothing
using the "copy routine" branch?

In fact, doesn't this mean that the benchmarks presented earlier are not
very useful? We still use the old code, except there are a couple "if"
branches that are never taken? I don't think this measures the new
approach would not be slower once everything gets to be copy routine.

Or what am I missing?

Also, how do we know this API is suitable for the alternative formats?
For example you mentioned Arrow, and I suppose people will want to add
support for other column-oriented formats. I assume that will require
stashing a batch of rows (or some other internal state) somewhere, but
does the proposed API plan for that?

My guess would be we'll need to add a "private_data" pointer to the
CopyFromStateData/CopyToStateData structs, but maybe I'm wrong.

Also, won't the alternative formats require custom parameters? For
example, for column-oriented formats it might be useful to specify a
stripe size (rows per batch), etc. I'm not saying this patch needs to
implement that, but maybe the API should expect it?

-----

To sum this up, what I think needs to happen for this patch to move
forward:

1) Switch the existing formats to the new API, to validate the API works
at least for them, allow testing and benchmarking the code.

2) Try implementing some of the more exotic formats (column-oriented) to
test the API works for those too.

3) Maybe try implementing a PoC version to do the DDL, so that it
actually is extensible.

It's not my intent to "move the goalposts" - I think it's fine if the
patches (2) and (3) are just PoC, to validate (1) goes in the right
direction. For example, it's fine if (2) just hard-codes the new format
next to the built-in ones - that's not something we'd commit, I think,
but for validation of (1) it's good enough.

Most of the DDL stuff can probably be "copied" from FDW handlers. It's
pretty similar, and the "routine" idea is what FDW does too. It probably
also shows a good way to "initialize" the routine, etc.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Kou,

I tried to follow the thread but had to skip quite some discussions in
the middle part of the thread. From what I read, it appears to me that
there were a lot of back-and-forth discussions on the specific
implementation details (i.e. do not touch existing format
implementation), performance concerns and how to split the patches to
make it more manageable.

My understanding is that the provided v17 patch aims to achieve the
following:
- Retain existing format implementations as built-in formats, and do not
  go through the new interface for them.
- Make sure that there is no sign of performance degradation.
- Refactor the existing code to make it easier and possible to make
  copy handlers extensible. However, some of the infrastructure work
  that is required to make copy handlers extensible is intentionally
  delayed for future patches. Some of the work was proposed as patches
  in earlier messages, but they were not explicitly referenced in
  recent messages.

Overall, the current v17 patch applies cleanly to HEAD. “make
check-world” also runs cleanly. If my understanding of the current
status of the patch is correct, the patch looks good to me.

Regards,
Yong
Hi Tomas,

Thanks for joining this thread!

In <257d5573-07da-48c3-ac07-e047e7a65e99@enterprisedb.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 19 Jul 2024 14:40:05 +0200,
  Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

> I think it'd be helpful if you could post a patch status, i.e. a message
> re-explaining what it aims to achieve, summary of the discussion so
> far, and what you think are the open questions. Otherwise every reviewer
> has to read the whole thread to learn this.

It makes sense. It seems your questions cover all the
important points in this thread, so my answers to your
questions summarize the latest information.

> FWIW I realize there are other related patches, and maybe some of the
> discussion is happening on those threads. But that's just another reason
> to post the summary here - as a reviewer I'm not going to read random
> other patches that "might" have relevant info.

It makes sense too. To clarify, the other threads are
unrelated. We can focus on only this thread for this
purpose.

> The way I understand it, the ultimate goal is to allow extensions to
> define formats using CREATE XYZ.

Right.

> But the proposed patch does not do that, right? It
> only does some basic things at the C level, there's no DDL etc.

Right. The latest patch set includes only the basic things
for the first implementation.

> Per the commit message, none of the existing formats (text/csv/binary)
> is implemented as "copy routine".

Right.

> IMHO that's a bit strange, because
> that's exactly what I'd expect this patch to do - to define all the
> infrastructure (catalogs, ...) and switch the existing formats to it.

We did it in the v1-v15 patch sets. But the v16/v17 patch
sets remove it because of a profiling result. (It's
described later.)

In general, we don't want to decrease the current
performance of the existing formats:

https://www.postgresql.org/message-id/flat/10025bac-158c-ffe7-fbec-32b42629121f%40dunslane.net#81cf82c219f2f2d77a616bbf5e511a5c

> We've spent quite a lot of blood sweat and tears over the years to make
> COPY fast, and we should not sacrifice any of that lightly.

The v15 patch set is faster than HEAD but there is a
mysterious profiling result:

https://www.postgresql.org/message-id/flat/ZdbtQJ-p5H1_EDwE%40paquier.xyz#6439e6ad574f2d47cd7220e9bfed3889

> The increase in CopyOneRowTo from 80% to 85% worries me
...
> I am getting faster
> runtimes with v15 (6232ms in average) vs HEAD (6550ms).

I think that it's not a blocker because the v15 patch set
approach is faster. But someone may think that it's a
blocker. So the v16 and later patch sets don't include code
that uses this extension mechanism for the existing formats.
We can work on it after we introduce the basic features if
it's valuable.

> How would you even know the new code is correct, when there's nothing
> using the "copy routine" branch?

We can't test it with only the v16/v17 patch set changes.
But we can do it by adding more of the changes we made in
the v6 patch set:

https://www.postgresql.org/message-id/flat/20240124.144936.67229716500876806.kou%40clear-code.com#f1ad092fc5e81fe38d3c376559efd52c

If we should commit the basic changes with tests, I can
adjust the test mechanism in the v6 patch set and add it to
the latest patch set. But it needs the CREATE XYZ mechanism
and so on too. Is it OK?

> In fact, doesn't this mean that the benchmarks presented earlier are not
> very useful? We still use the old code, except there are a couple "if"
> branches that are never taken? I don't think this measures the new
> approach would not be slower once everything gets to be copy routine.

Here is a benchmark result with the v17 and HEAD:

https://www.postgresql.org/message-id/flat/ZelfYatRdVZN3FbE%40paquier.xyz#eccfd1a0131af93c48026d691cc247f4

It shows no performance difference for the existing
formats.

The added mechanism may be slower than the existing formats
mechanism but it's not a blocker, because it can't be a
performance regression: this is a new feature.

We can improve it later if it's needed.

> Also, how do we know this API is suitable for the alternative formats?

The v6 patch set has more APIs built on this API. These APIs
are for implementing the alternative formats.

https://www.postgresql.org/message-id/flat/20240124.144936.67229716500876806.kou%40clear-code.com#f1ad092fc5e81fe38d3c376559efd52c

This is an Apache Arrow format implementation based on the
v6 patch set: https://github.com/kou/pg-copy-arrow

> For example you mentioned Arrow, and I suppose people will want to add
> support for other column-oriented formats. I assume that will require
> stashing a batch of rows (or some other internal state) somewhere, but
> does the proposed API plan for that?
>
> My guess would be we'll need to add a "private_data" pointer to the
> CopyFromStateData/CopyToStateData structs, but maybe I'm wrong.

I think so too. The v6 patch set has a "private_data"
pointer. But the v17 patch set doesn't have it because the
v17 patch set has only basic changes. We'll add it and other
features in the following patches:

https://www.postgresql.org/message-id/flat/20240305.171808.667980402249336456.kou%40clear-code.com

> I'll send the following patches after this patch is
> merged. They are based on the v6 patch[1]:
>
> 1. Add copy_handler
>    * This also adds a pg_proc lookup for custom FORMAT
>    * This also adds a test for copy_handler
> 2. Export CopyToStateData
>    * We need it to implement custom copy TO handler
> 3. Add needed APIs to implement custom copy TO handler
>    * Add CopyToStateData::opaque
>    * Export CopySendEndOfRow()
> 4. Export CopyFromStateData
>    * We need it to implement custom copy FROM handler
> 5. Add needed APIs to implement custom copy FROM handler
>    * Add CopyFromStateData::opaque
>    * Export CopyReadBinaryData()

"Copy{To,From}StateData::opaque" are the "private_data"
pointer in the v6 patch.

> Also, won't the alternative formats require custom parameters. For
> example, for column-oriented-formats it might be useful to specify a
> stripe size (rows per batch), etc. I'm not saying this patch needs to
> implement that, but maybe the API should expect it?

Yes. The v6 patch set also has the API. But we want to
minimize the API set as much as possible in the first
implementation.

https://www.postgresql.org/message-id/flat/Zbi1TwPfAvUpKqTd%40paquier.xyz#00abc60c5a1ad9eee395849b7b5a5e0d

> I am really worried about the complexities
> this thread is getting into because we are trying to shape the
> callbacks in the most generic way possible based on *two* use cases.
> This is going to be a never-ending discussion. I'd rather get some
> simple basics, and then we can discuss if tweaking the callbacks is
> really necessary or not.

And I agree with this approach.

> 1) Switch the existing formats to the new API, to validate the API works
> at least for them, allow testing and benchmarking the code.

I want to keep the current style for the first
implementation to avoid affecting the performance of the
existing formats. If that isn't enough to move this proposal
forward, could someone help us to solve the mysterious
result (why are the percentages of CopyOneRowTo() different?)
in the following v15 patch set benchmark result?

https://www.postgresql.org/message-id/flat/ZdbtQJ-p5H1_EDwE%40paquier.xyz#6439e6ad574f2d47cd7220e9bfed3889

> 2) Try implementing some of the more exotic formats (column-oriented) to
> test the API works for those too.
>
> 3) Maybe try implementing a PoC version to do the DDL, so that it
> actually is extensible.
>
> It's not my intent to "move the goalposts" - I think it's fine if the
> patches (2) and (3) are just PoC, to validate (1) goes in the right
> direction. For example, it's fine if (2) just hard-codes the new format
> next to the built-in ones - that's not something we'd commit, I think,
> but for validation of (1) it's good enough.
>
> Most of the DDL stuff can probably be "copied" from FDW handlers. It's
> pretty similar, and the "routine" idea is what FDW does too. It probably
> also shows a good way to "initialize" the routine, etc.

Is the v6 patch set enough for it?

https://www.postgresql.org/message-id/flat/20240124.144936.67229716500876806.kou%40clear-code.com#f1ad092fc5e81fe38d3c376559efd52c

Or should we do it based on the v17 patch set? If so, I'll
work on it now. I had planned to do it after the v17 patch
set is merged.

Thanks,
-- 
kou
Hi Yong,

Thanks for joining this thread!

In <453D52D4-2AC5-49F6-928D-79F8A4C0850E@ebay.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 22 Jul 2024 07:11:15 +0000,
  "Li, Yong" <yoli@ebay.com> wrote:

> My understanding is that the provided v17 patch aims to achieve the followings:
> - Retain existing format implementations as built-in formats, and do not go through the new interface for them.
> - Make sure that there is no sign of performance degradation.
> - Refactoring the existing code to make it easier and possible to make copy handlers extensible. However, some of the infrastructure work that are required to make copy handler extensible are intentionally delayed for future patches. Some of the work were proposed as patches in earlier messages, but they were not explicitly referenced in recent messages.

Right.

Sorry for bothering you. As Tomas suggested, I should have
prepared the current summary.

My last e-mail summarized the current information:
https://www.postgresql.org/message-id/flat/20240722.164540.889091645042390373.kou%40clear-code.com#0be14c4eeb041e70438ab7a423b728da

It also shows that your understanding is right.

Thanks,
-- 
kou
On 7/22/24 09:45, Sutou Kouhei wrote:
> Hi Tomas,
>
> Thanks for joining this thread!
>
> In <257d5573-07da-48c3-ac07-e047e7a65e99@enterprisedb.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 19 Jul 2024 14:40:05 +0200,
>   Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
>> I think it'd be helpful if you could post a patch status, i.e. a message
>> re-explaining what it aims to achieve, summary of the discussion so
>> far, and what you think are the open questions. Otherwise every reviewer
>> has to read the whole thread to learn this.
>
> It makes sense. It seems your questions covers all important
> points in this thread. So my answers of your questions
> summarize the latest information.
>

Thanks for the summary/responses. I still think it'd be better to post a
summary as a separate message, not as yet another post responding to
someone else. If I was reading the thread, I would not have noticed this
is meant to be a summary. I'd even consider putting a "THREAD SUMMARY"
title on the first line, or something like that. Up to you, of course.

As for the patch / decisions, thanks for the responses and explanations.
But I still find it hard to review / make judgements about the approach
based on the current version of the patch :-( Yes, it's entirely
possible earlier versions did something interesting - e.g. it might have
implemented the existing formats to the new approach. Or it might have a
private pointer in v6. But how do I know why it was removed? Was it
because it's unnecessary for the initial version? Or was it because it
turned out to not work?

And when reviewing a patch, I really don't want to scavenge through old
patch versions, looking for random parts. Not only because I don't know
what to look for, but also because it'll be harder and harder to make
those old versions work, as the patch evolves.

My suggestions would be to maintain this as a series of patches, making
incremental changes, with the "more complex" or "more experimental"
parts later in the series. For example, I can imagine doing this:

0001 - minimal version of the patch (e.g. current v17)
0002 - switch existing formats to the new interface
0003 - extend the interface to add bits needed for columnar formats
0004 - add DML to create/alter/drop custom implementations
0005 - minimal patch with extension adding support for Arrow

Or something like that. The idea is that we still have a coherent story
of what we're trying to do, and can discuss the incremental changes
(easier than looking at a large patch). It's even possible to commit
earlier parts before the later parts are quite cleaned up for commit.
And some changes may not be even meant for commit (e.g. the
extension) but as guidance / validation for the earlier parts.

I do realize this might look like I'm requiring you to do more work.
Sorry about that. I'm just thinking about how to move the patch forward
and convince myself the approach is OK. Also, it's what I think works
quite well for other patches discussed on this mailing list (I do this
for various patches I submitted, for example). And I'm not even sure it
actually is more work.

As for the performance / profiling issues, I've read the reports and I'm
not sure I see something tremendously wrong. Yes, there are differences,
but a 5% change can easily be noise, a shift in binary layout, etc.

Unfortunately, there's not much information about what exactly the tests
did, context (hardware, ...). So I don't know, really. But if you share
enough information on how to reproduce this, I'm willing to take a look
and investigate.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
In <9172d4eb-6de0-4c6d-beab-8210b7a2219b@enterprisedb.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 22 Jul 2024 14:36:40 +0200,
  Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> Thanks for the summary/responses. I still think it'd be better to post a
> summary as a separate message, not as yet another post responding to
> someone else. If I was reading the thread, I would not have noticed this
> is meant to be a summary. I'd even consider putting a "THREAD SUMMARY"
> title on the first line, or something like that. Up to you, of course.
It makes sense. I'll do it as a separate e-mail.
> My suggestions would be to maintain this as a series of patches, making
> incremental changes, with the "more complex" or "more experimental"
> parts later in the series. For example, I can imagine doing this:
> 
> 0001 - minimal version of the patch (e.g. current v17)
> 0002 - switch existing formats to the new interface
> 0003 - extend the interface to add bits needed for columnar formats
> 0004 - add DML to create/alter/drop custom implementations
> 0005 - minimal patch with extension adding support for Arrow
> 
> Or something like that. The idea is that we still have a coherent story
> of what we're trying to do, and can discuss the incremental changes
> (easier than looking at a large patch). It's even possible to commit
> earlier parts before the later parts are quite cleaned up for commit.
> And some changes may not be even meant for commit (e.g. the
> extension) but as guidance / validation for the earlier parts.
OK. I attach the v18 patch set:
0001: add a basic feature (Copy{From,To}Routine)
      (same as the v17 but it's based on the current master)
0002: use Copy{From,To}Routine for the existing formats
      (this may not be committed because there is a
      profiling related concern)
0003: add support for specifying custom format by "COPY
      ... WITH (format 'my-format')"
      (this also has a test)
0004: export Copy{From,To}StateData
      (but this isn't enough to implement custom COPY
      FROM/TO handlers as an extension)
0005: add opaque member to Copy{From,To}StateData and export
      some functions to read the next data and flush the buffer
      (we can implement a PoC Apache Arrow COPY FROM/TO
      handler as an extension with this)
https://github.com/kou/pg-copy-arrow is a PoC Apache Arrow
COPY FROM/TO handler as an extension.
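To illustrate why a columnar handler needs the opaque member added in
0005, here is a minimal standalone sketch. It is not code from the
patch set or from pg-copy-arrow: DemoCopyToState, DemoCopyToRoutine,
DemoBatch and all demo_* functions are hypothetical, simplified
stand-ins, and "flushing" only updates counters instead of sending
data. It only shows the shape of the idea: the handler stashes rows in
private state reached through an opaque pointer and emits them per
batch.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical, simplified stand-ins for illustration only; these are
 * not the PostgreSQL definitions from the patch set.
 */
typedef struct DemoCopyToState DemoCopyToState;

typedef struct DemoCopyToRoutine
{
    void (*start)(DemoCopyToState *cstate);
    void (*one_row)(DemoCopyToState *cstate, int value);
    void (*end)(DemoCopyToState *cstate);
} DemoCopyToRoutine;

struct DemoCopyToState
{
    const DemoCopyToRoutine *routine;   /* format callbacks */
    void       *opaque;                 /* private state owned by the handler */
    int         flushed_batches;        /* stands in for data sent out */
    int         flushed_rows;
};

#define DEMO_BATCH_SIZE 4

/* Private per-COPY state of the hypothetical columnar handler. */
typedef struct DemoBatch
{
    int         values[DEMO_BATCH_SIZE];
    int         nvalues;
} DemoBatch;

/* Emit the pending batch, if any, and reset it. */
static void
demo_flush(DemoCopyToState *cstate)
{
    DemoBatch  *batch = cstate->opaque;

    if (batch->nvalues == 0)
        return;
    cstate->flushed_batches++;
    cstate->flushed_rows += batch->nvalues;
    batch->nvalues = 0;
}

/* Allocate the private state and hang it off the opaque pointer. */
static void
demo_start(DemoCopyToState *cstate)
{
    cstate->opaque = calloc(1, sizeof(DemoBatch));
    if (cstate->opaque == NULL)
        abort();
}

/* Stash one row; flush only when the batch is full. */
static void
demo_one_row(DemoCopyToState *cstate, int value)
{
    DemoBatch  *batch = cstate->opaque;

    batch->values[batch->nvalues++] = value;
    if (batch->nvalues == DEMO_BATCH_SIZE)
        demo_flush(cstate);
}

/* Flush the last partial batch and release the private state. */
static void
demo_end(DemoCopyToState *cstate)
{
    demo_flush(cstate);
    free(cstate->opaque);
    cstate->opaque = NULL;
}

static const DemoCopyToRoutine demo_columnar_routine = {
    .start = demo_start,
    .one_row = demo_one_row,
    .end = demo_end,
};

/* Drive the callbacks the way a COPY TO loop would; returns batch count. */
static int
demo_copy_to(DemoCopyToState *cstate, int nrows)
{
    cstate->routine = &demo_columnar_routine;
    cstate->opaque = NULL;
    cstate->flushed_batches = 0;
    cstate->flushed_rows = 0;

    cstate->routine->start(cstate);
    for (int i = 0; i < nrows; i++)
        cstate->routine->one_row(cstate, i);
    cstate->routine->end(cstate);
    return cstate->flushed_batches;
}
```

In this sketch the batch size is a compile-time constant. A real
handler would probably want it as a per-COPY option (the "rows per
batch" parameter Tomas mentioned), which this patch set doesn't
support yet.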
Notes:
* 0002: We use "static inline" and "constant argument" for
  optimization.
* 0002: This hides NextCopyFromRawFields() in a public
  header because it's not used in PostgreSQL and we want to
  use "static inline" for it. If it's a problem, we can keep
  it and create an internal function for "static inline".
* 0003: We use "CREATE FUNCTION" to register a custom COPY
  FROM/TO handler. It's the same approach as tablesample.
* 0004 and 0005: We could merge them, but this patch set
  splits them to make review easier. 0004 just moves the
  existing code. It doesn't change the existing code.
* PoC: I provide it as a separate repository instead of a
  patch because an extension generally exists as a separate
  project. If it's a problem, I can provide it as a patch
  for contrib/.
* This patch set still has minimal Copy{From,To}Routine. For
  example, custom COPY FROM/TO handlers can't process their
  own options with this patch set. We may add more callbacks
  to Copy{From,To}Routine later based on real world use-cases.
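The "static inline" and "constant argument" note for 0002 above can
be sketched as follows. This is a standalone illustration, not code
from the patch: demo_write_field() and its two wrappers are
hypothetical names, and the quoting rule is a toy one rather than the
real text/csv logic. The shared body takes the format flag as an
argument, and each per-format wrapper passes a constant, so a compiler
that inlines the call can drop the branches on that flag in each
wrapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Shared body for both formats (hypothetical, toy quoting rule).
 * "out" must have room for 2 * strlen(field) + 3 bytes.
 * Returns the number of bytes written, excluding the terminator.
 */
static inline size_t
demo_write_field(char *out, const char *field, bool is_csv)
{
    size_t      n = 0;

    if (is_csv)
        out[n++] = '"';
    for (const char *p = field; *p != '\0'; p++)
    {
        /* In CSV mode, double any embedded quote character. */
        if (is_csv && *p == '"')
            out[n++] = '"';
        out[n++] = *p;
    }
    if (is_csv)
        out[n++] = '"';
    out[n] = '\0';
    return n;
}

/* Per-format entry points: the flag is a compile-time constant here. */
static size_t
demo_write_text_field(char *out, const char *field)
{
    return demo_write_field(out, field, false);
}

static size_t
demo_write_csv_field(char *out, const char *field)
{
    return demo_write_field(out, field, true);
}
```

Whether the compiler actually specializes the two wrappers depends on
the optimization level and its inlining decisions, so this needs to be
confirmed by profiling rather than assumed.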
> Unfortunately, there's not much information about what exactly the tests
> did, context (hardware, ...). So I don't know, really. But if you share
> enough information on how to reproduce this, I'm willing to take a look
> and investigate.
Thanks. Here is related information based on the past
e-mails from Michael:
* Use -O2 for optimization build flag
  ("meson setup --buildtype=release" may be used)
* Use tmpfs for PGDATA
* Disable fsync
* Run on scissors (what is "scissors" in this context...?)
  https://www.postgresql.org/message-id/flat/Zbr6piWuVHDtFFOl%40paquier.xyz#dbbec4d5c54ef2317be01a54abaf495c
* Unlogged table may be used
* Use a table that has 30 integer columns (*1)
* Use 5M rows (*2)
* Use '/dev/null' for COPY TO (*3)
* Use blackhole_am for COPY FROM (*4)
  https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am
* perf is used but used options are unknown (sorry)
(*1) This SQL may be used to create the table:
CREATE OR REPLACE FUNCTION create_table_cols(tabname text, num_cols int)
RETURNS VOID AS
$func$
DECLARE
  query text;
BEGIN
  query := 'CREATE UNLOGGED TABLE ' || tabname || ' (';
  FOR i IN 1..num_cols LOOP
    query := query || 'a_' || i::text || ' int default 1';
    IF i != num_cols THEN
      query := query || ', ';
    END IF;
  END LOOP;
  query := query || ')';
  EXECUTE format(query);
END
$func$ LANGUAGE plpgsql;
SELECT create_table_cols ('to_tab_30', 30);
SELECT create_table_cols ('from_tab_30', 30);
(*2) This SQL may be used to insert 5M rows:
INSERT INTO to_tab_30 SELECT FROM generate_series(1, 5000000);
(*3) This SQL may be used for COPY TO:
COPY to_tab_30 TO '/dev/null' WITH (FORMAT text);
(*4) This SQL may be used for COPY FROM:
CREATE EXTENSION blackhole_am;
ALTER TABLE from_tab_30 SET ACCESS METHOD blackhole_am;
COPY to_tab_30 TO '/tmp/to_tab_30.txt' WITH (FORMAT text);
COPY from_tab_30 FROM '/tmp/to_tab_30.txt' WITH (FORMAT text);
If this is enough information, could you try it?
Thanks,
-- 
kou
From 22daacbd77c6dd0e13fe11e30fba90f7595ff6c1 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 4 Mar 2024 13:52:34 +0900
Subject: [PATCH v18 1/5] Add CopyFromRoutine/CopyToRoutine
They are for implementing custom COPY FROM/TO format. But this is not
enough to implement custom COPY FROM/TO format yet. We'll export some
APIs to receive/send data and add "format" option to COPY FROM/TO
later.
Existing text/csv/binary format implementations don't use
CopyFromRoutine/CopyToRoutine for now. We have a patch for it but we
defer it because there are some mysterious profile results even though
we get faster runtimes. See [1] for details.
[1] https://www.postgresql.org/message-id/ZdbtQJ-p5H1_EDwE%40paquier.xyz
Note that this doesn't change existing text/csv/binary format
implementations.
---
 src/backend/commands/copyfrom.c          |  24 +++++-
 src/backend/commands/copyfromparse.c     |   5 ++
 src/backend/commands/copyto.c            |  31 ++++++-
 src/include/commands/copyapi.h           | 100 +++++++++++++++++++++++
 src/include/commands/copyfrom_internal.h |   4 +
 src/tools/pgindent/typedefs.list         |   2 +
 6 files changed, 158 insertions(+), 8 deletions(-)
 create mode 100644 src/include/commands/copyapi.h
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index ce4d62e707c..ff13b3e3592 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1618,12 +1618,22 @@ BeginCopyFrom(ParseState *pstate,
 
         /* Fetch the input function and typioparam info */
         if (cstate->opts.binary)
+        {
             getTypeBinaryInputInfo(att->atttypid,
                                    &in_func_oid, &typioparams[attnum - 1]);
+            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+        }
+        else if (cstate->routine)
+            cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+                                            &in_functions[attnum - 1],
+                                            &typioparams[attnum - 1]);
+
         else
+        {
             getTypeInputInfo(att->atttypid,
                              &in_func_oid, &typioparams[attnum - 1]);
-        fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+        }
 
         /* Get default info if available */
         defexprs[attnum - 1] = NULL;
@@ -1763,10 +1773,13 @@ BeginCopyFrom(ParseState *pstate,
         /* Read and verify binary header */
         ReceiveCopyBinaryHeader(cstate);
     }
-
-    /* create workspace for CopyReadAttributes results */
-    if (!cstate->opts.binary)
+    else if (cstate->routine)
     {
+        cstate->routine->CopyFromStart(cstate, tupDesc);
+    }
+    else
+    {
+        /* create workspace for CopyReadAttributes results */
         AttrNumber    attr_count = list_length(cstate->attnumlist);
 
         cstate->max_fields = attr_count;
@@ -1784,6 +1797,9 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+    if (cstate->routine)
+        cstate->routine->CopyFromEnd(cstate);
+
     /* No COPY FROM related resources except memory. */
     if (cstate->is_program)
     {
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7efcb891598..92b8d5e72d5 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1012,6 +1012,11 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 
         Assert(fieldno == attr_count);
     }
+    else if (cstate->routine)
+    {
+        if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+            return false;
+    }
     else
     {
         /* binary */
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ae8b2e36d72..ff19c457abf 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -20,6 +20,7 @@
 
 #include "access/tableam.h"
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -64,6 +65,9 @@ typedef enum CopyDest
  */
 typedef struct CopyToStateData
 {
+    /* format routine */
+    const CopyToRoutine *routine;
+
     /* low-level state data */
     CopyDest    copy_dest;        /* type of copy source/destination */
     FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
@@ -771,14 +775,22 @@ DoCopyTo(CopyToState cstate)
         Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
         if (cstate->opts.binary)
+        {
             getTypeBinaryOutputInfo(attr->atttypid,
                                     &out_func_oid,
                                     &isvarlena);
+            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+        }
+        else if (cstate->routine)
+            cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
+                                           &cstate->out_functions[attnum - 1]);
         else
+        {
             getTypeOutputInfo(attr->atttypid,
                               &out_func_oid,
                               &isvarlena);
-        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+        }
     }
 
     /*
@@ -805,6 +817,8 @@ DoCopyTo(CopyToState cstate)
         tmp = 0;
         CopySendInt32(cstate, tmp);
     }
+    else if (cstate->routine)
+        cstate->routine->CopyToStart(cstate, tupDesc);
     else
     {
         /*
@@ -886,6 +900,8 @@ DoCopyTo(CopyToState cstate)
         /* Need to flush out the trailer */
         CopySendEndOfRow(cstate);
     }
+    else if (cstate->routine)
+        cstate->routine->CopyToEnd(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -910,15 +926,22 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
     MemoryContextReset(cstate->rowcontext);
     oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
+    /* Make sure the tuple is fully deconstructed */
+    slot_getallattrs(slot);
+
+    if (cstate->routine)
+    {
+        cstate->routine->CopyToOneRow(cstate, slot);
+        MemoryContextSwitchTo(oldcontext);
+        return;
+    }
+
     if (cstate->opts.binary)
     {
         /* Binary per-tuple header */
         CopySendInt16(cstate, list_length(cstate->attnumlist));
     }
 
-    /* Make sure the tuple is fully deconstructed */
-    slot_getallattrs(slot);
-
     foreach(cur, cstate->attnumlist)
     {
         int            attnum = lfirst_int(cur);
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 00000000000..635c4cbff27
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,100 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *      API for COPY TO/FROM handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/* These are private in commands/copy[from|to].c */
+typedef struct CopyFromStateData *CopyFromState;
+typedef struct CopyToStateData *CopyToState;
+
+/*
+ * API structure for a COPY FROM format implementation.  Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyFromRoutine
+{
+    /*
+     * Called when COPY FROM is started to set up the input functions
+     * associated with the relation's attributes being written to.  `finfo`
+     * can be optionally filled to provide the catalog information of the
+     * input function.  `typioparam` can be optionally filled to define the
+     * OID of the type to pass to the input function.  `atttypid` is the OID
+     * of the data type used by the relation's attribute.
+     */
+    void        (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
+                                   FmgrInfo *finfo, Oid *typioparam);
+
+    /*
+     * Called when COPY FROM is started.
+     *
+     * `tupDesc` is the tuple descriptor of the relation where the data needs
+     * to be copied.  This can be used for any initialization steps required
+     * by a format.
+     */
+    void        (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
+
+    /*
+     * Copy one row to a set of `values` and `nulls` of size tupDesc->natts.
+     *
+     * 'econtext' is used to evaluate default expression for each column that
+     * is either not read from the file or is using the DEFAULT option of COPY
+     * FROM.  It is NULL if no default values are used.
+     *
+     * Returns false if there are no more tuples to copy.
+     */
+    bool        (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
+                                   Datum *values, bool *nulls);
+
+    /* Called when COPY FROM has ended. */
+    void        (*CopyFromEnd) (CopyFromState cstate);
+} CopyFromRoutine;
+
+/*
+ * API structure for a COPY TO format implementation.   Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyToRoutine
+{
+    /*
+     * Called when COPY TO is started to set up the output functions
+     * associated with the relation's attributes being read from.  `finfo`
+     * can be optionally filled.  `atttypid` is the OID of the data type used
+     * by the relation's attribute.
+     */
+    void        (*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
+                                  FmgrInfo *finfo);
+
+    /*
+     * Called when COPY TO is started.
+     *
+     * `tupDesc` is the tuple descriptor of the relation from where the data
+     * is read.
+     */
+    void        (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
+
+    /*
+     * Copy one row for COPY TO.
+     *
+     * `slot` is the tuple slot where the data is emitted.
+     */
+    void        (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
+
+    /* Called when COPY TO has ended */
+    void        (*CopyToEnd) (CopyToState cstate);
+} CopyToRoutine;
+
+#endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc783..509b9e92a18 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -15,6 +15,7 @@
 #define COPYFROM_INTERNAL_H
 
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
@@ -58,6 +59,9 @@ typedef enum CopyInsertMethod
  */
 typedef struct CopyFromStateData
 {
+    /* format routine */
+    const CopyFromRoutine *routine;
+
     /* low-level state data */
     CopySource    copy_src;        /* type of copy source */
     FILE       *copy_file;        /* used if copy_src == COPY_FILE */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b4d7f9217ce..3ce855c8f17 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -490,6 +490,7 @@ ConvertRowtypeExpr
 CookedConstraint
 CopyDest
 CopyFormatOptions
+CopyFromRoutine
 CopyFromState
 CopyFromStateData
 CopyHeaderChoice
@@ -501,6 +502,7 @@ CopyMultiInsertInfo
 CopyOnErrorChoice
 CopySource
 CopyStmt
+CopyToRoutine
 CopyToState
 CopyToStateData
 Cost
-- 
2.45.2
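The dispatch pattern that the patch above adds to DoCopyTo() and
CopyOneRowTo() can be sketched outside the server as follows. This is only a
minimal illustration, not code from the patch: DemoCopyToState,
DemoCopyToRoutine and every demo_* name are hypothetical stand-ins, and the
row payload is reduced to an int. It shows a callback table kept in a static
const struct, with a NULL routine pointer selecting the built-in path.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoCopyToState DemoCopyToState;

/* Hypothetical, simplified analogue of CopyToRoutine. */
typedef struct DemoCopyToRoutine
{
    void        (*start) (DemoCopyToState *state);
    void        (*one_row) (DemoCopyToState *state, int value);
    void        (*end) (DemoCopyToState *state);
} DemoCopyToRoutine;

struct DemoCopyToState
{
    const DemoCopyToRoutine *routine;   /* NULL selects the built-in path */
    int         rows_custom;    /* rows handled by the callbacks */
    int         rows_builtin;   /* rows handled by the built-in path */
    int         ended;          /* set once the end callback has run */
};

static void
demo_start(DemoCopyToState *state)
{
    state->rows_custom = 0;
}

static void
demo_one_row(DemoCopyToState *state, int value)
{
    (void) value;               /* a real format would serialize the row */
    state->rows_custom++;
}

static void
demo_end(DemoCopyToState *state)
{
    state->ended = 1;
}

/* Lives for the whole program, like a static const CopyToRoutine would. */
static const DemoCopyToRoutine demo_routine = {
    .start = demo_start,
    .one_row = demo_one_row,
    .end = demo_end,
};

/* Driver: call the routine when one is set, else use the built-in path. */
static void
demo_copy_to(DemoCopyToState *state, const int *values, size_t nvalues)
{
    assert(state != NULL);

    if (state->routine)
        state->routine->start(state);

    for (size_t i = 0; i < nvalues; i++)
    {
        (void) values[i];
        if (state->routine)
        {
            state->routine->one_row(state, values[i]);
            continue;
        }
        state->rows_builtin++;
    }

    if (state->routine)
        state->routine->end(state);
}
```

Calling demo_copy_to() with routine set to &demo_routine sends every row
through the callbacks. Leaving routine as NULL keeps the built-in path, which
is how the patch avoids changing the behavior of the existing formats.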
From ace816c9ef7b1dceed35d7cf18b82e70fa9143e6 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jul 2024 16:44:44 +0900
Subject: [PATCH v18 2/5] Use CopyFromRoutine/CopyToRoutine for the existing
 formats
The existing formats are text, csv and binary. If we find any
performance regression caused by this, we will not merge this to master.
This will increase indirect function call costs but this will reduce
runtime "if (cstate->opts.binary)" and "if (cstate->opts.csv_mode)"
branch costs.
This uses an optimization based on a static inline function that is
called with a constant argument in place of cstate->opts.csv_mode. For
example, CopyFromTextLikeOneRow() uses this optimization. It accepts a
"bool is_csv" argument instead of reading cstate->opts.csv_mode itself.
CopyFromTextOneRow() calls CopyFromTextLikeOneRow() with false (a
constant) for "bool is_csv". The compiler can then remove the
"if (is_csv)" branch in the inlined copy.
This doesn't change existing logic. This just moves existing code.
---
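The static inline plus constant argument technique described in the commit
message can be illustrated with a small standalone sketch. It is not taken
from the patch: DemoRoutine and all demo_* names are hypothetical, and the
field counting is deliberately simplistic. The point is only that each
wrapper passes a compile-time constant for is_csv, so once the workhorse is
inlined the compiler is free to drop the untaken branch in each copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CopyFromRoutine with a single callback. */
typedef struct DemoRoutine
{
    size_t      (*count_fields) (const char *line);
} DemoRoutine;

/*
 * Shared workhorse for both formats.  Every caller passes a constant for
 * is_csv, so after inlining the "if (is_csv ...)" test can be resolved at
 * compile time.
 */
static inline size_t
demo_count_fields_like(const char *line, bool is_csv)
{
    char        delim = is_csv ? ',' : '\t';
    bool        in_quotes = false;
    size_t      nfields = 1;

    assert(line != NULL);

    for (const char *p = line; *p != '\0'; p++)
    {
        if (is_csv && *p == '"')
            in_quotes = !in_quotes; /* only CSV has quoted fields here */
        else if (*p == delim && !in_quotes)
            nfields++;
    }
    return nfields;
}

/* Per-format wrappers: the only difference is the constant argument. */
static size_t
demo_count_fields_text(const char *line)
{
    return demo_count_fields_like(line, false);
}

static size_t
demo_count_fields_csv(const char *line)
{
    return demo_count_fields_like(line, true);
}

static const DemoRoutine demo_routine_text = {
    .count_fields = demo_count_fields_text,
};

static const DemoRoutine demo_routine_csv = {
    .count_fields = demo_count_fields_csv,
};

/* Pick the routine once; the per-row path then makes one indirect call. */
static const DemoRoutine *
demo_get_routine(bool csv_mode)
{
    return csv_mode ? &demo_routine_csv : &demo_routine_text;
}
```

The trade-off is the one stated in the commit message: the per-row path pays
for one indirect call through the routine pointer, and in exchange the format
test is decided once when the routine is chosen rather than on every row.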
 src/backend/commands/copyfrom.c          | 215 ++++++---
 src/backend/commands/copyfromparse.c     | 556 +++++++++++++----------
 src/backend/commands/copyto.c            | 480 ++++++++++++-------
 src/include/commands/copy.h              |   2 -
 src/include/commands/copyfrom_internal.h |   8 +
 5 files changed, 813 insertions(+), 448 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index ff13b3e3592..1a59202f5ab 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -103,6 +103,157 @@ typedef struct CopyMultiInsertInfo
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
 
+
+/*
+ * CopyFromRoutine implementations for text and CSV.
+ */
+
+/*
+ * CopyFromTextLikeInFunc
+ *
+ * Assign input function data for a relation's attribute in text/CSV format.
+ */
+static void
+CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid,
+                       FmgrInfo *finfo, Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyFromTextLikeStart
+ *
+ * Start of COPY FROM for text/CSV format.
+ */
+static void
+CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    AttrNumber    attr_count;
+
+    /*
+     * If encoding conversion is needed, we need another buffer to hold the
+     * converted input data.  Otherwise, we can just point input_buf to the
+     * same buffer as raw_buf.
+     */
+    if (cstate->need_transcoding)
+    {
+        cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
+        cstate->input_buf_index = cstate->input_buf_len = 0;
+    }
+    else
+        cstate->input_buf = cstate->raw_buf;
+    cstate->input_reached_eof = false;
+
+    initStringInfo(&cstate->line_buf);
+
+    /*
+     * Create workspace for CopyReadAttributes results; used by CSV and text
+     * format.
+     */
+    attr_count = list_length(cstate->attnumlist);
+    cstate->max_fields = attr_count;
+    cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+}
+
+/*
+ * CopyFromTextLikeEnd
+ *
+ * End of COPY FROM for text/CSV format.
+ */
+static void
+CopyFromTextLikeEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
+/*
+ * CopyFromRoutine implementation for "binary".
+ */
+
+/*
+ * CopyFromBinaryInFunc
+ *
+ * Assign input function data for a relation's attribute in binary format.
+ */
+static void
+CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+                     FmgrInfo *finfo, Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeBinaryInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyFromBinaryStart
+ *
+ * Start of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    /* Read and verify binary header */
+    ReceiveCopyBinaryHeader(cstate);
+}
+
+/*
+ * CopyFromBinaryEnd
+ *
+ * End of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
+/*
+ * Routines assigned to each format.
+ *
+ * CSV and text share the same implementation, with the exception of the
+ * per-row callback.
+ */
+static const CopyFromRoutine CopyFromRoutineText = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromTextOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+static const CopyFromRoutine CopyFromRoutineCSV = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromCSVOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+static const CopyFromRoutine CopyFromRoutineBinary = {
+    .CopyFromInFunc = CopyFromBinaryInFunc,
+    .CopyFromStart = CopyFromBinaryStart,
+    .CopyFromOneRow = CopyFromBinaryOneRow,
+    .CopyFromEnd = CopyFromBinaryEnd,
+};
+
+/*
+ * Define the COPY FROM routines to use for a format.
+ */
+static const CopyFromRoutine *
+CopyFromGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyFromRoutineCSV;
+    else if (opts.binary)
+        return &CopyFromRoutineBinary;
+
+    /* default is text */
+    return &CopyFromRoutineText;
+}
+
+
 /*
  * error context callback for COPY FROM
  *
@@ -1381,7 +1532,6 @@ BeginCopyFrom(ParseState *pstate,
                 num_defaults;
     FmgrInfo   *in_functions;
     Oid           *typioparams;
-    Oid            in_func_oid;
     int           *defmap;
     ExprState **defexprs;
     MemoryContext oldcontext;
@@ -1413,6 +1563,9 @@ BeginCopyFrom(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
+    /* Set format routine */
+    cstate->routine = CopyFromGetRoutine(cstate->opts);
+
     /* Process the target relation */
     cstate->rel = rel;
 
@@ -1566,25 +1719,6 @@ BeginCopyFrom(ParseState *pstate,
     cstate->raw_buf_index = cstate->raw_buf_len = 0;
     cstate->raw_reached_eof = false;
 
-    if (!cstate->opts.binary)
-    {
-        /*
-         * If encoding conversion is needed, we need another buffer to hold
-         * the converted input data.  Otherwise, we can just point input_buf
-         * to the same buffer as raw_buf.
-         */
-        if (cstate->need_transcoding)
-        {
-            cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
-            cstate->input_buf_index = cstate->input_buf_len = 0;
-        }
-        else
-            cstate->input_buf = cstate->raw_buf;
-        cstate->input_reached_eof = false;
-
-        initStringInfo(&cstate->line_buf);
-    }
-
     initStringInfo(&cstate->attribute_buf);
 
     /* Assign range table and rteperminfos, we'll need them in CopyFrom. */
@@ -1617,23 +1751,9 @@ BeginCopyFrom(ParseState *pstate,
             continue;
 
         /* Fetch the input function and typioparam info */
-        if (cstate->opts.binary)
-        {
-            getTypeBinaryInputInfo(att->atttypid,
-                                   &in_func_oid, &typioparams[attnum - 1]);
-            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
-        }
-        else if (cstate->routine)
-            cstate->routine->CopyFromInFunc(cstate, att->atttypid,
-                                            &in_functions[attnum - 1],
-                                            &typioparams[attnum - 1]);
-
-        else
-        {
-            getTypeInputInfo(att->atttypid,
-                             &in_func_oid, &typioparams[attnum - 1]);
-            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
-        }
+        cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+                                        &in_functions[attnum - 1],
+                                        &typioparams[attnum - 1]);
 
         /* Get default info if available */
         defexprs[attnum - 1] = NULL;
@@ -1768,23 +1888,7 @@ BeginCopyFrom(ParseState *pstate,
 
     pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-    if (cstate->opts.binary)
-    {
-        /* Read and verify binary header */
-        ReceiveCopyBinaryHeader(cstate);
-    }
-    else if (cstate->routine)
-    {
-        cstate->routine->CopyFromStart(cstate, tupDesc);
-    }
-    else
-    {
-        /* create workspace for CopyReadAttributes results */
-        AttrNumber    attr_count = list_length(cstate->attnumlist);
-
-        cstate->max_fields = attr_count;
-        cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-    }
+    cstate->routine->CopyFromStart(cstate, tupDesc);
 
     MemoryContextSwitchTo(oldcontext);
 
@@ -1797,8 +1901,7 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
-    if (cstate->routine)
-        cstate->routine->CopyFromEnd(cstate);
+    cstate->routine->CopyFromEnd(cstate);
 
     /* No COPY FROM related resources except memory. */
     if (cstate->is_program)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 92b8d5e72d5..90824b47785 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -149,10 +149,10 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
-static int    CopyReadAttributesText(CopyFromState cstate);
-static int    CopyReadAttributesCSV(CopyFromState cstate);
+static inline bool CopyReadLine(CopyFromState cstate, bool is_csv);
+static inline bool CopyReadLineText(CopyFromState cstate, bool is_csv);
+static inline int CopyReadAttributesText(CopyFromState cstate);
+static inline int CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
                                      Oid typioparam, int32 typmod,
                                      bool *isnull);
@@ -750,8 +750,8 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  *
  * NOTE: force_not_null option are not applied to the returned fields.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static inline bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
 {
     int            fldct;
     bool        done;
@@ -768,13 +768,17 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
         tupDesc = RelationGetDescr(cstate->rel);
 
         cstate->cur_lineno++;
-        done = CopyReadLine(cstate);
+        done = CopyReadLine(cstate, is_csv);
 
         if (cstate->opts.header_line == COPY_HEADER_MATCH)
         {
             int            fldnum;
 
-            if (cstate->opts.csv_mode)
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
                 fldct = CopyReadAttributesCSV(cstate);
             else
                 fldct = CopyReadAttributesText(cstate);
@@ -818,7 +822,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     cstate->cur_lineno++;
 
     /* Actually read the line into memory here */
-    done = CopyReadLine(cstate);
+    done = CopyReadLine(cstate, is_csv);
 
     /*
      * EOF at start of line means we're done.  If we see EOF after some
@@ -828,8 +832,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     if (done && cstate->line_buf.len == 0)
         return false;
 
-    /* Parse the line into de-escaped field values */
-    if (cstate->opts.csv_mode)
+    /*
+     * Parse the line into de-escaped field values
+     *
+     * is_csv will be optimized away by compiler, as argument is constant at
+     * caller.
+     */
+    if (is_csv)
         fldct = CopyReadAttributesCSV(cstate);
     else
         fldct = CopyReadAttributesText(cstate);
@@ -839,6 +848,267 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     return true;
 }
 
+/*
+ * CopyFromTextLikeOneRow
+ *
+ * Copy one row to a set of `values` and `nulls` for the text and CSV
+ * formats.
+ *
+ * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
+ */
+static inline bool
+CopyFromTextLikeOneRow(CopyFromState cstate,
+                       ExprContext *econtext,
+                       Datum *values,
+                       bool *nulls,
+                       bool is_csv)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    ExprState **defexprs = cstate->defexprs;
+    char      **field_strings;
+    ListCell   *cur;
+    int            fldct;
+    int            fieldno;
+    char       *string;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    /* read raw fields in the next line */
+    if (!NextCopyFromRawFields(cstate, &field_strings, &fldct, is_csv))
+        return false;
+
+    /* check for overflowing fields */
+    if (attr_count > 0 && fldct > attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("extra data after last expected column")));
+
+    fieldno = 0;
+
+    /* Loop to read the user attributes on the line. */
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        if (fieldno >= fldct)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("missing data for column \"%s\"",
+                            NameStr(att->attname))));
+        string = field_strings[fieldno++];
+
+        if (cstate->convert_select_flags &&
+            !cstate->convert_select_flags[m])
+        {
+            /* ignore input field, leaving column as NULL */
+            continue;
+        }
+
+        if (is_csv)
+        {
+            if (string == NULL &&
+                cstate->opts.force_notnull_flags[m])
+            {
+                /*
+                 * FORCE_NOT_NULL option is set and column is NULL - convert
+                 * it to the NULL string.
+                 */
+                string = cstate->opts.null_print;
+            }
+            else if (string != NULL && cstate->opts.force_null_flags[m]
+                     && strcmp(string, cstate->opts.null_print) == 0)
+            {
+                /*
+                 * FORCE_NULL option is set and column matches the NULL
+                 * string. It must have been quoted, or otherwise the string
+                 * would already have been set to NULL. Convert it to NULL as
+                 * specified.
+                 */
+                string = NULL;
+            }
+        }
+
+        cstate->cur_attname = NameStr(att->attname);
+        cstate->cur_attval = string;
+
+        if (string != NULL)
+            nulls[m] = false;
+
+        if (cstate->defaults[m])
+        {
+            /*
+             * The caller must supply econtext and have switched into the
+             * per-tuple memory context in it.
+             */
+            Assert(econtext != NULL);
+            Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
+
+            values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
+        }
+
+        /*
+         * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+         */
+        else if (!InputFunctionCallSafe(&in_functions[m],
+                                        string,
+                                        typioparams[m],
+                                        att->atttypmod,
+                                        (Node *) cstate->escontext,
+                                        &values[m]))
+        {
+            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+            cstate->num_errors++;
+
+            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+            {
+                /*
+                 * Since we emit line number and column info in the below
+                 * notice message, we suppress error context information other
+                 * than the relation name.
+                 */
+                Assert(!cstate->relname_only);
+                cstate->relname_only = true;
+
+                if (cstate->cur_attval)
+                {
+                    char       *attval;
+
+                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column %s: \"%s\"",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname,
+                                   attval));
+                    pfree(attval);
+                }
+                else
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column %s: null input",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname));
+
+                /* reset relname_only */
+                cstate->relname_only = false;
+            }
+
+            return true;
+        }
+
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+    }
+
+    Assert(fieldno == attr_count);
+
+    return true;
+}
+
+
+/*
+ * CopyFromTextOneRow
+ *
+ * Per-row callback for COPY FROM with text format.
+ */
+bool
+CopyFromTextOneRow(CopyFromState cstate,
+                   ExprContext *econtext,
+                   Datum *values,
+                   bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false);
+}
+
+/*
+ * CopyFromCSVOneRow
+ *
+ * Per-row callback for COPY FROM with CSV format.
+ */
+bool
+CopyFromCSVOneRow(CopyFromState cstate,
+                  ExprContext *econtext,
+                  Datum *values,
+                  bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
+}
+
+/*
+ * CopyFromBinaryOneRow
+ *
+ * Copy one row to a set of `values` and `nulls` for the binary format.
+ */
+bool
+CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+                     Datum *values, bool *nulls)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    int16        fld_count;
+    ListCell   *cur;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    cstate->cur_lineno++;
+
+    if (!CopyGetInt16(cstate, &fld_count))
+    {
+        /* EOF detected (end of file, or protocol-level EOF) */
+        return false;
+    }
+
+    if (fld_count == -1)
+    {
+        /*
+         * Received EOF marker.  Wait for the protocol-level EOF, and complain
+         * if it doesn't come immediately.  In COPY FROM STDIN, this ensures
+         * that we correctly handle CopyFail, if client chooses to send that
+         * now.  When copying from file, we could ignore the rest of the file
+         * like in text mode, but we choose to be consistent with the COPY
+         * FROM STDIN case.
+         */
+        char        dummy;
+
+        if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("received copy data after EOF marker")));
+        return false;
+    }
+
+    if (fld_count != attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("row field count is %d, expected %d",
+                        (int) fld_count, attr_count)));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        cstate->cur_attname = NameStr(att->attname);
+        values[m] = CopyReadBinaryAttribute(cstate,
+                                            &in_functions[m],
+                                            typioparams[m],
+                                            att->atttypmod,
+                                            &nulls[m]);
+        cstate->cur_attname = NULL;
+    }
+
+    return true;
+}
+
 /*
  * Read next tuple from file for COPY FROM. Return false if no more tuples.
  *
@@ -856,221 +1126,21 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 {
     TupleDesc    tupDesc;
     AttrNumber    num_phys_attrs,
-                attr_count,
                 num_defaults = cstate->num_defaults;
-    FmgrInfo   *in_functions = cstate->in_functions;
-    Oid           *typioparams = cstate->typioparams;
     int            i;
     int           *defmap = cstate->defmap;
     ExprState **defexprs = cstate->defexprs;
 
     tupDesc = RelationGetDescr(cstate->rel);
     num_phys_attrs = tupDesc->natts;
-    attr_count = list_length(cstate->attnumlist);
 
     /* Initialize all values for row to NULL */
     MemSet(values, 0, num_phys_attrs * sizeof(Datum));
     MemSet(nulls, true, num_phys_attrs * sizeof(bool));
     MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-    if (!cstate->opts.binary)
-    {
-        char      **field_strings;
-        ListCell   *cur;
-        int            fldct;
-        int            fieldno;
-        char       *string;
-
-        /* read raw fields in the next line */
-        if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
-            return false;
-
-        /* check for overflowing fields */
-        if (attr_count > 0 && fldct > attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("extra data after last expected column")));
-
-        fieldno = 0;
-
-        /* Loop to read the user attributes on the line. */
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            if (fieldno >= fldct)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("missing data for column \"%s\"",
-                                NameStr(att->attname))));
-            string = field_strings[fieldno++];
-
-            if (cstate->convert_select_flags &&
-                !cstate->convert_select_flags[m])
-            {
-                /* ignore input field, leaving column as NULL */
-                continue;
-            }
-
-            if (cstate->opts.csv_mode)
-            {
-                if (string == NULL &&
-                    cstate->opts.force_notnull_flags[m])
-                {
-                    /*
-                     * FORCE_NOT_NULL option is set and column is NULL -
-                     * convert it to the NULL string.
-                     */
-                    string = cstate->opts.null_print;
-                }
-                else if (string != NULL && cstate->opts.force_null_flags[m]
-                         && strcmp(string, cstate->opts.null_print) == 0)
-                {
-                    /*
-                     * FORCE_NULL option is set and column matches the NULL
-                     * string. It must have been quoted, or otherwise the
-                     * string would already have been set to NULL. Convert it
-                     * to NULL as specified.
-                     */
-                    string = NULL;
-                }
-            }
-
-            cstate->cur_attname = NameStr(att->attname);
-            cstate->cur_attval = string;
-
-            if (string != NULL)
-                nulls[m] = false;
-
-            if (cstate->defaults[m])
-            {
-                /*
-                 * The caller must supply econtext and have switched into the
-                 * per-tuple memory context in it.
-                 */
-                Assert(econtext != NULL);
-                Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
-
-                values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
-            }
-
-            /*
-             * If ON_ERROR is specified with IGNORE, skip rows with soft
-             * errors
-             */
-            else if (!InputFunctionCallSafe(&in_functions[m],
-                                            string,
-                                            typioparams[m],
-                                            att->atttypmod,
-                                            (Node *) cstate->escontext,
-                                            &values[m]))
-            {
-                Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-                cstate->num_errors++;
-
-                if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-                {
-                    /*
-                     * Since we emit line number and column info in the below
-                     * notice message, we suppress error context information
-                     * other than the relation name.
-                     */
-                    Assert(!cstate->relname_only);
-                    cstate->relname_only = true;
-
-                    if (cstate->cur_attval)
-                    {
-                        char       *attval;
-
-                        attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column %s: \"%s\"",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname,
-                                       attval));
-                        pfree(attval);
-                    }
-                    else
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column %s: null input",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname));
-
-                    /* reset relname_only */
-                    cstate->relname_only = false;
-                }
-
-                return true;
-            }
-
-            cstate->cur_attname = NULL;
-            cstate->cur_attval = NULL;
-        }
-
-        Assert(fieldno == attr_count);
-    }
-    else if (cstate->routine)
-    {
-        if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
-            return false;
-    }
-    else
-    {
-        /* binary */
-        int16        fld_count;
-        ListCell   *cur;
-
-        cstate->cur_lineno++;
-
-        if (!CopyGetInt16(cstate, &fld_count))
-        {
-            /* EOF detected (end of file, or protocol-level EOF) */
-            return false;
-        }
-
-        if (fld_count == -1)
-        {
-            /*
-             * Received EOF marker.  Wait for the protocol-level EOF, and
-             * complain if it doesn't come immediately.  In COPY FROM STDIN,
-             * this ensures that we correctly handle CopyFail, if client
-             * chooses to send that now.  When copying from file, we could
-             * ignore the rest of the file like in text mode, but we choose to
-             * be consistent with the COPY FROM STDIN case.
-             */
-            char        dummy;
-
-            if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("received copy data after EOF marker")));
-            return false;
-        }
-
-        if (fld_count != attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("row field count is %d, expected %d",
-                            (int) fld_count, attr_count)));
-
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            cstate->cur_attname = NameStr(att->attname);
-            values[m] = CopyReadBinaryAttribute(cstate,
-                                                &in_functions[m],
-                                                typioparams[m],
-                                                att->atttypmod,
-                                                &nulls[m]);
-            cstate->cur_attname = NULL;
-        }
-    }
+    if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+        return false;
 
     /*
      * Now compute and insert any defaults available for the columns not
@@ -1100,8 +1170,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * by newline.  The terminating newline or EOF marker is not included
  * in the final value of line_buf.
  */
-static bool
-CopyReadLine(CopyFromState cstate)
+static inline bool
+CopyReadLine(CopyFromState cstate, bool is_csv)
 {
     bool        result;
 
@@ -1109,7 +1179,7 @@ CopyReadLine(CopyFromState cstate)
     cstate->line_buf_valid = false;
 
     /* Parse data and transfer into line_buf */
-    result = CopyReadLineText(cstate);
+    result = CopyReadLineText(cstate, is_csv);
 
     if (result)
     {
@@ -1176,8 +1246,8 @@ CopyReadLine(CopyFromState cstate)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate)
+static inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
     char       *copy_input_buf;
     int            input_buf_ptr;
@@ -1193,7 +1263,11 @@ CopyReadLineText(CopyFromState cstate)
     char        quotec = '\0';
     char        escapec = '\0';
 
-    if (cstate->opts.csv_mode)
+    /*
+     * is_csv will be optimized away by compiler, as argument is constant at
+     * caller.
+     */
+    if (is_csv)
     {
         quotec = cstate->opts.quote[0];
         escapec = cstate->opts.escape[0];
@@ -1270,7 +1344,11 @@ CopyReadLineText(CopyFromState cstate)
         prev_raw_ptr = input_buf_ptr;
         c = copy_input_buf[input_buf_ptr++];
 
-        if (cstate->opts.csv_mode)
+        /*
+         * is_csv will be optimized away by compiler, as argument is constant
+         * at caller.
+         */
+        if (is_csv)
         {
             /*
              * If character is '\\' or '\r', we may need to look ahead below.
@@ -1309,7 +1387,7 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \r */
-        if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\r' && (!is_csv || !in_quote))
         {
             /* Check for \r\n on first line, _and_ handle \r\n. */
             if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1337,10 +1415,10 @@ CopyReadLineText(CopyFromState cstate)
                     if (cstate->eol_type == EOL_CRNL)
                         ereport(ERROR,
                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errmsg("literal carriage return found in data") :
                                  errmsg("unquoted carriage return found in data"),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errhint("Use \"\\r\" to represent carriage return.") :
                                  errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1354,10 +1432,10 @@ CopyReadLineText(CopyFromState cstate)
             else if (cstate->eol_type == EOL_NL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal carriage return found in data") :
                          errmsg("unquoted carriage return found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\r\" to represent carriage return.") :
                          errhint("Use quoted CSV field to represent carriage return.")));
             /* If reach here, we have found the line terminator */
@@ -1365,15 +1443,15 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \n */
-        if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\n' && (!is_csv || !in_quote))
         {
             if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal newline found in data") :
                          errmsg("unquoted newline found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\n\" to represent newline.") :
                          errhint("Use quoted CSV field to represent newline.")));
             cstate->eol_type = EOL_NL;    /* in case not set yet */
@@ -1385,7 +1463,7 @@ CopyReadLineText(CopyFromState cstate)
          * In CSV mode, we only recognize \. alone on a line.  This is because
          * \. is a valid CSV data value.
          */
-        if (c == '\\' && (!cstate->opts.csv_mode || first_char_in_line))
+        if (c == '\\' && (!is_csv || first_char_in_line))
         {
             char        c2;
 
@@ -1418,7 +1496,11 @@ CopyReadLineText(CopyFromState cstate)
 
                     if (c2 == '\n')
                     {
-                        if (!cstate->opts.csv_mode)
+                        /*
+                         * is_csv will be optimized away by compiler, as
+                         * argument is constant at caller.
+                         */
+                        if (!is_csv)
                             ereport(ERROR,
                                     (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                      errmsg("end-of-copy marker does not match previous newline style")));
@@ -1427,7 +1509,11 @@ CopyReadLineText(CopyFromState cstate)
                     }
                     else if (c2 != '\r')
                     {
-                        if (!cstate->opts.csv_mode)
+                        /*
+                         * is_csv will be optimized away by compiler, as
+                         * argument is constant at caller.
+                         */
+                        if (!is_csv)
                             ereport(ERROR,
                                     (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                      errmsg("end-of-copy marker corrupt")));
@@ -1443,7 +1529,11 @@ CopyReadLineText(CopyFromState cstate)
 
                 if (c2 != '\r' && c2 != '\n')
                 {
-                    if (!cstate->opts.csv_mode)
+                    /*
+                     * is_csv will be optimized away by compiler, as argument
+                     * is constant at caller.
+                     */
+                    if (!is_csv)
                         ereport(ERROR,
                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                  errmsg("end-of-copy marker corrupt")));
@@ -1472,7 +1562,7 @@ CopyReadLineText(CopyFromState cstate)
                 result = true;    /* report EOF */
                 break;
             }
-            else if (!cstate->opts.csv_mode)
+            else if (!is_csv)
             {
                 /*
                  * If we are here, it means we found a backslash followed by
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ff19c457abf..c7f69ba606d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -128,6 +128,321 @@ static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
+/*
+ * CopyToRoutine implementations.
+ */
+
+/*
+ * CopyToTextLikeSendEndOfRow
+ *
+ * Apply line terminations for a line sent in text or CSV format depending
+ * on the destination, then send the end of a row.
+ */
+static inline void
+CopyToTextLikeSendEndOfRow(CopyToState cstate)
+{
+    switch (cstate->copy_dest)
+    {
+        case COPY_FILE:
+            /* Default line termination depends on platform */
+#ifndef WIN32
+            CopySendChar(cstate, '\n');
+#else
+            CopySendString(cstate, "\r\n");
+#endif
+            break;
+        case COPY_FRONTEND:
+            /* The FE/BE protocol uses \n as newline for all platforms */
+            CopySendChar(cstate, '\n');
+            break;
+        default:
+            break;
+    }
+
+    /* Now take the actions related to the end of a row */
+    CopySendEndOfRow(cstate);
+}
+
+/*
+ * CopyToTextLikeStart
+ *
+ * Start of COPY TO for text and CSV format.
+ */
+static void
+CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    /*
+     * For non-binary copy, we need to convert null_print to file encoding,
+     * because it will be sent directly with CopySendString.
+     */
+    if (cstate->need_transcoding)
+        cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+                                                          cstate->opts.null_print_len,
+                                                          cstate->file_encoding);
+
+    /* if a header has been requested send the line */
+    if (cstate->opts.header_line)
+    {
+        ListCell   *cur;
+        bool        hdr_delim = false;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            char       *colname;
+
+            if (hdr_delim)
+                CopySendChar(cstate, cstate->opts.delim[0]);
+            hdr_delim = true;
+
+            colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, colname, false);
+            else
+                CopyAttributeOutText(cstate, colname);
+        }
+
+        CopyToTextLikeSendEndOfRow(cstate);
+    }
+}
+
+/*
+ * CopyToTextLikeOutFunc
+ *
+ * Assign output function data for a relation's attribute in text/CSV format.
+ */
+static void
+CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+
+/*
+ * CopyToTextLikeOneRow
+ *
+ * Process one row for text/CSV format.
+ *
+ * Workhorse for CopyToTextOneRow() and CopyToCSVOneRow().
+ */
+static inline void
+CopyToTextLikeOneRow(CopyToState cstate,
+                     TupleTableSlot *slot,
+                     bool is_csv)
+{
+    bool        need_delim = false;
+    FmgrInfo   *out_functions = cstate->out_functions;
+    ListCell   *cur;
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (need_delim)
+            CopySendChar(cstate, cstate->opts.delim[0]);
+        need_delim = true;
+
+        if (isnull)
+        {
+            CopySendString(cstate, cstate->opts.null_print_client);
+        }
+        else
+        {
+            char       *string;
+
+            string = OutputFunctionCall(&out_functions[attnum - 1],
+                                        value);
+
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
+                CopyAttributeOutCSV(cstate, string,
+                                    cstate->opts.force_quote_flags[attnum - 1]);
+            else
+                CopyAttributeOutText(cstate, string);
+        }
+    }
+
+    CopyToTextLikeSendEndOfRow(cstate);
+}
+
+/*
+ * CopyToTextOneRow
+ *
+ * Per-row callback for COPY TO with text format.
+ */
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, false);
+}
+
+/*
+ * CopyToCSVOneRow
+ *
+ * Per-row callback for COPY TO with CSV format.
+ */
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, true);
+}
+
+/*
+ * CopyToTextLikeEnd
+ *
+ * End of COPY TO for text/CSV format.
+ */
+static void
+CopyToTextLikeEnd(CopyToState cstate)
+{
+    /* Nothing to do here */
+}
+
+/*
+ * CopyToRoutine implementation for "binary".
+ */
+
+/*
+ * CopyToBinaryStart
+ *
+ * Start of COPY TO for binary format.
+ */
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    /* Generate header for a binary copy */
+    int32        tmp;
+
+    /* Signature */
+    CopySendData(cstate, BinarySignature, 11);
+    /* Flags field */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+    /* No header extension */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+}
+
+/*
+ * CopyToBinaryOutFunc
+ *
+ * Assign output function data for a relation's attribute in binary format.
+ */
+static void
+CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeBinaryOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyToBinaryOneRow
+ *
+ * Process one row for binary format.
+ */
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    FmgrInfo   *out_functions = cstate->out_functions;
+    ListCell   *cur;
+
+    /* Binary per-tuple header */
+    CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (isnull)
+        {
+            CopySendInt32(cstate, -1);
+        }
+        else
+        {
+            bytea       *outputbytes;
+
+            outputbytes = SendFunctionCall(&out_functions[attnum - 1],
+                                           value);
+            CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+            CopySendData(cstate, VARDATA(outputbytes),
+                         VARSIZE(outputbytes) - VARHDRSZ);
+        }
+    }
+
+    CopySendEndOfRow(cstate);
+}
+
+/*
+ * CopyToBinaryEnd
+ *
+ * End of COPY TO for binary format.
+ */
+static void
+CopyToBinaryEnd(CopyToState cstate)
+{
+    /* Generate trailer for a binary copy */
+    CopySendInt16(cstate, -1);
+    /* Need to flush out the trailer */
+    CopySendEndOfRow(cstate);
+}
+
+/*
+ * CSV and text share the same implementation, with the exception of the
+ * output representation and per-row callbacks.
+ */
+static const CopyToRoutine CopyToRoutineText = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToTextOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+static const CopyToRoutine CopyToRoutineCSV = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToCSVOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+static const CopyToRoutine CopyToRoutineBinary = {
+    .CopyToStart = CopyToBinaryStart,
+    .CopyToOutFunc = CopyToBinaryOutFunc,
+    .CopyToOneRow = CopyToBinaryOneRow,
+    .CopyToEnd = CopyToBinaryEnd,
+};
+
+/*
+ * Define the COPY TO routines to use for a format.  This should be called
+ * after options are parsed.
+ */
+static const CopyToRoutine *
+CopyToGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyToRoutineCSV;
+    else if (opts.binary)
+        return &CopyToRoutineBinary;
+
+    /* default is text */
+    return &CopyToRoutineText;
+}
 
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
@@ -195,16 +510,6 @@ CopySendEndOfRow(CopyToState cstate)
     switch (cstate->copy_dest)
     {
         case COPY_FILE:
-            if (!cstate->opts.binary)
-            {
-                /* Default line termination depends on platform */
-#ifndef WIN32
-                CopySendChar(cstate, '\n');
-#else
-                CopySendString(cstate, "\r\n");
-#endif
-            }
-
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -239,10 +544,6 @@ CopySendEndOfRow(CopyToState cstate)
             }
             break;
         case COPY_FRONTEND:
-            /* The FE/BE protocol uses \n as newline for all platforms */
-            if (!cstate->opts.binary)
-                CopySendChar(cstate, '\n');
-
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
@@ -430,6 +731,9 @@ BeginCopyTo(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
 
+    /* Set format routine */
+    cstate->routine = CopyToGetRoutine(cstate->opts);
+
     /* Process the source/target relation or query */
     if (rel)
     {
@@ -770,27 +1074,10 @@ DoCopyTo(CopyToState cstate)
     foreach(cur, cstate->attnumlist)
     {
         int            attnum = lfirst_int(cur);
-        Oid            out_func_oid;
-        bool        isvarlena;
         Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-        if (cstate->opts.binary)
-        {
-            getTypeBinaryOutputInfo(attr->atttypid,
-                                    &out_func_oid,
-                                    &isvarlena);
-            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
-        }
-        else if (cstate->routine)
-            cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
-                                           &cstate->out_functions[attnum - 1]);
-        else
-        {
-            getTypeOutputInfo(attr->atttypid,
-                              &out_func_oid,
-                              &isvarlena);
-            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
-        }
+        cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
+                                       &cstate->out_functions[attnum - 1]);
     }
 
     /*
@@ -803,58 +1090,7 @@ DoCopyTo(CopyToState cstate)
                                                "COPY TO",
                                                ALLOCSET_DEFAULT_SIZES);
 
-    if (cstate->opts.binary)
-    {
-        /* Generate header for a binary copy */
-        int32        tmp;
-
-        /* Signature */
-        CopySendData(cstate, BinarySignature, 11);
-        /* Flags field */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-        /* No header extension */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-    }
-    else if (cstate->routine)
-        cstate->routine->CopyToStart(cstate, tupDesc);
-    else
-    {
-        /*
-         * For non-binary copy, we need to convert null_print to file
-         * encoding, because it will be sent directly with CopySendString.
-         */
-        if (cstate->need_transcoding)
-            cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
-                                                              cstate->opts.null_print_len,
-                                                              cstate->file_encoding);
-
-        /* if a header has been requested send the line */
-        if (cstate->opts.header_line)
-        {
-            bool        hdr_delim = false;
-
-            foreach(cur, cstate->attnumlist)
-            {
-                int            attnum = lfirst_int(cur);
-                char       *colname;
-
-                if (hdr_delim)
-                    CopySendChar(cstate, cstate->opts.delim[0]);
-                hdr_delim = true;
-
-                colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, colname, false);
-                else
-                    CopyAttributeOutText(cstate, colname);
-            }
-
-            CopySendEndOfRow(cstate);
-        }
-    }
+    cstate->routine->CopyToStart(cstate, tupDesc);
 
     if (cstate->rel)
     {
@@ -893,15 +1129,7 @@ DoCopyTo(CopyToState cstate)
         processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
     }
 
-    if (cstate->opts.binary)
-    {
-        /* Generate trailer for a binary copy */
-        CopySendInt16(cstate, -1);
-        /* Need to flush out the trailer */
-        CopySendEndOfRow(cstate);
-    }
-    else if (cstate->routine)
-        cstate->routine->CopyToEnd(cstate);
+    cstate->routine->CopyToEnd(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -917,11 +1145,7 @@ DoCopyTo(CopyToState cstate)
 static void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-    bool        need_delim = false;
-    FmgrInfo   *out_functions = cstate->out_functions;
     MemoryContext oldcontext;
-    ListCell   *cur;
-    char       *string;
 
     MemoryContextReset(cstate->rowcontext);
     oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
@@ -929,65 +1153,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
     /* Make sure the tuple is fully deconstructed */
     slot_getallattrs(slot);
 
-    if (cstate->routine)
-    {
-        cstate->routine->CopyToOneRow(cstate, slot);
-        MemoryContextSwitchTo(oldcontext);
-        return;
-    }
-
-    if (cstate->opts.binary)
-    {
-        /* Binary per-tuple header */
-        CopySendInt16(cstate, list_length(cstate->attnumlist));
-    }
-
-    foreach(cur, cstate->attnumlist)
-    {
-        int            attnum = lfirst_int(cur);
-        Datum        value = slot->tts_values[attnum - 1];
-        bool        isnull = slot->tts_isnull[attnum - 1];
-
-        if (!cstate->opts.binary)
-        {
-            if (need_delim)
-                CopySendChar(cstate, cstate->opts.delim[0]);
-            need_delim = true;
-        }
-
-        if (isnull)
-        {
-            if (!cstate->opts.binary)
-                CopySendString(cstate, cstate->opts.null_print_client);
-            else
-                CopySendInt32(cstate, -1);
-        }
-        else
-        {
-            if (!cstate->opts.binary)
-            {
-                string = OutputFunctionCall(&out_functions[attnum - 1],
-                                            value);
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, string,
-                                        cstate->opts.force_quote_flags[attnum - 1]);
-                else
-                    CopyAttributeOutText(cstate, string);
-            }
-            else
-            {
-                bytea       *outputbytes;
-
-                outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-                                               value);
-                CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-                CopySendData(cstate, VARDATA(outputbytes),
-                             VARSIZE(outputbytes) - VARHDRSZ);
-            }
-        }
-    }
-
-    CopySendEndOfRow(cstate);
+    cstate->routine->CopyToOneRow(cstate, slot);
 
     MemoryContextSwitchTo(oldcontext);
 }
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc10..ccfbdf0ee01 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -104,8 +104,6 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
 extern void EndCopyFrom(CopyFromState cstate);
 extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
                          Datum *values, bool *nulls);
-extern bool NextCopyFromRawFields(CopyFromState cstate,
-                                  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 extern char *CopyLimitPrintoutLength(const char *str);
 
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 509b9e92a18..c11b5ff3cc0 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -187,4 +187,12 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
+/* Callbacks for CopyFromRoutine->CopyFromOneRow */
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+                               Datum *values, bool *nulls);
+extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
+                              Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+                                 Datum *values, bool *nulls);
+
 #endif                            /* COPYFROM_INTERNAL_H */
-- 
2.45.2
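
The patch above relies on two ideas: a per-format table of callbacks chosen once after option parsing, and an inline workhorse whose is_csv argument is a constant at each call site so the compiler can drop the branch. The following is only a standalone sketch of that pattern, not PostgreSQL code. DemoRoutine and every demo_* name are hypothetical stand-ins for CopyToRoutine and its callbacks, and the "CSV" branch simply wraps the field in quotes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for CopyToRoutine: a single per-row callback. */
typedef struct DemoRoutine
{
    void        (*one_row) (char *out, size_t outlen, const char *field);
} DemoRoutine;

/*
 * Shared workhorse.  Each wrapper below passes a constant for is_csv, so
 * after inlining the compiler can remove the untaken branch.
 */
static inline void
demo_textlike_one_row(char *out, size_t outlen, const char *field,
                      bool is_csv)
{
    assert(out != NULL && field != NULL);

    if (is_csv)
        snprintf(out, outlen, "\"%s\"\n", field);   /* quoted field */
    else
        snprintf(out, outlen, "%s\n", field);       /* plain field */
}

/* Per-row callback for the text-like demo format. */
static void
demo_text_one_row(char *out, size_t outlen, const char *field)
{
    demo_textlike_one_row(out, outlen, field, false);
}

/* Per-row callback for the CSV-like demo format. */
static void
demo_csv_one_row(char *out, size_t outlen, const char *field)
{
    demo_textlike_one_row(out, outlen, field, true);
}

static const DemoRoutine demo_routine_text = {.one_row = demo_text_one_row};
static const DemoRoutine demo_routine_csv = {.one_row = demo_csv_one_row};

/* Choose the routine once, after the options are known. */
static const DemoRoutine *
demo_get_routine(bool csv_mode)
{
    return csv_mode ? &demo_routine_csv : &demo_routine_text;
}
```

A caller would keep the pointer returned by demo_get_routine() and call routine->one_row() for every row, which is the role cstate->routine plays in the patch.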
From f3a336853607e7c7e24158cc2b407aaca845dc88 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jul 2024 17:39:41 +0900
Subject: [PATCH v18 3/5] Add support for adding custom COPY TO format
This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a COPY TO handler returns a CopyToRoutine and a COPY FROM handler
returns a CopyFromRoutine.
This uses the same handler for COPY TO and COPY FROM. PostgreSQL calls a
COPY TO/FROM handler with an "is_from" argument. It's true for COPY FROM
and false for COPY TO:
    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine
This also adds a test module for custom COPY TO/FROM handlers.
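
As a rough illustration of the is_from convention, here is a standalone sketch that does not use the real PostgreSQL node machinery. DemoNode, DemoTag and the demo_* functions are hypothetical; the tag comparison stands in for the IsA() check, and assert() stands in for the ereport(ERROR, ...) call in ProcessCopyOptionFormat().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node tags standing in for T_CopyToRoutine/T_CopyFromRoutine. */
typedef enum DemoTag
{
    DEMO_COPY_TO_ROUTINE,
    DEMO_COPY_FROM_ROUTINE
} DemoTag;

typedef struct DemoNode
{
    DemoTag     tag;
} DemoNode;

static const DemoNode demo_to_routine = {DEMO_COPY_TO_ROUTINE};
static const DemoNode demo_from_routine = {DEMO_COPY_FROM_ROUTINE};

/* One handler serves both directions; is_from selects the struct. */
static const DemoNode *
demo_copy_handler(bool is_from)
{
    return is_from ? &demo_from_routine : &demo_to_routine;
}

/* Caller-side check: did the handler return the kind we asked for? */
static bool
demo_routine_matches(const DemoNode *routine, bool is_from)
{
    if (routine == NULL)
        return false;
    return routine->tag ==
        (is_from ? DEMO_COPY_FROM_ROUTINE : DEMO_COPY_TO_ROUTINE);
}

/* Resolve a format: call the handler, then validate its result. */
static const DemoNode *
demo_resolve_format(bool is_from)
{
    const DemoNode *routine = demo_copy_handler(is_from);

    /* The patch reports an error here; this sketch only asserts. */
    assert(demo_routine_matches(routine, is_from));
    return routine;
}
```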
---
 src/backend/commands/copy.c                   |  96 ++++++++++++++---
 src/backend/commands/copyfrom.c               |   4 +-
 src/backend/commands/copyto.c                 |   4 +-
 src/backend/nodes/Makefile                    |   1 +
 src/backend/nodes/gen_node_support.pl         |   2 +
 src/backend/utils/adt/pseudotypes.c           |   1 +
 src/include/catalog/pg_proc.dat               |   6 ++
 src/include/catalog/pg_type.dat               |   6 ++
 src/include/commands/copy.h                   |   2 +
 src/include/commands/copyapi.h                |   4 +
 src/include/nodes/meson.build                 |   1 +
 src/test/modules/Makefile                     |   1 +
 src/test/modules/meson.build                  |   1 +
 src/test/modules/test_copy_format/.gitignore  |   4 +
 src/test/modules/test_copy_format/Makefile    |  23 ++++
 .../expected/test_copy_format.out             |  21 ++++
 src/test/modules/test_copy_format/meson.build |  33 ++++++
 .../test_copy_format/sql/test_copy_format.sql |   6 ++
 .../test_copy_format--1.0.sql                 |   8 ++
 .../test_copy_format/test_copy_format.c       | 100 ++++++++++++++++++
 .../test_copy_format/test_copy_format.control |   4 +
 21 files changed, 313 insertions(+), 15 deletions(-)
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index df7a4a21c94..e5137e7bb3d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,6 +32,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -439,6 +440,87 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
     return COPY_LOG_VERBOSITY_DEFAULT;    /* keep compiler quiet */
 }
 
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY
+ * format handler is found, this function reports an error.
+ */
+static void
+ProcessCopyOptionFormat(ParseState *pstate,
+                        CopyFormatOptions *opts_out,
+                        bool is_from,
+                        DefElem *defel)
+{
+    char       *format;
+    Oid            funcargtypes[1];
+    Oid            handlerOid = InvalidOid;
+    Datum        datum;
+    Node       *routine;
+
+    format = defGetString(defel);
+
+    /* built-in formats */
+    if (strcmp(format, "text") == 0)
+         /* default format */ return;
+    else if (strcmp(format, "csv") == 0)
+    {
+        opts_out->csv_mode = true;
+        return;
+    }
+    else if (strcmp(format, "binary") == 0)
+    {
+        opts_out->binary = true;
+        return;
+    }
+
+    /* custom format */
+    funcargtypes[0] = INTERNALOID;
+    handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                funcargtypes, true);
+    if (!OidIsValid(handlerOid))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY format \"%s\" not recognized", format),
+                 parser_errposition(pstate, defel->location)));
+
+    datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
+    routine = (Node *) DatumGetPointer(datum);
+    if (is_from)
+    {
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%s(%u) did not return a "
+                            "CopyFromRoutine struct",
+                            format, handlerOid),
+                     parser_errposition(
+                                        pstate, defel->location)));
+    }
+    else
+    {
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%s(%u) did not return a "
+                            "CopyToRoutine struct",
+                            format, handlerOid),
+                     parser_errposition(
+                                        pstate, defel->location)));
+    }
+
+    opts_out->routine = routine;
+}
+
 /*
  * Process the statement option list for COPY.
  *
@@ -481,22 +563,10 @@ ProcessCopyOptions(ParseState *pstate,
 
         if (strcmp(defel->defname, "format") == 0)
         {
-            char       *fmt = defGetString(defel);
-
             if (format_specified)
                 errorConflictingDefElem(defel, pstate);
             format_specified = true;
-            if (strcmp(fmt, "text") == 0)
-                 /* default format */ ;
-            else if (strcmp(fmt, "csv") == 0)
-                opts_out->csv_mode = true;
-            else if (strcmp(fmt, "binary") == 0)
-                opts_out->binary = true;
-            else
-                ereport(ERROR,
-                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                         errmsg("COPY format \"%s\" not recognized", fmt),
-                         parser_errposition(pstate, defel->location)));
+            ProcessCopyOptionFormat(pstate, opts_out, is_from, defel);
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1a59202f5ab..2b48c825a0a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -244,7 +244,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(CopyFormatOptions opts)
 {
-    if (opts.csv_mode)
+    if (opts.routine)
+        return (const CopyFromRoutine *) opts.routine;
+    else if (opts.csv_mode)
         return &CopyFromRoutineCSV;
     else if (opts.binary)
         return &CopyFromRoutineBinary;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c7f69ba606d..a9e923467dc 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -435,7 +435,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(CopyFormatOptions opts)
 {
-    if (opts.csv_mode)
+    if (opts.routine)
+        return (const CopyToRoutine *) opts.routine;
+    else if (opts.csv_mode)
         return &CopyToRoutineCSV;
     else if (opts.binary)
         return &CopyToRoutineBinary;
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 66bbad8e6e0..173ee11811c 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -49,6 +49,7 @@ node_headers = \
     access/sdir.h \
     access/tableam.h \
     access/tsmapi.h \
+    commands/copyapi.h \
     commands/event_trigger.h \
     commands/trigger.h \
     executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 81df3bdf95f..428ab4f0d93 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -61,6 +61,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -85,6 +86,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index e189e9b79d2..25f24ab95d2 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 73d9cf85826..126254473e6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7644,6 +7644,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index ceff66ccde1..14c6c1ea486 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -633,6 +633,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a copy to/from method function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   descr => 'pseudo-type for the result of a table AM handler function',
   typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index ccfbdf0ee01..79bd4fb9151 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -84,6 +84,8 @@ typedef struct CopyFormatOptions
     CopyOnErrorChoice on_error; /* what to do when error happened */
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     List       *convert_select; /* list of column names (can be NIL) */
+    Node       *routine;        /* CopyToRoutine or CopyFromRoutine (can be
+                                 * NULL) */
 } CopyFormatOptions;
 
 /* These are private in commands/copy[from|to].c */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 635c4cbff27..2223cad8fd9 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -27,6 +27,8 @@ typedef struct CopyToStateData *CopyToState;
  */
 typedef struct CopyFromRoutine
 {
+    NodeTag        type;
+
     /*
      * Called when COPY FROM is started to set up the input functions
      * associated to the relation's attributes writing to.  `finfo` can be
@@ -69,6 +71,8 @@ typedef struct CopyFromRoutine
  */
 typedef struct CopyToRoutine
 {
+    NodeTag        type;
+
     /*
      * Called when COPY TO is started to set up the output functions
      * associated to the relation's attributes reading from.  `finfo` can be
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index b665e55b657..103df1a7873 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -11,6 +11,7 @@ node_support_input_i = [
   'access/sdir.h',
   'access/tableam.h',
   'access/tsmapi.h',
+  'commands/copyapi.h',
   'commands/event_trigger.h',
   'commands/trigger.h',
   'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 256799f520a..b7b46928a19 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
           spgist_name_ops \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index d8fe059d236..c42b4b2b31f 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -14,6 +14,7 @@ subdir('spgist_name_ops')
 subdir('ssl_passphrase_callback')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 00000000000..8497f91624d
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 00000000000..4ed7c0b12db
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,21 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (format 'test_copy_format');
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
+COPY public.test TO stdout WITH (format 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: atttypid=21
+NOTICE:  CopyToOutFunc: atttypid=23
+NOTICE:  CopyToOutFunc: atttypid=20
+NOTICE:  CopyToStart: natts=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 00000000000..4cefe7b709a
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+  'test_copy_format.control',
+  'test_copy_format--1.0.sql',
+)
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'test_copy_format',
+    ],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 00000000000..e805f7cb011
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,6 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (format 'test_copy_format');
+\.
+COPY public.test TO stdout WITH (format 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 00000000000..d24ea03ce99
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,8 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+CREATE FUNCTION test_copy_format(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..f6b105659ab
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,100 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+
+PG_MODULE_MAGIC;
+
+static void
+CopyFromInFunc(CopyFromState cstate, Oid atttypid,
+               FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts)));
+}
+
+static bool
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+CopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .type = T_CopyFromRoutine,
+    .CopyFromInFunc = CopyFromInFunc,
+    .CopyFromStart = CopyFromStart,
+    .CopyFromOneRow = CopyFromOneRow,
+    .CopyFromEnd = CopyFromEnd,
+};
+
+static void
+CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts)));
+}
+
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%d", slot->tts_nvalid)));
+}
+
+static void
+CopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .type = T_CopyToRoutine,
+    .CopyToOutFunc = CopyToOutFunc,
+    .CopyToStart = CopyToStart,
+    .CopyToOneRow = CopyToOneRow,
+    .CopyToEnd = CopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    ereport(NOTICE,
+            (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 00000000000..f05a6362358
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.45.2
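The format resolution added in ProcessCopyOptionFormat() above can be sketched as a standalone C program. This is only an illustration under stated assumptions, not the patch's code: every Sketch*/sketch_* name is hypothetical, the name-based table in sketch_lookup() stands in for LookupFuncName(), and the SketchTag enum stands in for NodeTag and IsA(). It shows the three outcomes: a built-in format leaves the routine NULL, an unknown name is rejected, and a handler that returns the wrong kind of routine is rejected by the tag check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for NodeTag; the values are illustrative only. */
typedef enum SketchTag
{
    SKETCH_T_Invalid = 0,
    SKETCH_T_CopyFromRoutine,
    SKETCH_T_CopyToRoutine
} SketchTag;

/* Both routine structs begin with the tag, so a generic pointer can be checked. */
typedef struct SketchCopyFromRoutine { SketchTag type; } SketchCopyFromRoutine;
typedef struct SketchCopyToRoutine { SketchTag type; } SketchCopyToRoutine;

typedef enum SketchResult
{
    SKETCH_OK,
    SKETCH_NOT_RECOGNIZED,      /* no handler with that name */
    SKETCH_BAD_ROUTINE          /* handler returned the wrong struct */
} SketchResult;

typedef const void *(*SketchHandler) (int is_from);

static const SketchCopyFromRoutine sketch_from = {SKETCH_T_CopyFromRoutine};
static const SketchCopyToRoutine sketch_to = {SKETCH_T_CopyToRoutine};

/* Well-behaved handler: picks the routine that matches the direction. */
static const void *
sketch_good_handler(int is_from)
{
    return is_from ? (const void *) &sketch_from : (const void *) &sketch_to;
}

/* Misbehaving handler: always returns the COPY TO routine. */
static const void *
sketch_broken_handler(int is_from)
{
    (void) is_from;
    return &sketch_to;
}

/* Hypothetical replacement for the catalog lookup by function name. */
static SketchHandler
sketch_lookup(const char *format)
{
    if (strcmp(format, "test_copy_format") == 0)
        return sketch_good_handler;
    if (strcmp(format, "broken_format") == 0)
        return sketch_broken_handler;
    return NULL;
}

SketchResult
sketch_process_format(const char *format, int is_from,
                      const void **routine_out)
{
    SketchHandler handler;
    const void *routine;
    SketchTag   expected;

    assert(routine_out != NULL);
    *routine_out = NULL;

    /* Built-in formats need no handler; the routine stays NULL. */
    if (strcmp(format, "text") == 0 ||
        strcmp(format, "csv") == 0 ||
        strcmp(format, "binary") == 0)
        return SKETCH_OK;

    handler = sketch_lookup(format);
    if (handler == NULL)
        return SKETCH_NOT_RECOGNIZED;

    /* Call the handler, then verify the tag of whatever it returned. */
    routine = handler(is_from);
    expected = is_from ? SKETCH_T_CopyFromRoutine : SKETCH_T_CopyToRoutine;
    if (routine == NULL || *(const SketchTag *) routine != expected)
        return SKETCH_BAD_ROUTINE;

    *routine_out = routine;
    return SKETCH_OK;
}
```

Putting the tag first in both structs is what lets the caller validate an untyped pointer before casting it, which appears to be the reason the patch adds a NodeTag member to CopyFromRoutine and CopyToRoutine.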
From 19512a04864ec88829a553de983f41d2ce31a375 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jan 2024 14:54:10 +0900
Subject: [PATCH v18 4/5] Export CopyToStateData and CopyFromStateData
This is for custom COPY TO/FROM format handlers implemented as extensions.
This only moves code. Nothing changes except the CopyDest/CopySource
enum values. Enum values such as COPY_FILE exist in both enums and
conflict with each other once both are visible in the same header. So
the COPY_DEST_ prefix replaces the COPY_ prefix for CopyDest enum values
and the COPY_SOURCE_ prefix replaces the COPY_ prefix for CopySource enum
values. For example, COPY_FILE in CopyDest is renamed to COPY_DEST_FILE
and COPY_FILE in CopySource is renamed to COPY_SOURCE_FILE.
Note that this isn't enough to implement custom COPY TO/FROM format
handlers as extensions. We'll do the following in a subsequent commit:
For custom COPY TO format handlers:
1. Add an opaque space for the custom COPY TO format handler
2. Export CopySendEndOfRow() to flush the buffer
For custom COPY FROM format handlers:
1. Add an opaque space for the custom COPY FROM format handler
2. Export CopyReadBinaryData() to read the next data
---
 src/backend/commands/copyfrom.c          |   4 +-
 src/backend/commands/copyfromparse.c     |  10 +-
 src/backend/commands/copyto.c            |  77 +-----
 src/include/commands/copy.h              |  78 +-----
 src/include/commands/copyapi.h           | 306 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h | 165 ------------
 6 files changed, 320 insertions(+), 320 deletions(-)
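The reason for the enum rename in this commit can be shown with a small standalone C sketch. C enumerators share one namespace per translation unit, so two enums that both define COPY_FILE cannot be declared in the same header. The enumerator names below follow the patch, but the SketchCopyDest/SketchCopySource type names and the sketch_* functions are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <string.h>

/*
 * With a shared COPY_ prefix, declaring both enums in one header would
 * fail to compile ("redeclaration of enumerator COPY_FILE").  Distinct
 * prefixes let them coexist.
 */
typedef enum SketchCopyDest
{
    COPY_DEST_FILE,             /* to file (or a piped program) */
    COPY_DEST_FRONTEND,         /* to frontend */
    COPY_DEST_CALLBACK          /* to callback function */
} SketchCopyDest;

typedef enum SketchCopySource
{
    COPY_SOURCE_FILE,           /* from file (or a piped program) */
    COPY_SOURCE_FRONTEND,       /* from frontend */
    COPY_SOURCE_CALLBACK        /* from callback function */
} SketchCopySource;

const char *
sketch_dest_name(SketchCopyDest dest)
{
    switch (dest)
    {
        case COPY_DEST_FILE:
            return "file";
        case COPY_DEST_FRONTEND:
            return "frontend";
        case COPY_DEST_CALLBACK:
            return "callback";
    }
    return "unknown";
}

const char *
sketch_source_name(SketchCopySource source)
{
    switch (source)
    {
        case COPY_SOURCE_FILE:
            return "file";
        case COPY_SOURCE_FRONTEND:
            return "frontend";
        case COPY_SOURCE_CALLBACK:
            return "callback";
    }
    return "unknown";
}

/* Usage example: both enums are usable side by side in one file. */
void
sketch_enum_demo(void)
{
    assert(strcmp(sketch_dest_name(COPY_DEST_FILE), "file") == 0);
    assert(strcmp(sketch_source_name(COPY_SOURCE_CALLBACK), "callback") == 0);
}
```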
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2b48c825a0a..5902172b8df 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1699,7 +1699,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;    /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;    /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1827,7 +1827,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 90824b47785..74844103228 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -180,7 +180,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows it can send. */
     pq_flush();
@@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
     switch (cstate->copy_src)
     {
-        case COPY_FILE:
+        case COPY_SOURCE_FILE:
             bytesread = fread(databuf, 1, maxread, cstate->copy_file);
             if (ferror(cstate->copy_file))
                 ereport(ERROR,
@@ -257,7 +257,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
             if (bytesread == 0)
                 cstate->raw_reached_eof = true;
             break;
-        case COPY_FRONTEND:
+        case COPY_SOURCE_FRONTEND:
             while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
             {
                 int            avail;
@@ -340,7 +340,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
                 bytesread += avail;
             }
             break;
-        case COPY_CALLBACK:
+        case COPY_SOURCE_CALLBACK:
             bytesread = cstate->data_source_cb(databuf, minread, maxread);
             break;
     }
@@ -1188,7 +1188,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
          * after \. up to the protocol end of copy data.  (XXX maybe better
          * not to treat \. as special?)
          */
-        if (cstate->copy_src == COPY_FRONTEND)
+        if (cstate->copy_src == COPY_SOURCE_FRONTEND)
         {
             int            inbytes;
 
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index a9e923467dc..54aa6cdecaf 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -37,67 +37,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                    /* to file (or a piped program) */
-    COPY_FRONTEND,                /* to frontend */
-    COPY_CALLBACK,                /* to callback function */
-} CopyDest;
-
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-    /* format routine */
-    const CopyToRoutine *routine;
-
-    /* low-level state data */
-    CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
-
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
-    MemoryContext rowcontext;    /* per-row evaluation context */
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -143,7 +82,7 @@ CopyToTextLikeSendEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -151,7 +90,7 @@ CopyToTextLikeSendEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -464,7 +403,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -511,7 +450,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -545,11 +484,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -928,12 +867,12 @@ BeginCopyTo(ParseState *pstate,
     /* See Multibyte encoding comment above */
     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 79bd4fb9151..e2411848e9f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,87 +14,11 @@
 #ifndef COPY_H
 #define COPY_H
 
-#include "nodes/execnodes.h"
+#include "commands/copyapi.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
 #include "tcop/dest.h"
 
-/*
- * Represents whether a header line should be present, and whether it must
- * match the actual names (which implies "true").
- */
-typedef enum CopyHeaderChoice
-{
-    COPY_HEADER_FALSE = 0,
-    COPY_HEADER_TRUE,
-    COPY_HEADER_MATCH,
-} CopyHeaderChoice;
-
-/*
- * Represents where to save input processing errors.  More values to be added
- * in the future.
- */
-typedef enum CopyOnErrorChoice
-{
-    COPY_ON_ERROR_STOP = 0,        /* immediately throw errors, default */
-    COPY_ON_ERROR_IGNORE,        /* ignore errors */
-} CopyOnErrorChoice;
-
-/*
- * Represents verbosity of logged messages by COPY command.
- */
-typedef enum CopyLogVerbosityChoice
-{
-    COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
-    COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
-} CopyLogVerbosityChoice;
-
-/*
- * A struct to hold COPY options, in a parsed form. All of these are related
- * to formatting, except for 'freeze', which doesn't really belong here, but
- * it's expedient to parse it along with all the other options.
- */
-typedef struct CopyFormatOptions
-{
-    /* parameters from the COPY command */
-    int            file_encoding;    /* file or remote side's character encoding,
-                                 * -1 if not specified */
-    bool        binary;            /* binary format? */
-    bool        freeze;            /* freeze rows on loading? */
-    bool        csv_mode;        /* Comma Separated Value format? */
-    CopyHeaderChoice header_line;    /* header line? */
-    char       *null_print;        /* NULL marker string (server encoding!) */
-    int            null_print_len; /* length of same */
-    char       *null_print_client;    /* same converted to file encoding */
-    char       *default_print;    /* DEFAULT marker string */
-    int            default_print_len;    /* length of same */
-    char       *delim;            /* column delimiter (must be 1 byte) */
-    char       *quote;            /* CSV quote char (must be 1 byte) */
-    char       *escape;            /* CSV escape char (must be 1 byte) */
-    List       *force_quote;    /* list of column names */
-    bool        force_quote_all;    /* FORCE_QUOTE *? */
-    bool       *force_quote_flags;    /* per-column CSV FQ flags */
-    List       *force_notnull;    /* list of column names */
-    bool        force_notnull_all;    /* FORCE_NOT_NULL *? */
-    bool       *force_notnull_flags;    /* per-column CSV FNN flags */
-    List       *force_null;        /* list of column names */
-    bool        force_null_all; /* FORCE_NULL *? */
-    bool       *force_null_flags;    /* per-column CSV FN flags */
-    bool        convert_selectively;    /* do selective binary conversion? */
-    CopyOnErrorChoice on_error; /* what to do when error happened */
-    CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
-    List       *convert_select; /* list of column names (can be NIL) */
-    Node       *routine;        /* CopyToRoutine or CopyFromRoutine (can be
-                                 * NULL) */
-} CopyFormatOptions;
-
-/* These are private in commands/copy[from|to].c */
-typedef struct CopyFromStateData *CopyFromState;
-typedef struct CopyToStateData *CopyToState;
-
-typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
-typedef void (*copy_data_dest_cb) (void *data, int len);
-
 extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
                    int stmt_location, int stmt_len,
                    uint64 *processed);
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 2223cad8fd9..3104d99ea9f 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -14,12 +14,83 @@
 #ifndef COPYAPI_H
 #define COPYAPI_H
 
+#include "commands/trigger.h"
+#include "executor/execdesc.h"
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
 
-/* These are private in commands/copy[from|to].c */
+/*
+ * Represents whether a header line should be present, and whether it must
+ * match the actual names (which implies "true").
+ */
+typedef enum CopyHeaderChoice
+{
+    COPY_HEADER_FALSE = 0,
+    COPY_HEADER_TRUE,
+    COPY_HEADER_MATCH,
+} CopyHeaderChoice;
+
+/*
+ * Represents where to save input processing errors.  More values to be added
+ * in the future.
+ */
+typedef enum CopyOnErrorChoice
+{
+    COPY_ON_ERROR_STOP = 0,        /* immediately throw errors, default */
+    COPY_ON_ERROR_IGNORE,        /* ignore errors */
+} CopyOnErrorChoice;
+
+/*
+ * Represents verbosity of logged messages by COPY command.
+ */
+typedef enum CopyLogVerbosityChoice
+{
+    COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
+    COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
+} CopyLogVerbosityChoice;
+
+/*
+ * A struct to hold COPY options, in a parsed form. All of these are related
+ * to formatting, except for 'freeze', which doesn't really belong here, but
+ * it's expedient to parse it along with all the other options.
+ */
+typedef struct CopyFormatOptions
+{
+    /* parameters from the COPY command */
+    int            file_encoding;    /* file or remote side's character encoding,
+                                 * -1 if not specified */
+    bool        binary;            /* binary format? */
+    bool        freeze;            /* freeze rows on loading? */
+    bool        csv_mode;        /* Comma Separated Value format? */
+    CopyHeaderChoice header_line;    /* header line? */
+    char       *null_print;        /* NULL marker string (server encoding!) */
+    int            null_print_len; /* length of same */
+    char       *null_print_client;    /* same converted to file encoding */
+    char       *default_print;    /* DEFAULT marker string */
+    int            default_print_len;    /* length of same */
+    char       *delim;            /* column delimiter (must be 1 byte) */
+    char       *quote;            /* CSV quote char (must be 1 byte) */
+    char       *escape;            /* CSV escape char (must be 1 byte) */
+    List       *force_quote;    /* list of column names */
+    bool        force_quote_all;    /* FORCE_QUOTE *? */
+    bool       *force_quote_flags;    /* per-column CSV FQ flags */
+    List       *force_notnull;    /* list of column names */
+    bool        force_notnull_all;    /* FORCE_NOT_NULL *? */
+    bool       *force_notnull_flags;    /* per-column CSV FNN flags */
+    List       *force_null;        /* list of column names */
+    bool        force_null_all; /* FORCE_NULL *? */
+    bool       *force_null_flags;    /* per-column CSV FN flags */
+    bool        convert_selectively;    /* do selective binary conversion? */
+    CopyOnErrorChoice on_error; /* what to do when error happened */
+    CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
+    List       *convert_select; /* list of column names (can be NIL) */
+    Node       *routine;        /* CopyToRoutine or CopyFromRoutine (can be
+                                 * NULL) */
+} CopyFormatOptions;
+
+typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+
 typedef struct CopyFromStateData *CopyFromState;
-typedef struct CopyToStateData *CopyToState;
 
 /*
  * API structure for a COPY FROM format implementation.  Note this must be
@@ -65,6 +136,174 @@ typedef struct CopyFromRoutine
     void        (*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
+/*
+ * Represents the different source cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopySource
+{
+    COPY_SOURCE_FILE,            /* from file (or a piped program) */
+    COPY_SOURCE_FRONTEND,        /* from frontend */
+    COPY_SOURCE_CALLBACK,        /* from callback function */
+} CopySource;
+
+/*
+ * Represents the end-of-line terminator type of the input
+ */
+typedef enum EolType
+{
+    EOL_UNKNOWN,
+    EOL_NL,
+    EOL_CR,
+    EOL_CRNL,
+} EolType;
+
+/*
+ * Represents the insert method to be used during COPY FROM.
+ */
+typedef enum CopyInsertMethod
+{
+    CIM_SINGLE,                    /* use table_tuple_insert or ExecForeignInsert */
+    CIM_MULTI,                    /* always use table_multi_insert or
+                                 * ExecForeignBatchInsert */
+    CIM_MULTI_CONDITIONAL,        /* use table_multi_insert or
+                                 * ExecForeignBatchInsert only if valid */
+} CopyInsertMethod;
+
+/*
+ * This struct contains all the state variables used throughout a COPY FROM
+ * operation.
+ */
+typedef struct CopyFromStateData
+{
+    /* format routine */
+    const CopyFromRoutine *routine;
+
+    /* low-level state data */
+    CopySource    copy_src;        /* type of copy source */
+    FILE       *copy_file;        /* used if copy_src == COPY_FILE */
+    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_FRONTEND */
+
+    EolType        eol_type;        /* EOL type of input */
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    Oid            conversion_proc;    /* encoding conversion function */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDIN */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_source_cb data_source_cb; /* function for reading data */
+
+    CopyFormatOptions opts;
+    bool       *convert_select_flags;    /* per-column CSV/TEXT CS flags */
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /* these are just for error messages, see CopyFromErrorCallback */
+    const char *cur_relname;    /* table name for error messages */
+    uint64        cur_lineno;        /* line number for error messages */
+    const char *cur_attname;    /* current att for error messages */
+    const char *cur_attval;        /* current att value for error messages */
+    bool        relname_only;    /* don't output line number, att, etc. */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    AttrNumber    num_defaults;    /* count of att that are missing and have
+                                 * default value */
+    FmgrInfo   *in_functions;    /* array of input functions for each attrs */
+    Oid           *typioparams;    /* array of element types for in_functions */
+    ErrorSaveContext *escontext;    /* soft error trapper during in_functions
+                                     * execution */
+    uint64        num_errors;        /* total number of rows which contained soft
+                                 * errors */
+    int           *defmap;            /* array of default att numbers related to
+                                 * missing att */
+    ExprState **defexprs;        /* array of default att expressions for all
+                                 * att */
+    bool       *defaults;        /* if DEFAULT marker was found for
+                                 * corresponding att */
+    bool        volatile_defexprs;    /* is any of defexprs volatile? */
+    List       *range_table;    /* single element list of RangeTblEntry */
+    List       *rteperminfos;    /* single element list of RTEPermissionInfo */
+    ExprState  *qualexpr;
+
+    TransitionCaptureState *transition_capture;
+
+    /*
+     * These variables are used to reduce overhead in COPY FROM.
+     *
+     * attribute_buf holds the separated, de-escaped text for each field of
+     * the current line.  The CopyReadAttributes functions return arrays of
+     * pointers into this buffer.  We avoid palloc/pfree overhead by re-using
+     * the buffer on each cycle.
+     *
+     * In binary COPY FROM, attribute_buf holds the binary data for the
+     * current field, but the usage is otherwise similar.
+     */
+    StringInfoData attribute_buf;
+
+    /* field raw data pointers found by COPY FROM */
+
+    int            max_fields;
+    char      **raw_fields;
+
+    /*
+     * Similarly, line_buf holds the whole input line being processed. The
+     * input cycle is first to read the whole line into line_buf, and then
+     * extract the individual attribute fields into attribute_buf.  line_buf
+     * is preserved unmodified so that we can display it in error messages if
+     * appropriate.  (In binary mode, line_buf is not used.)
+     */
+    StringInfoData line_buf;
+    bool        line_buf_valid; /* contains the row being processed? */
+
+    /*
+     * input_buf holds input data, already converted to database encoding.
+     *
+     * In text mode, CopyReadLine parses this data sufficiently to locate line
+     * boundaries, then transfers the data to line_buf. We guarantee that
+     * there is a \0 at input_buf[input_buf_len] at all times.  (In binary
+     * mode, input_buf is not used.)
+     *
+     * If encoding conversion is not required, input_buf is not a separate
+     * buffer but points directly to raw_buf.  In that case, input_buf_len
+     * tracks the number of bytes that have been verified as valid in the
+     * database encoding, and raw_buf_len is the total number of bytes stored
+     * in the buffer.
+     */
+#define INPUT_BUF_SIZE 65536    /* we palloc INPUT_BUF_SIZE+1 bytes */
+    char       *input_buf;
+    int            input_buf_index;    /* next byte to process */
+    int            input_buf_len;    /* total # of bytes stored */
+    bool        input_reached_eof;    /* true if we reached EOF */
+    bool        input_reached_error;    /* true if a conversion error happened */
+    /* Shorthand for number of unconsumed bytes available in input_buf */
+#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index)
+
+    /*
+     * raw_buf holds raw input data read from the data source (file or client
+     * connection), not yet converted to the database encoding.  Like with
+     * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len].
+     */
+#define RAW_BUF_SIZE 65536        /* we palloc RAW_BUF_SIZE+1 bytes */
+    char       *raw_buf;
+    int            raw_buf_index;    /* next byte to process */
+    int            raw_buf_len;    /* total # of bytes stored */
+    bool        raw_reached_eof;    /* true if we reached EOF */
+
+    /* Shorthand for number of unconsumed bytes available in raw_buf */
+#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
+
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyFromStateData;
+
+
+typedef struct CopyToStateData *CopyToState;
+
 /*
  * API structure for a COPY TO format implementation.   Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
@@ -101,4 +340,67 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+    COPY_DEST_FILE,                /* to file (or a piped program) */
+    COPY_DEST_FRONTEND,            /* to frontend */
+    COPY_DEST_CALLBACK,            /* to callback function */
+} CopyDest;
+
+typedef void (*copy_data_dest_cb) (void *data, int len);
+
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+    /* format routine */
+    const CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;        /* type of copy source/destination */
+    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
+    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
+
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+    MemoryContext rowcontext;    /* per-row evaluation context */
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c11b5ff3cc0..3863d26d5b7 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -19,171 +19,6 @@
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
-/*
- * Represents the different source cases we need to worry about at
- * the bottom level
- */
-typedef enum CopySource
-{
-    COPY_FILE,                    /* from file (or a piped program) */
-    COPY_FRONTEND,                /* from frontend */
-    COPY_CALLBACK,                /* from callback function */
-} CopySource;
-
-/*
- *    Represents the end-of-line terminator type of the input
- */
-typedef enum EolType
-{
-    EOL_UNKNOWN,
-    EOL_NL,
-    EOL_CR,
-    EOL_CRNL,
-} EolType;
-
-/*
- * Represents the insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-    CIM_SINGLE,                    /* use table_tuple_insert or ExecForeignInsert */
-    CIM_MULTI,                    /* always use table_multi_insert or
-                                 * ExecForeignBatchInsert */
-    CIM_MULTI_CONDITIONAL,        /* use table_multi_insert or
-                                 * ExecForeignBatchInsert only if valid */
-} CopyInsertMethod;
-
-/*
- * This struct contains all the state variables used throughout a COPY FROM
- * operation.
- */
-typedef struct CopyFromStateData
-{
-    /* format routine */
-    const CopyFromRoutine *routine;
-
-    /* low-level state data */
-    CopySource    copy_src;        /* type of copy source */
-    FILE       *copy_file;        /* used if copy_src == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_FRONTEND */
-
-    EolType        eol_type;        /* EOL type of input */
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    Oid            conversion_proc;    /* encoding conversion function */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDIN */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_source_cb data_source_cb; /* function for reading data */
-
-    CopyFormatOptions opts;
-    bool       *convert_select_flags;    /* per-column CSV/TEXT CS flags */
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /* these are just for error messages, see CopyFromErrorCallback */
-    const char *cur_relname;    /* table name for error messages */
-    uint64        cur_lineno;        /* line number for error messages */
-    const char *cur_attname;    /* current att for error messages */
-    const char *cur_attval;        /* current att value for error messages */
-    bool        relname_only;    /* don't output line number, att, etc. */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    AttrNumber    num_defaults;    /* count of att that are missing and have
-                                 * default value */
-    FmgrInfo   *in_functions;    /* array of input functions for each attrs */
-    Oid           *typioparams;    /* array of element types for in_functions */
-    ErrorSaveContext *escontext;    /* soft error trapper during in_functions
-                                     * execution */
-    uint64        num_errors;        /* total number of rows which contained soft
-                                 * errors */
-    int           *defmap;            /* array of default att numbers related to
-                                 * missing att */
-    ExprState **defexprs;        /* array of default att expressions for all
-                                 * att */
-    bool       *defaults;        /* if DEFAULT marker was found for
-                                 * corresponding att */
-    bool        volatile_defexprs;    /* is any of defexprs volatile? */
-    List       *range_table;    /* single element list of RangeTblEntry */
-    List       *rteperminfos;    /* single element list of RTEPermissionInfo */
-    ExprState  *qualexpr;
-
-    TransitionCaptureState *transition_capture;
-
-    /*
-     * These variables are used to reduce overhead in COPY FROM.
-     *
-     * attribute_buf holds the separated, de-escaped text for each field of
-     * the current line.  The CopyReadAttributes functions return arrays of
-     * pointers into this buffer.  We avoid palloc/pfree overhead by re-using
-     * the buffer on each cycle.
-     *
-     * In binary COPY FROM, attribute_buf holds the binary data for the
-     * current field, but the usage is otherwise similar.
-     */
-    StringInfoData attribute_buf;
-
-    /* field raw data pointers found by COPY FROM */
-
-    int            max_fields;
-    char      **raw_fields;
-
-    /*
-     * Similarly, line_buf holds the whole input line being processed. The
-     * input cycle is first to read the whole line into line_buf, and then
-     * extract the individual attribute fields into attribute_buf.  line_buf
-     * is preserved unmodified so that we can display it in error messages if
-     * appropriate.  (In binary mode, line_buf is not used.)
-     */
-    StringInfoData line_buf;
-    bool        line_buf_valid; /* contains the row being processed? */
-
-    /*
-     * input_buf holds input data, already converted to database encoding.
-     *
-     * In text mode, CopyReadLine parses this data sufficiently to locate line
-     * boundaries, then transfers the data to line_buf. We guarantee that
-     * there is a \0 at input_buf[input_buf_len] at all times.  (In binary
-     * mode, input_buf is not used.)
-     *
-     * If encoding conversion is not required, input_buf is not a separate
-     * buffer but points directly to raw_buf.  In that case, input_buf_len
-     * tracks the number of bytes that have been verified as valid in the
-     * database encoding, and raw_buf_len is the total number of bytes stored
-     * in the buffer.
-     */
-#define INPUT_BUF_SIZE 65536    /* we palloc INPUT_BUF_SIZE+1 bytes */
-    char       *input_buf;
-    int            input_buf_index;    /* next byte to process */
-    int            input_buf_len;    /* total # of bytes stored */
-    bool        input_reached_eof;    /* true if we reached EOF */
-    bool        input_reached_error;    /* true if a conversion error happened */
-    /* Shorthand for number of unconsumed bytes available in input_buf */
-#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index)
-
-    /*
-     * raw_buf holds raw input data read from the data source (file or client
-     * connection), not yet converted to the database encoding.  Like with
-     * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len].
-     */
-#define RAW_BUF_SIZE 65536        /* we palloc RAW_BUF_SIZE+1 bytes */
-    char       *raw_buf;
-    int            raw_buf_index;    /* next byte to process */
-    int            raw_buf_len;    /* total # of bytes stored */
-    bool        raw_reached_eof;    /* true if we reached EOF */
-
-    /* Shorthand for number of unconsumed bytes available in raw_buf */
-#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
-
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyFromStateData;
-
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
-- 
2.45.2
From 7afdeeaafd4045477d90cf0c9ab356074e4ea100 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jan 2024 15:12:43 +0900
Subject: [PATCH v18 5/5] Add support for implementing custom COPY TO/FROM
 format as extension
For custom COPY TO format implementation:
* Add CopyToStateData::opaque that can be used to keep data for custom
  COPY TO format implementation
* Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf
* Rename CopySendEndOfRow() to CopyToStateFlush() because it's a
  method for CopyToState and it's used for flushing. End-of-row related
  code was moved to CopyToTextLikeSendEndOfRow().
For custom COPY FROM format implementation:
* Add CopyFromStateData::opaque that can be used to keep data for
  custom COPY FROM format implementation
* Export CopyReadBinaryData() to read the next data
* Rename CopyReadBinaryData() to CopyFromStateRead() because it's a
  method for CopyFromState and "BinaryData" is redundant.
---
 src/backend/commands/copyfromparse.c | 21 ++++++++++-----------
 src/backend/commands/copyto.c        | 15 +++++++--------
 src/include/commands/copyapi.h       | 10 ++++++++++
 3 files changed, 27 insertions(+), 19 deletions(-)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 74844103228..cd80d34f3da 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -164,7 +164,6 @@ static int    CopyGetData(CopyFromState cstate, void *databuf,
 static inline bool CopyGetInt32(CopyFromState cstate, int32 *val);
 static inline bool CopyGetInt16(CopyFromState cstate, int16 *val);
 static void CopyLoadInputBuf(CopyFromState cstate);
-static int    CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes);
 
 void
 ReceiveCopyBegin(CopyFromState cstate)
@@ -193,7 +192,7 @@ ReceiveCopyBinaryHeader(CopyFromState cstate)
     int32        tmp;
 
     /* Signature */
-    if (CopyReadBinaryData(cstate, readSig, 11) != 11 ||
+    if (CopyFromStateRead(cstate, readSig, 11) != 11 ||
         memcmp(readSig, BinarySignature, 11) != 0)
         ereport(ERROR,
                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -221,7 +220,7 @@ ReceiveCopyBinaryHeader(CopyFromState cstate)
     /* Skip extension header, if present */
     while (tmp-- > 0)
     {
-        if (CopyReadBinaryData(cstate, readSig, 1) != 1)
+        if (CopyFromStateRead(cstate, readSig, 1) != 1)
             ereport(ERROR,
                     (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                      errmsg("invalid COPY file header (wrong length)")));
@@ -363,7 +362,7 @@ CopyGetInt32(CopyFromState cstate, int32 *val)
 {
     uint32        buf;
 
-    if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+    if (CopyFromStateRead(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
     {
         *val = 0;                /* suppress compiler warning */
         return false;
@@ -380,7 +379,7 @@ CopyGetInt16(CopyFromState cstate, int16 *val)
 {
     uint16        buf;
 
-    if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+    if (CopyFromStateRead(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
     {
         *val = 0;                /* suppress compiler warning */
         return false;
@@ -691,14 +690,14 @@ CopyLoadInputBuf(CopyFromState cstate)
 }
 
 /*
- * CopyReadBinaryData
+ * CopyFromStateRead
  *
  * Reads up to 'nbytes' bytes from cstate->copy_file via cstate->raw_buf
  * and writes them to 'dest'.  Returns the number of bytes read (which
  * would be less than 'nbytes' only if we reach EOF).
  */
-static int
-CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
+int
+CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes)
 {
     int            copied_bytes = 0;
 
@@ -1078,7 +1077,7 @@ CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
          */
         char        dummy;
 
-        if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
+        if (CopyFromStateRead(cstate, &dummy, 1) > 0)
             ereport(ERROR,
                     (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                      errmsg("received copy data after EOF marker")));
@@ -2103,8 +2102,8 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
     resetStringInfo(&cstate->attribute_buf);
 
     enlargeStringInfo(&cstate->attribute_buf, fld_size);
-    if (CopyReadBinaryData(cstate, cstate->attribute_buf.data,
-                           fld_size) != fld_size)
+    if (CopyFromStateRead(cstate, cstate->attribute_buf.data,
+                          fld_size) != fld_size)
         ereport(ERROR,
                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                  errmsg("unexpected EOF in COPY data")));
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 54aa6cdecaf..cd9e352533a 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -63,7 +63,6 @@ static void SendCopyEnd(CopyToState cstate);
 static void CopySendData(CopyToState cstate, const void *databuf, int datasize);
 static void CopySendString(CopyToState cstate, const char *str);
 static void CopySendChar(CopyToState cstate, char c);
-static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
@@ -99,7 +98,7 @@ CopyToTextLikeSendEndOfRow(CopyToState cstate)
     }
 
     /* Now take the actions related to the end of a row */
-    CopySendEndOfRow(cstate);
+    CopyToStateFlush(cstate);
 }
 
 /*
@@ -325,7 +324,7 @@ CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
         }
     }
 
-    CopySendEndOfRow(cstate);
+    CopyToStateFlush(cstate);
 }
 
 /*
@@ -339,7 +338,7 @@ CopyToBinaryEnd(CopyToState cstate)
     /* Generate trailer for a binary copy */
     CopySendInt16(cstate, -1);
     /* Need to flush out the trailer */
-    CopySendEndOfRow(cstate);
+    CopyToStateFlush(cstate);
 }
 
 /*
@@ -419,8 +418,8 @@ SendCopyEnd(CopyToState cstate)
  * CopySendData sends output data to the destination (file or frontend)
  * CopySendString does the same for null-terminated strings
  * CopySendChar does the same for single characters
- * CopySendEndOfRow does the appropriate thing at end of each data row
- *    (data is not actually flushed except by CopySendEndOfRow)
+ * CopyToStateFlush flushes the buffered data
+ *    (data is not actually flushed except by CopyToStateFlush)
  *
  * NB: no data conversion is applied by these functions
  *----------
@@ -443,8 +442,8 @@ CopySendChar(CopyToState cstate, char c)
     appendStringInfoCharMacro(cstate->fe_msgbuf, c);
 }
 
-static void
-CopySendEndOfRow(CopyToState cstate)
+void
+CopyToStateFlush(CopyToState cstate)
 {
     StringInfo    fe_msgbuf = cstate->fe_msgbuf;
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 3104d99ea9f..0820b47a2d2 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -299,8 +299,13 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyFromStateData;
 
+extern int    CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes);
+
 
 typedef struct CopyToStateData *CopyToState;
 
@@ -401,6 +406,11 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;    /* lookup info for output functions */
     MemoryContext rowcontext;    /* per-row evaluation context */
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyToStateData;
 
+extern void CopyToStateFlush(CopyToState cstate);
+
 #endif                            /* COPYAPI_H */
-- 
2.45.2
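The way a custom format could use the opaque member together with the exported flush function can be modeled outside PostgreSQL. The following is only a standalone sketch, not PostgreSQL code: MiniCopyToState, MiniCopyToRoutine, mini_flush() and the "counted" toy format are all hypothetical stand-ins invented for illustration. They model CopyToStateData, CopyToRoutine and CopyToStateFlush() in a much reduced form, and the real callback signatures differ.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CopyToStateData; not the real struct. */
typedef struct MiniCopyToState
{
    char        buf[64];    /* models fe_msgbuf: pending output */
    size_t      buf_len;
    char        sink[256];  /* models the destination (file, client) */
    size_t      sink_len;
    void       *opaque;     /* private space for the format */
} MiniCopyToState;

/* Hypothetical stand-in for CopyToRoutine, reduced to three callbacks. */
typedef struct MiniCopyToRoutine
{
    void        (*start) (MiniCopyToState *cstate);
    void        (*one_row) (MiniCopyToState *cstate, int value);
    void        (*end) (MiniCopyToState *cstate);
} MiniCopyToRoutine;

/* Models CopyToStateFlush(): hand the buffered bytes to the destination. */
static void
mini_flush(MiniCopyToState *cstate)
{
    assert(cstate->sink_len + cstate->buf_len <= sizeof(cstate->sink));
    memcpy(cstate->sink + cstate->sink_len, cstate->buf, cstate->buf_len);
    cstate->sink_len += cstate->buf_len;
    cstate->buf_len = 0;
}

/* Append a string to the pending output without flushing it. */
static void
mini_append(MiniCopyToState *cstate, const char *str)
{
    size_t      len = strlen(str);

    assert(cstate->buf_len + len <= sizeof(cstate->buf));
    memcpy(cstate->buf + cstate->buf_len, str, len);
    cstate->buf_len += len;
}

/* Private state of the toy format, kept behind cstate->opaque. */
typedef struct CountedState
{
    int         nrows;
} CountedState;

static void
counted_start(MiniCopyToState *cstate)
{
    CountedState *priv = calloc(1, sizeof(*priv));

    if (priv == NULL)
        abort();
    cstate->opaque = priv;
}

static void
counted_one_row(MiniCopyToState *cstate, int value)
{
    CountedState *priv = cstate->opaque;
    char        line[32];

    snprintf(line, sizeof(line), "%d\n", value);
    mini_append(cstate, line);
    priv->nrows++;
    mini_flush(cstate);         /* one flush per row */
}

static void
counted_end(MiniCopyToState *cstate)
{
    CountedState *priv = cstate->opaque;
    char        line[32];

    /* The trailer needs the row count kept in the private state. */
    snprintf(line, sizeof(line), "rows=%d\n", priv->nrows);
    mini_append(cstate, line);
    mini_flush(cstate);
    free(priv);
    cstate->opaque = NULL;
}

static const MiniCopyToRoutine counted_routine = {
    counted_start, counted_one_row, counted_end
};

/* Models the core loop: it only knows the callbacks, not the format. */
static void
mini_copy_to(const MiniCopyToRoutine *routine, MiniCopyToState *cstate,
             const int *values, int nvalues)
{
    routine->start(cstate);
    for (int i = 0; i < nvalues; i++)
        routine->one_row(cstate, values[i]);
    routine->end(cstate);
}
```

Calling mini_copy_to(&counted_routine, &state, values, 3) on a zero-initialized state with the values 1, 2 and 3 leaves three data lines followed by a "rows=3" trailer in state.sink. The point of the sketch is that the core loop never looks inside opaque: only the format's own callbacks allocate, use and free it.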
			
Hi,

THREAD SUMMARY:

Proposal:

How about making COPY format extendable?

Background:

Currently, COPY TO/FROM supports only "text", "csv" and
"binary" formats. There are some requests to support more
COPY formats. For example:

* 2023-11: JSON and JSON lines [1]
* 2022-04: Apache Arrow [2]
* 2018-02: Apache Avro, Apache Parquet and Apache ORC [3]

There were discussions about how to add support for more
formats. [3][4] In these discussions, we reached a consensus
on making COPY format extendable.

[1]: https://www.postgresql.org/message-id/flat/24e3ee88-ec1e-421b-89ae-8a47ee0d2df1%40joeconway.com#a5e6b8829f9a74dfc835f6f29f2e44c5
[2]: https://www.postgresql.org/message-id/flat/CAGrfaBVyfm0wPzXVqm0%3Dh5uArYh9N_ij%2BsVpUtDHqkB%3DVyB3jw%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/flat/20180210151304.fonjztsynewldfba%40gmail.com
[4]: https://www.postgresql.org/message-id/flat/3741749.1655952719%40sss.pgh.pa.us#2bb7af4a3d2c7669f9a49808d777a20d

Concerns:

* Performance: If we make COPY format extendable, it will
  introduce some overhead. We don't want to lose our
  optimization efforts for the current implementations
  because of this. [5]
* Extendability: We don't know yet which API set is enough
  for custom COPY format implementations. We don't want to
  provide too many APIs, in order to keep the maintenance
  cost low.

[5]: https://www.postgresql.org/message-id/3741749.1655952719%40sss.pgh.pa.us

Implementation:

The v18 patch set is the latest patch set. [6]
It includes the following patches:

0001: This adds a basic feature (Copy{From,To}Routine)
      (This isn't enough for extending COPY format.
      This just extracts minimal procedure sets to be
      extendable as callback sets.)
0002: This uses Copy{From,To}Routine for the existing
      formats (text, csv and binary)
      (This may not be committed because there is a
      profiling related concern. See the following section
      for details.)
0003: This adds support for specifying a custom format by
      "COPY ... WITH (format 'my-format')"
      (This also adds a test for this feature.)
0004: This exports Copy{From,To}StateData
      (But this isn't enough to implement custom COPY
      FROM/TO handlers as an extension.)
0005: This adds an opaque member to Copy{From,To}StateData
      and exports some functions to read the next data and
      flush the buffer
      (We can implement a PoC Apache Arrow COPY FROM/TO
      handler as an extension with this. [7])

[6]: https://www.postgresql.org/message-id/flat/20240724.173059.909782980111496972.kou%40clear-code.com
[7]: https://github.com/kou/pg-copy-arrow

Implementation notes:

* 0002: We use "static inline" and "constant argument" for
  optimization.
* 0002: This hides NextCopyFromRawFields() in a public
  header because it's not used in PostgreSQL and we want to
  use "static inline" for it. If it's a problem, we can keep
  it and create an internal function for "static inline".
* 0003: We use "CREATE FUNCTION" to register a custom COPY
  FROM/TO handler. It's the same approach as tablesample.
* 0004 and 0005: We can merge them, but this patch set
  splits them for ease of review. 0004 just moves the
  existing code. It doesn't change the existing code.
* PoC: I provide it as a separate repository instead of a
  patch because an extension generally exists as a separate
  project. If it's a problem, I can provide it as a patch
  for contrib/.
* This patch set still has a minimal Copy{From,To}Routine.
  For example, custom COPY FROM/TO handlers can't process
  their own options with this patch set. We may add more
  callbacks to Copy{From,To}Routine later based on real
  world use-cases.

Performance concern:

We have a benchmark result and a profile for the change that
uses Copy{From,To}Routine for the existing formats. [8] They
are based on the v15 patch, but there is no significant
difference between the v15 patch and the v18 patch set.

These results show the following:

* Runtime: The patched version is faster than HEAD.
  * The patched version: 6232ms on average
  * HEAD: 6550ms on average
* Profile: The patched version spends a larger percentage of
  time than HEAD in a core function.
  * The patched version: 85.61% in CopyOneRowTo()
  * HEAD: 80.35% in CopyOneRowTo()

[8]: https://www.postgresql.org/message-id/flat/ZdbtQJ-p5H1_EDwE%40paquier.xyz

Here is related information for this benchmark/profile:

* Use -O2 for optimization build flag
  ("meson setup --buildtype=release" may be used)
* Use tmpfs for PGDATA
* Disable fsync
* Run on scissors (what is "scissors" in this context...?) [9]
* Unlogged table may be used
* Use a table that has 30 integer columns (*1)
* Use 5M rows (*2)
* Use '/dev/null' for COPY TO (*3)
* Use blackhole_am for COPY FROM (*4)
  https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am
* perf is used but the options used are unknown (sorry)

(*1) This SQL may be used to create the table:

    CREATE OR REPLACE FUNCTION create_table_cols(tabname text, num_cols int)
    RETURNS VOID AS
    $func$
    DECLARE
      query text;
    BEGIN
      query := 'CREATE UNLOGGED TABLE ' || tabname || ' (';
      FOR i IN 1..num_cols LOOP
        query := query || 'a_' || i::text || ' int default 1';
        IF i != num_cols THEN
          query := query || ', ';
        END IF;
      END LOOP;
      query := query || ')';
      EXECUTE format(query);
    END
    $func$ LANGUAGE plpgsql;
    SELECT create_table_cols ('to_tab_30', 30);
    SELECT create_table_cols ('from_tab_30', 30);

(*2) This SQL may be used to insert 5M rows:

    INSERT INTO to_tab_30 SELECT FROM generate_series(1, 5000000);

(*3) This SQL may be used for COPY TO:

    COPY to_tab_30 TO '/dev/null' WITH (FORMAT text);

(*4) This SQL may be used for COPY FROM:

    CREATE EXTENSION blackhole_am;
    ALTER TABLE from_tab_30 SET ACCESS METHOD blackhole_am;
    COPY to_tab_30 TO '/tmp/to_tab_30.txt' WITH (FORMAT text);
    COPY from_tab_30 FROM '/tmp/to_tab_30.txt' WITH (FORMAT text);

[9]: https://www.postgresql.org/message-id/flat/Zbr6piWuVHDtFFOl%40paquier.xyz#dbbec4d5c54ef2317be01a54abaf495c

Thanks,
--
kou
> On Jul 25, 2024, at 12:51, Sutou Kouhei <kou@clear-code.com> wrote:
> 
> Hi,
> 
> THREAD SUMMARY:
Very nice summary.
> 
> Implementation:
> 
> The v18 patch set is the latest patch set. [6]
> It includes the following patches:
> 
> 0001: This adds a basic feature (Copy{From,To}Routine)
>      (This isn't enough for extending COPY format.
>      This just extracts minimal procedure sets to be
>      extendable as callback sets.)
> 0002: This uses Copy{From,To}Rountine for the existing
>      formats (text, csv and binary)
>      (This may not be committed because there is a
>      profiling related concern. See the following section
>      for details)
> 0003: This adds support for specifying custom format by
>      "COPY ... WITH (format 'my-format')"
>      (This also adds a test for this feature.)
> 0004: This exports Copy{From,To}StateData
>      (But this isn't enough to implement custom COPY
>      FROM/TO handlers as an extension.)
> 0005: This adds opaque member to Copy{From,To}StateData and
>      export some functions to read the next data and flush
>      the buffer
>      (We can implement a PoC Apache Arrow COPY FROM/TO
>      handler as an extension with this. [7])
> 
> Thanks,
> --
> kou
> 
This review is for 0001 only because the other patches are not ready
for commit.
The v18-0001 patch applies cleanly to HEAD. “make check-world” also
runs cleanly. The patch looks good to me.
Regards,
Yong
			
		Hi Sutou,
On Wed, Jul 24, 2024 at 4:31 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <9172d4eb-6de0-4c6d-beab-8210b7a2219b@enterprisedb.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 22 Jul 2024 14:36:40 +0200,
>   Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> > Thanks for the summary/responses. I still think it'd be better to post a
> > summary as a separate message, not as yet another post responding to
> > someone else. If I was reading the thread, I would not have noticed this
> > is meant to be a summary. I'd even consider putting a "THREAD SUMMARY"
> > title on the first line, or something like that. Up to you, of course.
>
> It makes sense. I'll do it as a separated e-mail.
>
> > My suggestions would be to maintain this as a series of patches, making
> > incremental changes, with the "more complex" or "more experimental"
> > parts larger in the series. For example, I can imagine doing this:
> >
> > 0001 - minimal version of the patch (e.g. current v17)
> > 0002 - switch existing formats to the new interface
> > 0003 - extend the interface to add bits needed for columnar formats
> > 0004 - add DML to create/alter/drop custom implementations
> > 0005 - minimal patch with extension adding support for Arrow
> >
> > Or something like that. The idea is that we still have a coherent story
> > of what we're trying to do, and can discuss the incremental changes
> > (easier than looking at a large patch). It's even possible to commit
> > earlier parts before the later parts are quite cleanup up for commit.
> > And some changes changes may not be even meant for commit (e.g. the
> > extension) but as guidance / validation for the earlier parts.
>
> OK. I attach the v18 patch set:
>
> 0001: add a basic feature (Copy{From,To}Routine)
>       (same as the v17 but it's based on the current master)
> 0002: use Copy{From,To}Rountine for the existing formats
>       (this may not be committed because there is a
>       profiling related concern)
> 0003: add support for specifying custom format by "COPY
>       ... WITH (format 'my-format')"
>       (this also has a test)
> 0004: export Copy{From,To}StateData
>       (but this isn't enough to implement custom COPY
>       FROM/TO handlers as an extension)
> 0005: add opaque member to Copy{From,To}StateData and export
>       some functions to read the next data and flush the buffer
>       (we can implement a PoC Apache Arrow COPY FROM/TO
>       handler as an extension with this)
>
> https://github.com/kou/pg-copy-arrow is a PoC Apache Arrow
> COPY FROM/TO handler as an extension.
>
>
> Notes:
>
> * 0002: We use "static inline" and "constant argument" for
>   optimization.
> * 0002: This hides NextCopyFromRawFields() in a public
>   header because it's not used in PostgreSQL and we want to
>   use "static inline" for it. If it's a problem, we can keep
>   it and create an internal function for "static inline".
> * 0003: We use "CREATE FUNCTION" to register a custom COPY
>   FROM/TO handler. It's the same approach as tablesample.
> * 0004 and 0005: We can mix them but this patch set split
>   them for easy to review. 0004 just moves the existing
>   codes. It doesn't change the existing codes.
> * PoC: I provide it as a separated repository instead of a
>   patch because an extension exists as a separated project
>   in general. If it's a problem, I can provide it as a patch
>   for contrib/.
> * This patch set still has minimal Copy{From,To}Routine. For
>   example, custom COPY FROM/TO handlers can't process their
>   own options with this patch set. We may add more callbacks
>   to Copy{From,To}Routine later based on real world use-cases.
>
> > Unfortunately, there's not much information about what exactly the tests
> > did, context (hardware, ...). So I don't know, really. But if you share
> > enough information on how to reproduce this, I'm willing to take a look
> > and investigate.
>
> Thanks. Here is related information based on the past
> e-mails from Michael:
>
> * Use -O2 for optimization build flag
>   ("meson setup --buildtype=release" may be used)
> * Use tmpfs for PGDATA
> * Disable fsync
> * Run on scissors (what is "scissors" in this context...?)
>   https://www.postgresql.org/message-id/flat/Zbr6piWuVHDtFFOl%40paquier.xyz#dbbec4d5c54ef2317be01a54abaf495c
> * Unlogged table may be used
> * Use a table that has 30 integer columns (*1)
> * Use 5M rows (*2)
> * Use '/dev/null' for COPY TO (*3)
> * Use blackhole_am for COPY FROM (*4)
>   https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am
> * perf is used but used options are unknown (sorry)
>
> (*1) This SQL may be used to create the table:
>
> CREATE OR REPLACE FUNCTION create_table_cols(tabname text, num_cols int)
> RETURNS VOID AS
> $func$
> DECLARE
>   query text;
> BEGIN
>   query := 'CREATE UNLOGGED TABLE ' || tabname || ' (';
>   FOR i IN 1..num_cols LOOP
>     query := query || 'a_' || i::text || ' int default 1';
>     IF i != num_cols THEN
>       query := query || ', ';
>     END IF;
>   END LOOP;
>   query := query || ')';
>   EXECUTE format(query);
> END
> $func$ LANGUAGE plpgsql;
> SELECT create_table_cols ('to_tab_30', 30);
> SELECT create_table_cols ('from_tab_30', 30);
>
> (*2) This SQL may be used to insert 5M rows:
>
> INSERT INTO to_tab_30 SELECT FROM generate_series(1, 5000000);
>
> (*3) This SQL may be used for COPY TO:
>
> COPY to_tab_30 TO '/dev/null' WITH (FORMAT text);
>
> (*4) This SQL may be used for COPY FROM:
>
> CREATE EXTENSION blackhole_am;
> ALTER TABLE from_tab_30 SET ACCESS METHOD blackhole_am;
> COPY to_tab_30 TO '/tmp/to_tab_30.txt' WITH (FORMAT text);
> COPY from_tab_30 FROM '/tmp/to_tab_30.txt' WITH (FORMAT text);
>
>
> If there is enough information, could you try?
>
Thanks for updating the patches. I applied them and tested
on my local machine. I did not use tmpfs in my test; I guess
that if I run the tests for enough rounds, the OS will cache
the pages. Below are my numbers (I ran each test 30 times and
counted only the last 10 runs):
COPY to_tab_30 TO '/dev/null' WITH (FORMAT text);
HEAD            PATCHED
5628.280 ms     5679.860 ms
5583.144 ms     5588.078 ms
5604.444 ms     5628.029 ms
5617.133 ms     5613.926 ms
5575.570 ms     5601.045 ms
5634.828 ms     5616.409 ms
5693.489 ms     5637.434 ms
5585.857 ms     5609.531 ms
5613.948 ms     5643.629 ms
5610.394 ms     5580.206 ms
COPY from_tab_30 FROM '/tmp/to_tab_30.txt' WITH (FORMAT text);
HEAD            PATCHED
3929.955 ms     4050.895 ms
3909.061 ms     3890.156 ms
3940.272 ms     3927.614 ms
3907.535 ms     3925.560 ms
3952.719 ms     3942.141 ms
3933.751 ms     3904.250 ms
3958.274 ms     4025.581 ms
3937.065 ms     3894.149 ms
3949.896 ms     3933.878 ms
3925.399 ms     3936.170 ms
I did not see any obvious performance degradation. Maybe it's
because I did not use tmpfs, but OTOH I think this means
that the *function call* and *if branch* added for each row
are not the bottleneck of the whole execution path.
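To put a rough number on "no obvious degradation", the ten runs in each column above can be averaged and compared. The following is only a sketch in Python and not something that was run in this thread; the timings are copied from the table above, and the helper name `relative_change` is made up for illustration.

```python
from statistics import mean

# Timings in ms, copied from the table above (last 10 of 30 runs each).
copy_to_head = [5628.280, 5583.144, 5604.444, 5617.133, 5575.570,
                5634.828, 5693.489, 5585.857, 5613.948, 5610.394]
copy_to_patched = [5679.860, 5588.078, 5628.029, 5613.926, 5601.045,
                   5616.409, 5637.434, 5609.531, 5643.629, 5580.206]
copy_from_head = [3929.955, 3909.061, 3940.272, 3907.535, 3952.719,
                  3933.751, 3958.274, 3937.065, 3949.896, 3925.399]
copy_from_patched = [4050.895, 3890.156, 3927.614, 3925.560, 3942.141,
                     3904.250, 4025.581, 3894.149, 3933.878, 3936.170]


def relative_change(baseline, candidate):
    """Return (mean(candidate) - mean(baseline)) / mean(baseline)."""
    base = mean(baseline)
    return (mean(candidate) - base) / base


if __name__ == "__main__":
    for label, head, patched in (
        ("COPY TO", copy_to_head, copy_to_patched),
        ("COPY FROM", copy_from_head, copy_from_patched),
    ):
        print(f"{label}: HEAD {mean(head):.3f} ms, "
              f"PATCHED {mean(patched):.3f} ms, "
              f"change {relative_change(head, patched):+.2%}")
```

With these inputs both changes come out well below 1%, which is smaller than the run-to-run spread inside either column.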
In 0001,
+typedef struct CopyFromRoutine
+{
+ /*
+ * Called when COPY FROM is started to set up the input functions
+ * associated to the relation's attributes writing to.  `finfo` can be
+ * optionally filled to provide the catalog information of the input
+ * function.  `typioparam` can be optionally filled to define the OID of
+ * the type to pass to the input function.  `atttypid` is the OID of data
+ * type used by the relation's attribute.
+typedef struct CopyToRoutine
+{
+ /*
+ * Called when COPY TO is started to set up the output functions
+ * associated to the relation's attributes reading from.  `finfo` can be
+ * optionally filled.  `atttypid` is the OID of data type used by the
+ * relation's attribute.
The second comment has a simplified description of `finfo`. I think it
should match the first one:
`finfo` can be optionally filled to provide the catalog information of the
output function.
After I posted the patch diffs, the Gmail grammar checker hinted that
it should be *associated with* rather than *associated to*, but I'm
not sure about this one.
I think the patches are in good shape. I can help with some
further tests if needed. Thanks for working on this.
>
> Thanks,
> --
> kou
--
Regards
Junwang Zhao
			
		On 7/25/24 06:51, Sutou Kouhei wrote:
> Hi,
> 
> ...
>
> Here are related information for this benchmark/profile:
> 
> * Use -O2 for optimization build flag
>   ("meson setup --buildtype=release" may be used)
> * Use tmpfs for PGDATA
> * Disable fsync
> * Run on scissors (what is "scissors" in this context...?)   [9]
> * Unlogged table may be used
> * Use a table that has 30 integer columns (*1)
> * Use 5M rows (*2)
> * Use '/dev/null' for COPY TO (*3)
> * Use blackhole_am for COPY FROM (*4)
>   https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am
> * perf is used but used options are unknown (sorry)
> 
> 
> (*1) This SQL may be used to create the table:
> 
> CREATE OR REPLACE FUNCTION create_table_cols(tabname text, num_cols int)
> RETURNS VOID AS
> $func$
> DECLARE
>   query text;
> BEGIN
>   query := 'CREATE UNLOGGED TABLE ' || tabname || ' (';
>   FOR i IN 1..num_cols LOOP
>     query := query || 'a_' || i::text || ' int default 1';
>     IF i != num_cols THEN
>       query := query || ', ';
>     END IF;
>   END LOOP;
>   query := query || ')';
>   EXECUTE format(query);
> END
> $func$ LANGUAGE plpgsql;
> SELECT create_table_cols ('to_tab_30', 30);
> SELECT create_table_cols ('from_tab_30', 30);
> 
> 
> (*2) This SQL may be used to insert 5M rows:
> 
> INSERT INTO to_tab_30 SELECT FROM generate_series(1, 5000000);
> 
> 
> (*3) This SQL may be used for COPY TO:
> 
> COPY to_tab_30 TO '/dev/null' WITH (FORMAT text);
> 
> 
> (*4) This SQL may be used for COPY FROM:
> 
> CREATE EXTENSION blackhole_am;
> ALTER TABLE from_tab_30 SET ACCESS METHOD blackhole_am;
> COPY to_tab_30 TO '/tmp/to_tab_30.txt' WITH (FORMAT text);
> COPY from_tab_30 FROM '/tmp/to_tab_30.txt' WITH (FORMAT text);
> 
Thanks for the benchmark instructions and updated patches. Very helpful!
I wrote a simple script to automate the benchmark - it just runs these
tests with different parameters (number of columns and number of
imported/exported rows). See the run.sh attachment, along with two CSV
results from current master and with all patches applied.
The attached PDF has a simple summary, with a median duration for each
combination, and a comparison (patched/master). The results are from my
laptop, so it's probably noisy, and it would be good to test it on
more realistic hardware (for perf-sensitive things).
- For COPY FROM there is no difference - the results are within 1% of
master, and there's no systemic difference.
- For COPY TO it's a different story, though. There's a pretty clear
regression, by ~5%. It's a bit interesting the correlation with the
number of columns is not stronger ...
I did do some basic profiling, and the perf diff looks like this:
# Event 'task-clock:upppH'
#
# Baseline  Delta Abs  Shared Object  Symbol
# ........  .........  .............  .........................................
#
    13.34%    -12.94%  postgres       [.] CopyOneRowTo
              +10.75%  postgres       [.] CopyToTextOneRow
     4.31%     +2.84%  postgres       [.] pg_ltoa
    10.96%     +1.15%  postgres       [.] CopySendChar
     8.68%     +0.78%  postgres       [.] AllocSetAlloc
    10.89%     -0.70%  postgres       [.] CopyAttributeOutText
     5.01%     -0.47%  postgres       [.] enlargeStringInfo
     4.95%     -0.42%  postgres       [.] OutputFunctionCall
     5.29%     -0.37%  postgres       [.] int4out
     5.90%     -0.31%  postgres       [.] appendBinaryStringInfo
               +0.29%  postgres       [.] CopyToStateFlush
     0.27%     -0.27%  postgres       [.] memcpy@plt
Not particularly surprising that CopyToTextOneRow has +11%, but that's
because it's a new function. The perf difference is perhaps due to
pg_ltoa/CopySendChar, but not sure why.
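One way to read the first two rows of the perf diff: the per-row output code moved out of CopyOneRowTo into the new CopyToTextOneRow, so it may be more informative to look at the two symbols together. Below is a small Python sketch of that arithmetic, using only the percentages from the perf diff above. Grouping these two symbols is an assumption made here for illustration; perf itself does not report it, and the helper name `combined_share` is made up.

```python
# (baseline share in %, delta in percentage points) from the perf diff above.
# A baseline of 0.0 stands for "symbol does not exist on master".
perf_diff = {
    "CopyOneRowTo": (13.34, -12.94),
    "CopyToTextOneRow": (0.0, +10.75),
}


def combined_share(rows):
    """Sum the baseline and the patched shares over a group of symbols."""
    baseline = sum(base for base, _ in rows.values())
    patched = sum(base + delta for base, delta in rows.values())
    return baseline, patched


baseline, patched = combined_share(perf_diff)
print(f"row output path: {baseline:.2f}% -> {patched:.2f}% "
      f"({patched - baseline:+.2f} points)")
```

These are shares of samples, not absolute times, so on their own they say nothing about total runtime. They only suggest that the large positive delta for CopyToTextOneRow is mostly relocated work.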
I also generated some flamegraphs - attached are the ones for master, patched and the diff.
It's interesting the main change in the flamegraphs is CopyToStateFlush
pops up on the left side. Because, what is that about? That is a thing
introduced in the 0005 patch, so maybe the regression is not strictly
about the existing formats moving to the new API, but due to something
else in a later version of the patch?
It would be good to run the tests for each patch in the series, and then
see when the regression actually appears.
FWIW None of this actually proves this is an issue in practice. No one
will be exporting into /dev/null or importing into blackhole, and I'd
bet the difference gets way smaller for more realistic cases.
regards
-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
			
Hi,
In <CAEG8a3+KN=uofw5ksnCwh5s3m_VcfFYd=jTzcpO5uVLBHwSQEg@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Sun, 28 Jul 2024 22:49:47 +0800,
  Junwang Zhao <zhjwpku@gmail.com> wrote:
> Thanks for updating the patches, I applied them and test
> in my local machine, I did not use tmpfs in my test, I guess
> if I run the tests enough rounds, the OS will cache the
> pages, below is my numbers(I run each test 30 times, I
> count for the last 10 ones):
> 
> HEAD                                PATCHED
> 
> COPY to_tab_30 TO '/dev/null' WITH (FORMAT text);
> 
> 5628.280 ms                   5679.860 ms
> 5583.144 ms                   5588.078 ms
> 5604.444 ms                   5628.029 ms
> 5617.133 ms                   5613.926 ms
> 5575.570 ms                   5601.045 ms
> 5634.828 ms                   5616.409 ms
> 5693.489 ms                   5637.434 ms
> 5585.857 ms                   5609.531 ms
> 5613.948 ms                   5643.629 ms
> 5610.394 ms                   5580.206 ms
> 
> COPY from_tab_30 FROM '/tmp/to_tab_30.txt' WITH (FORMAT text);
> 
> 3929.955 ms                   4050.895 ms
> 3909.061 ms                   3890.156 ms
> 3940.272 ms                   3927.614 ms
> 3907.535 ms                   3925.560 ms
> 3952.719 ms                   3942.141 ms
> 3933.751 ms                   3904.250 ms
> 3958.274 ms                   4025.581 ms
> 3937.065 ms                   3894.149 ms
> 3949.896 ms                   3933.878 ms
> 3925.399 ms                   3936.170 ms
> 
> I did not see obvious performance degradation, maybe it's
> because I did not use tmpfs, but I think this OTH means
> that the *function call* and *if branch* added for each row
> is not the bottleneck of the whole execution path.
Thanks for sharing your numbers. I agree that there is no
obvious performance degradation.
> In 0001,
> 
> +typedef struct CopyFromRoutine
> +{
> + /*
> + * Called when COPY FROM is started to set up the input functions
> + * associated to the relation's attributes writing to.  `finfo` can be
> + * optionally filled to provide the catalog information of the input
> + * function.  `typioparam` can be optionally filled to define the OID of
> + * the type to pass to the input function.  `atttypid` is the OID of data
> + * type used by the relation's attribute.
> 
> +typedef struct CopyToRoutine
> +{
> + /*
> + * Called when COPY TO is started to set up the output functions
> + * associated to the relation's attributes reading from.  `finfo` can be
> + * optionally filled.  `atttypid` is the OID of data type used by the
> + * relation's attribute.
> 
> The second comment has a simplified description for `finfo`, I think it
> should match the first by:
> 
> `finfo` can be optionally filled to provide the catalog information of the
> output function.
Good catch. I'll update it as suggested in the next patch set.
> After I post the patch diffs, the gmail grammer shows some hints that
> it should be *associated with* rather than *associated to*, but I'm
> not sure about this one.
Thanks. I'll use "associated with".
> I think the patches are in good shape, I can help to do some
> further tests if needed, thanks for working on this.
Thanks!
-- 
kou
			
		Hi,
In <26541788-8853-4d93-86cd-5f701b13ae51@enterprisedb.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 29 Jul 2024 14:17:08 +0200,
  Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> I wrote a simple script to automate the benchmark - it just runs these
> tests with different parameters (number of columns and number of
> imported/exported rows). See the run.sh attachment, along with two CSV
> results from current master and with all patches applied.
Thanks. I also used the script with some modifications:
1. Create a test database automatically
2. Enable blackhole_am automatically
3. Create create_table_cols() automatically
I attach it. I also attach the results for master and patched. My
results are from my desktop, so they're probably noisy.
> - For COPY FROM there is no difference - the results are within 1% of
> master, and there's no systemic difference.
> 
> - For COPY TO it's a different story, though. There's a pretty clear
> regression, by ~5%. It's a bit interesting the correlation with the
> number of columns is not stronger ...
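My results showed a different trend:
- COPY FROM: Patched is about 15-20% slower than master
- COPY TO: Patched is a bit faster than master
Here are some of my numbers (n_rows is in millions of rows, master and
patched are durations in ms, diff is patched/master):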
type  n_cols  n_rows  diff     master       patched
----------------------------------------------------------
TO    5       1       100.56%  218.376000   219.609000
FROM  5       1       113.33%  168.493000   190.954000
...
TO    5       5       100.60%  1037.773000  1044.045000
FROM  5       5       116.46%  767.966000   894.377000
...
TO    5       10      100.15%  2092.245000  2095.472000
FROM  5       10      115.91%  1508.160000  1748.130000
TO    10      1       98.62%   353.087000   348.214000
FROM  10      1       118.65%  260.551000   309.133000
...
TO    10      5       96.89%   1724.061000  1670.427000
FROM  10      5       119.92%  1224.098000  1467.941000
...
TO    10      10      98.70%   3444.291000  3399.538000
FROM  10      10      118.79%  2462.314000  2924.866000
TO    15      1       97.71%   492.082000   480.802000
FROM  15      1       115.59%  347.820000   402.033000
...
TO    15      5       98.32%   2402.419000  2362.140000
FROM  15      5       115.48%  1657.594000  1914.245000
...
TO    15      10      96.91%   4830.319000  4681.145000
FROM  15      10      115.09%  3304.798000  3803.542000
TO    20      1       96.05%   629.828000   604.939000
FROM  20      1       118.50%  438.673000   519.839000
...
TO    20      5       97.15%   3084.210000  2996.331000
FROM  20      5       115.35%  2110.909000  2435.032000
...
TO    25      1       98.29%   764.779000   751.684000
FROM  25      1       115.13%  519.686000   598.301000
...
TO    25      5       94.08%   3843.996000  3616.614000
FROM  25      5       115.62%  2554.008000  2952.928000
...
TO    25      10      97.41%   7504.865000  7310.549000
FROM  25      10      117.25%  4994.463000  5856.029000
TO    30      1       94.39%   906.324000   855.503000
FROM  30      1       119.60%  604.110000   722.491000
...
TO    30      5       96.50%   4419.907000  4265.417000
FROM  30      5       116.97%  2932.883000  3430.556000
...
TO    30      10      94.39%   8974.878000  8470.991000
FROM  30      10      117.84%  5800.793000  6835.900000
----
See the attached diff.txt for full numbers.
I also attach scripts to generate the diff.txt. Here is the
command line I used:
----
ruby diff.rb <(ruby aggregate.rb master.result) <(ruby aggregate.rb patched.result) | tee diff.txt
----
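The Ruby scripts themselves are attachments and are not shown here. As a rough illustration of the same kind of processing, here is a Python sketch that reads result lines in the `COPY_TO <n_cols> <n_rows> <run> <ms>` form printed by the benchmark script, picks one representative duration per (type, n_cols, n_rows), and reports patched/master. It is an assumption that the representative is the element at index n/2 of the sorted runs; that choice reproduces the 218.376000 listed for master "TO 5 1", but the real aggregate.rb may do something different. The function names `aggregate` and `diff` are made up.

```python
from collections import defaultdict


def aggregate(lines):
    """Map (type, n_cols, n_rows) -> middle value of the sorted durations."""
    runs = defaultdict(list)
    for line in lines:
        kind, n_cols, n_rows, _run, ms = line.split()
        runs[(kind, int(n_cols), int(n_rows))].append(float(ms))
    # Assumption: take sorted[n // 2] (the upper middle for an even count).
    return {key: sorted(vals)[len(vals) // 2] for key, vals in runs.items()}


def diff(master, patched):
    """Yield (key, patched / master, master, patched) for shared keys."""
    for key in sorted(master.keys() & patched.keys()):
        yield key, patched[key] / master[key], master[key], patched[key]


# The ten master runs for COPY_TO with 5 columns and 1M rows,
# copied from the raw results attached to this mail.
master_lines = """\
COPY_TO 5 1 1 212.831000
COPY_TO 5 1 2 208.677000
COPY_TO 5 1 3 215.074000
COPY_TO 5 1 4 218.376000
COPY_TO 5 1 5 219.056000
COPY_TO 5 1 6 218.237000
COPY_TO 5 1 7 234.709000
COPY_TO 5 1 8 220.561000
COPY_TO 5 1 9 219.747000
COPY_TO 5 1 10 211.881000
""".splitlines()

master = aggregate(master_lines)
# The raw patched runs are not shown in this mail, so the patched value
# is taken directly from the summary table.
patched = {("COPY_TO", 5, 1): 219.609}

for (kind, n_cols, n_rows), ratio, m, p in diff(master, patched):
    print(f"{kind} {n_cols} {n_rows} {ratio:.2%} {m:.6f} {p:.6f}")
```

Using a middle value rather than the mean keeps a single slow run, such as the 234.709 ms outlier in this sample, from shifting the comparison.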
My environment:
* Debian GNU/Linux sid
* gcc (Debian 13.3.0-2) 13.3.0
* AMD Ryzen 9 3900X 12-Core Processor
I'll look into this.
If someone is interested in this proposal, could you share
your numbers?
> It's interesting the main change in the flamegraphs is CopyToStateFlush
> pops up on the left side. Because, what is that about? That is a thing
> introduced in the 0005 patch, so maybe the regression is not strictly
> about the existing formats moving to the new API, but due to something
> else in a later version of the patch?
Ah, changing the static CopySendEndOfRow() into a non-static function
(CopyToStateFlush()) may be the reason for this. Could you
try the attached v19 patch? It changes the 0005 patch:
* It reverts the static change
* It adds a new non-static function that just exports
  CopySendEndOfRow()
Thanks,
-- 
kou
#!/usr/bin/bash
DIR=${1:-$(pwd)}
psql postgres > /dev/null 2>&1 <<EOF
DROP DATABASE IF EXISTS test;
CREATE DATABASE test;
EOF
psql test > /dev/null 2>&1 <<EOF
CREATE EXTENSION blackhole_am;
CREATE OR REPLACE FUNCTION create_table_cols(tabname text, num_cols int)
RETURNS VOID AS
\$func\$
DECLARE
  query text;
BEGIN
  query := 'CREATE UNLOGGED TABLE ' || tabname || ' (';
  FOR i IN 1..num_cols LOOP
    query := query || 'a_' || i::text || ' int default 1';
    IF i != num_cols THEN
      query := query || ', ';
    END IF;
  END LOOP;
  query := query || ')';
  EXECUTE format(query);
END
\$func\$ LANGUAGE plpgsql;
EOF
for c in $(seq 5 5 30); do
    for rows in $(seq 1 10); do
        psql test > /dev/null 2>&1 <<EOF
DROP TABLE IF EXISTS to_table;
DROP TABLE IF EXISTS from_table;
SELECT create_table_cols ('to_table', $c);
SELECT create_table_cols ('from_table', $c);
INSERT INTO to_table SELECT FROM generate_series(1, $rows * 1000000);
COPY to_table TO '$DIR/test.data' WITH (FORMAT text);
ALTER TABLE from_table SET ACCESS METHOD blackhole_am;
EOF
        # run COPY TO 10x
        for r in $(seq 1 10); do
            s=$(psql test -t -A -c "SELECT EXTRACT(EPOCH FROM now())")
            psql test -c "COPY to_table TO '/dev/null' WITH (FORMAT text)" > /dev/null 2>&1
            d=$(psql test -t -A -c "SELECT 1000 * (EXTRACT(EPOCH FROM now()) - $s)")
            echo "COPY_TO" $c $rows $r $d
        done
        # run COPY FROM 10x
        for r in $(seq 1 10); do
            s=$(psql test -t -A -c "SELECT EXTRACT(EPOCH FROM now())")
            psql test -c "COPY from_table FROM '$DIR/test.data' WITH (FORMAT text)" > /dev/null 2>&1
            d=$(psql test -t -A -c "SELECT 1000 * (EXTRACT(EPOCH FROM now()) - $s)")
            echo "COPY_FROM" $c $rows $r $d
        done
    done
done
COPY_TO 5 1 1 212.831000
COPY_TO 5 1 2 208.677000
COPY_TO 5 1 3 215.074000
COPY_TO 5 1 4 218.376000
COPY_TO 5 1 5 219.056000
COPY_TO 5 1 6 218.237000
COPY_TO 5 1 7 234.709000
COPY_TO 5 1 8 220.561000
COPY_TO 5 1 9 219.747000
COPY_TO 5 1 10 211.881000
COPY_FROM 5 1 1 166.336000
COPY_FROM 5 1 2 166.000000
COPY_FROM 5 1 3 166.776000
COPY_FROM 5 1 4 168.493000
COPY_FROM 5 1 5 169.632000
COPY_FROM 5 1 6 164.290000
COPY_FROM 5 1 7 167.841000
COPY_FROM 5 1 8 169.336000
COPY_FROM 5 1 9 172.948000
COPY_FROM 5 1 10 168.893000
COPY_TO 5 2 1 412.065000
COPY_TO 5 2 2 420.758000
COPY_TO 5 2 3 421.387000
COPY_TO 5 2 4 402.165000
COPY_TO 5 2 5 414.407000
COPY_TO 5 2 6 423.387000
COPY_TO 5 2 7 426.431000
COPY_TO 5 2 8 424.798000
COPY_TO 5 2 9 419.588000
COPY_TO 5 2 10 425.688000
COPY_FROM 5 2 1 308.856000
COPY_FROM 5 2 2 319.487000
COPY_FROM 5 2 3 316.488000
COPY_FROM 5 2 4 315.212000
COPY_FROM 5 2 5 316.066000
COPY_FROM 5 2 6 310.381000
COPY_FROM 5 2 7 322.447000
COPY_FROM 5 2 8 318.206000
COPY_FROM 5 2 9 322.588000
COPY_FROM 5 2 10 317.101000
COPY_TO 5 3 1 633.255000
COPY_TO 5 3 2 616.202000
COPY_TO 5 3 3 610.864000
COPY_TO 5 3 4 628.803000
COPY_TO 5 3 5 638.041000
COPY_TO 5 3 6 647.732000
COPY_TO 5 3 7 624.457000
COPY_TO 5 3 8 624.007000
COPY_TO 5 3 9 616.109000
COPY_TO 5 3 10 624.354000
COPY_FROM 5 3 1 469.425000
COPY_FROM 5 3 2 471.284000
COPY_FROM 5 3 3 468.651000
COPY_FROM 5 3 4 465.177000
COPY_FROM 5 3 5 466.697000
COPY_FROM 5 3 6 463.886000
COPY_FROM 5 3 7 480.866000
COPY_FROM 5 3 8 465.048000
COPY_FROM 5 3 9 469.349000
COPY_FROM 5 3 10 467.342000
COPY_TO 5 4 1 837.447000
COPY_TO 5 4 2 848.536000
COPY_TO 5 4 3 867.580000
COPY_TO 5 4 4 831.669000
COPY_TO 5 4 5 839.633000
COPY_TO 5 4 6 846.060000
COPY_TO 5 4 7 824.590000
COPY_TO 5 4 8 836.084000
COPY_TO 5 4 9 845.936000
COPY_TO 5 4 10 851.128000
COPY_FROM 5 4 1 604.809000
COPY_FROM 5 4 2 617.653000
COPY_FROM 5 4 3 615.883000
COPY_FROM 5 4 4 616.633000
COPY_FROM 5 4 5 617.737000
COPY_FROM 5 4 6 617.361000
COPY_FROM 5 4 7 608.998000
COPY_FROM 5 4 8 621.576000
COPY_FROM 5 4 9 619.759000
COPY_FROM 5 4 10 625.312000
COPY_TO 5 5 1 1057.027000
COPY_TO 5 5 2 1038.905000
COPY_TO 5 5 3 1034.425000
COPY_TO 5 5 4 1048.834000
COPY_TO 5 5 5 1069.693000
COPY_TO 5 5 6 1019.558000
COPY_TO 5 5 7 1007.099000
COPY_TO 5 5 8 1021.759000
COPY_TO 5 5 9 1037.773000
COPY_TO 5 5 10 1008.977000
COPY_FROM 5 5 1 753.724000
COPY_FROM 5 5 2 769.060000
COPY_FROM 5 5 3 765.603000
COPY_FROM 5 5 4 769.101000
COPY_FROM 5 5 5 767.057000
COPY_FROM 5 5 6 767.966000
COPY_FROM 5 5 7 781.901000
COPY_FROM 5 5 8 772.262000
COPY_FROM 5 5 9 762.266000
COPY_FROM 5 5 10 767.036000
COPY_TO 5 6 1 1245.932000
COPY_TO 5 6 2 1254.330000
COPY_TO 5 6 3 1254.507000
COPY_TO 5 6 4 1255.708000
COPY_TO 5 6 5 1238.643000
COPY_TO 5 6 6 1259.656000
COPY_TO 5 6 7 1262.356000
COPY_TO 5 6 8 1253.554000
COPY_TO 5 6 9 1262.281000
COPY_TO 5 6 10 1253.491000
COPY_FROM 5 6 1 940.044000
COPY_FROM 5 6 2 938.479000
COPY_FROM 5 6 3 926.584000
COPY_FROM 5 6 4 920.494000
COPY_FROM 5 6 5 908.873000
COPY_FROM 5 6 6 917.936000
COPY_FROM 5 6 7 917.126000
COPY_FROM 5 6 8 921.488000
COPY_FROM 5 6 9 917.245000
COPY_FROM 5 6 10 916.243000
COPY_TO 5 7 1 1430.487000
COPY_TO 5 7 2 1427.373000
COPY_TO 5 7 3 1497.434000
COPY_TO 5 7 4 1463.688000
COPY_TO 5 7 5 1441.485000
COPY_TO 5 7 6 1474.119000
COPY_TO 5 7 7 1514.650000
COPY_TO 5 7 8 1478.208000
COPY_TO 5 7 9 1495.704000
COPY_TO 5 7 10 1459.739000
COPY_FROM 5 7 1 1077.841000
COPY_FROM 5 7 2 1084.081000
COPY_FROM 5 7 3 1093.168000
COPY_FROM 5 7 4 1078.736000
COPY_FROM 5 7 5 1076.685000
COPY_FROM 5 7 6 1110.902000
COPY_FROM 5 7 7 1079.210000
COPY_FROM 5 7 8 1067.793000
COPY_FROM 5 7 9 1079.762000
COPY_FROM 5 7 10 1084.935000
COPY_TO 5 8 1 1650.517000
COPY_TO 5 8 2 1697.566000
COPY_TO 5 8 3 1667.700000
COPY_TO 5 8 4 1656.847000
COPY_TO 5 8 5 1659.966000
COPY_TO 5 8 6 1702.676000
COPY_TO 5 8 7 1696.178000
COPY_TO 5 8 8 1682.269000
COPY_TO 5 8 9 1690.789000
COPY_TO 5 8 10 1703.132000
COPY_FROM 5 8 1 1253.866000
COPY_FROM 5 8 2 1245.838000
COPY_FROM 5 8 3 1231.959000
COPY_FROM 5 8 4 1216.493000
COPY_FROM 5 8 5 1213.282000
COPY_FROM 5 8 6 1213.838000
COPY_FROM 5 8 7 1251.825000
COPY_FROM 5 8 8 1245.100000
COPY_FROM 5 8 9 1261.415000
COPY_FROM 5 8 10 1212.752000
COPY_TO 5 9 1 1850.090000
COPY_TO 5 9 2 1899.929000
COPY_TO 5 9 3 1860.290000
COPY_TO 5 9 4 1832.055000
COPY_TO 5 9 5 1857.414000
COPY_TO 5 9 6 1879.424000
COPY_TO 5 9 7 1875.373000
COPY_TO 5 9 8 1854.969000
COPY_TO 5 9 9 1915.033000
COPY_TO 5 9 10 1866.939000
COPY_FROM 5 9 1 1370.836000
COPY_FROM 5 9 2 1379.806000
COPY_FROM 5 9 3 1372.183000
COPY_FROM 5 9 4 1367.779000
COPY_FROM 5 9 5 1368.464000
COPY_FROM 5 9 6 1380.544000
COPY_FROM 5 9 7 1363.804000
COPY_FROM 5 9 8 1362.463000
COPY_FROM 5 9 9 1371.727000
COPY_FROM 5 9 10 1377.122000
COPY_TO 5 10 1 2058.078000
COPY_TO 5 10 2 2064.015000
COPY_TO 5 10 3 2120.218000
COPY_TO 5 10 4 2060.682000
COPY_TO 5 10 5 2105.438000
COPY_TO 5 10 6 2076.790000
COPY_TO 5 10 7 2095.560000
COPY_TO 5 10 8 2092.245000
COPY_TO 5 10 9 2034.601000
COPY_TO 5 10 10 2094.292000
COPY_FROM 5 10 1 1557.934000
COPY_FROM 5 10 2 1517.610000
COPY_FROM 5 10 3 1506.637000
COPY_FROM 5 10 4 1515.831000
COPY_FROM 5 10 5 1490.391000
COPY_FROM 5 10 6 1507.338000
COPY_FROM 5 10 7 1508.160000
COPY_FROM 5 10 8 1523.402000
COPY_FROM 5 10 9 1504.555000
COPY_FROM 5 10 10 1500.368000
COPY_TO 10 1 1 350.108000
COPY_TO 10 1 2 354.319000
COPY_TO 10 1 3 347.724000
COPY_TO 10 1 4 344.384000
COPY_TO 10 1 5 355.083000
COPY_TO 10 1 6 363.509000
COPY_TO 10 1 7 355.307000
COPY_TO 10 1 8 345.092000
COPY_TO 10 1 9 353.087000
COPY_TO 10 1 10 352.411000
COPY_FROM 10 1 1 259.050000
COPY_FROM 10 1 2 261.272000
COPY_FROM 10 1 3 258.407000
COPY_FROM 10 1 4 260.551000
COPY_FROM 10 1 5 260.306000
COPY_FROM 10 1 6 262.650000
COPY_FROM 10 1 7 259.448000
COPY_FROM 10 1 8 263.050000
COPY_FROM 10 1 9 259.594000
COPY_FROM 10 1 10 262.014000
COPY_TO 10 2 1 687.593000
COPY_TO 10 2 2 689.272000
COPY_TO 10 2 3 672.518000
COPY_TO 10 2 4 697.031000
COPY_TO 10 2 5 709.173000
COPY_TO 10 2 6 704.194000
COPY_TO 10 2 7 696.468000
COPY_TO 10 2 8 693.674000
COPY_TO 10 2 9 699.779000
COPY_TO 10 2 10 692.238000
COPY_FROM 10 2 1 497.979000
COPY_FROM 10 2 2 513.060000
COPY_FROM 10 2 3 502.765000
COPY_FROM 10 2 4 509.832000
COPY_FROM 10 2 5 507.076000
COPY_FROM 10 2 6 501.886000
COPY_FROM 10 2 7 503.953000
COPY_FROM 10 2 8 509.601000
COPY_FROM 10 2 9 508.680000
COPY_FROM 10 2 10 497.768000
COPY_TO 10 3 1 1036.252000
COPY_TO 10 3 2 1011.853000
COPY_TO 10 3 3 1022.256000
COPY_TO 10 3 4 1034.388000
COPY_TO 10 3 5 1011.247000
COPY_TO 10 3 6 1042.124000
COPY_TO 10 3 7 1040.866000
COPY_TO 10 3 8 1025.704000
COPY_TO 10 3 9 1023.673000
COPY_TO 10 3 10 1061.591000
COPY_FROM 10 3 1 733.149000
COPY_FROM 10 3 2 743.642000
COPY_FROM 10 3 3 752.712000
COPY_FROM 10 3 4 735.685000
COPY_FROM 10 3 5 743.496000
COPY_FROM 10 3 6 749.289000
COPY_FROM 10 3 7 747.307000
COPY_FROM 10 3 8 750.439000
COPY_FROM 10 3 9 747.840000
COPY_FROM 10 3 10 746.275000
COPY_TO 10 4 1 1339.158000
COPY_TO 10 4 2 1349.486000
COPY_TO 10 4 3 1391.879000
COPY_TO 10 4 4 1402.481000
COPY_TO 10 4 5 1402.359000
COPY_TO 10 4 6 1349.511000
COPY_TO 10 4 7 1395.431000
COPY_TO 10 4 8 1395.865000
COPY_TO 10 4 9 1352.711000
COPY_TO 10 4 10 1335.961000
COPY_FROM 10 4 1 988.278000
COPY_FROM 10 4 2 988.250000
COPY_FROM 10 4 3 986.391000
COPY_FROM 10 4 4 992.929000
COPY_FROM 10 4 5 984.924000
COPY_FROM 10 4 6 989.783000
COPY_FROM 10 4 7 984.885000
COPY_FROM 10 4 8 977.104000
COPY_FROM 10 4 9 991.286000
COPY_FROM 10 4 10 984.057000
COPY_TO 10 5 1 1714.117000
COPY_TO 10 5 2 1737.382000
COPY_TO 10 5 3 1744.501000
COPY_TO 10 5 4 1705.814000
COPY_TO 10 5 5 1724.631000
COPY_TO 10 5 6 1670.422000
COPY_TO 10 5 7 1724.061000
COPY_TO 10 5 8 1741.960000
COPY_TO 10 5 9 1698.542000
COPY_TO 10 5 10 1703.680000
COPY_FROM 10 5 1 1236.786000
COPY_FROM 10 5 2 1228.271000
COPY_FROM 10 5 3 1233.229000
COPY_FROM 10 5 4 1223.438000
COPY_FROM 10 5 5 1218.269000
COPY_FROM 10 5 6 1215.843000
COPY_FROM 10 5 7 1218.998000
COPY_FROM 10 5 8 1223.761000
COPY_FROM 10 5 9 1237.311000
COPY_FROM 10 5 10 1224.098000
COPY_TO 10 6 1 2034.971000
COPY_TO 10 6 2 2086.575000
COPY_TO 10 6 3 2061.166000
COPY_TO 10 6 4 2028.774000
COPY_TO 10 6 5 1976.820000
COPY_TO 10 6 6 2048.341000
COPY_TO 10 6 7 2126.830000
COPY_TO 10 6 8 2113.916000
COPY_TO 10 6 9 2044.993000
COPY_TO 10 6 10 2059.930000
COPY_FROM 10 6 1 1460.496000
COPY_FROM 10 6 2 1455.160000
COPY_FROM 10 6 3 1472.230000
COPY_FROM 10 6 4 1466.294000
COPY_FROM 10 6 5 1470.005000
COPY_FROM 10 6 6 1460.124000
COPY_FROM 10 6 7 1484.157000
COPY_FROM 10 6 8 1498.308000
COPY_FROM 10 6 9 1472.033000
COPY_FROM 10 6 10 1464.715000
COPY_TO 10 7 1 2392.091000
COPY_TO 10 7 2 2376.371000
COPY_TO 10 7 3 2409.333000
COPY_TO 10 7 4 2436.201000
COPY_TO 10 7 5 2406.606000
COPY_TO 10 7 6 2415.106000
COPY_TO 10 7 7 2460.604000
COPY_TO 10 7 8 2407.684000
COPY_TO 10 7 9 2352.239000
COPY_TO 10 7 10 2453.835000
COPY_FROM 10 7 1 1720.389000
COPY_FROM 10 7 2 1716.569000
COPY_FROM 10 7 3 1724.858000
COPY_FROM 10 7 4 1714.529000
COPY_FROM 10 7 5 1704.039000
COPY_FROM 10 7 6 1723.536000
COPY_FROM 10 7 7 1725.329000
COPY_FROM 10 7 8 1690.714000
COPY_FROM 10 7 9 1726.614000
COPY_FROM 10 7 10 1740.956000
COPY_TO 10 8 1 2796.200000
COPY_TO 10 8 2 2761.445000
COPY_TO 10 8 3 2753.313000
COPY_TO 10 8 4 2767.549000
COPY_TO 10 8 5 2759.920000
COPY_TO 10 8 6 2753.090000
COPY_TO 10 8 7 2766.374000
COPY_TO 10 8 8 2758.385000
COPY_TO 10 8 9 2822.724000
COPY_TO 10 8 10 2746.903000
COPY_FROM 10 8 1 1963.436000
COPY_FROM 10 8 2 1965.409000
COPY_FROM 10 8 3 1978.345000
COPY_FROM 10 8 4 1957.258000
COPY_FROM 10 8 5 1948.144000
COPY_FROM 10 8 6 1960.546000
COPY_FROM 10 8 7 1985.631000
COPY_FROM 10 8 8 1928.848000
COPY_FROM 10 8 9 1932.803000
COPY_FROM 10 8 10 1950.939000
COPY_TO 10 9 1 3101.821000
COPY_TO 10 9 2 3119.955000
COPY_TO 10 9 3 3071.974000
COPY_TO 10 9 4 3058.962000
COPY_TO 10 9 5 3100.206000
COPY_TO 10 9 6 3085.071000
COPY_TO 10 9 7 3099.553000
COPY_TO 10 9 8 3133.255000
COPY_TO 10 9 9 3112.448000
COPY_TO 10 9 10 3078.218000
COPY_FROM 10 9 1 2231.402000
COPY_FROM 10 9 2 2270.319000
COPY_FROM 10 9 3 2212.585000
COPY_FROM 10 9 4 2214.820000
COPY_FROM 10 9 5 2182.791000
COPY_FROM 10 9 6 2297.706000
COPY_FROM 10 9 7 2203.188000
COPY_FROM 10 9 8 2275.390000
COPY_FROM 10 9 9 2191.008000
COPY_FROM 10 9 10 2202.305000
COPY_TO 10 10 1 3451.928000
COPY_TO 10 10 2 3434.760000
COPY_TO 10 10 3 3473.176000
COPY_TO 10 10 4 3483.251000
COPY_TO 10 10 5 3432.006000
COPY_TO 10 10 6 3423.612000
COPY_TO 10 10 7 3444.291000
COPY_TO 10 10 8 3425.827000
COPY_TO 10 10 9 3411.279000
COPY_TO 10 10 10 3454.997000
COPY_FROM 10 10 1 2447.611000
COPY_FROM 10 10 2 2462.314000
COPY_FROM 10 10 3 2435.629000
COPY_FROM 10 10 4 2536.588000
COPY_FROM 10 10 5 2472.294000
COPY_FROM 10 10 6 2461.858000
COPY_FROM 10 10 7 2451.032000
COPY_FROM 10 10 8 2430.200000
COPY_FROM 10 10 9 2493.472000
COPY_FROM 10 10 10 2473.227000
COPY_TO 15 1 1 484.336000
COPY_TO 15 1 2 479.033000
COPY_TO 15 1 3 479.735000
COPY_TO 15 1 4 485.839000
COPY_TO 15 1 5 478.669000
COPY_TO 15 1 6 503.063000
COPY_TO 15 1 7 497.116000
COPY_TO 15 1 8 496.381000
COPY_TO 15 1 9 492.082000
COPY_TO 15 1 10 503.341000
COPY_FROM 15 1 1 344.944000
COPY_FROM 15 1 2 347.820000
COPY_FROM 15 1 3 350.856000
COPY_FROM 15 1 4 343.687000
COPY_FROM 15 1 5 348.509000
COPY_FROM 15 1 6 352.488000
COPY_FROM 15 1 7 347.183000
COPY_FROM 15 1 8 346.262000
COPY_FROM 15 1 9 346.139000
COPY_FROM 15 1 10 349.911000
COPY_TO 15 2 1 962.932000
COPY_TO 15 2 2 963.658000
COPY_TO 15 2 3 962.137000
COPY_TO 15 2 4 962.108000
COPY_TO 15 2 5 970.632000
COPY_TO 15 2 6 953.700000
COPY_TO 15 2 7 981.138000
COPY_TO 15 2 8 973.898000
COPY_TO 15 2 9 970.741000
COPY_TO 15 2 10 948.693000
COPY_FROM 15 2 1 665.328000
COPY_FROM 15 2 2 676.310000
COPY_FROM 15 2 3 671.458000
COPY_FROM 15 2 4 670.664000
COPY_FROM 15 2 5 679.016000
COPY_FROM 15 2 6 669.444000
COPY_FROM 15 2 7 667.946000
COPY_FROM 15 2 8 667.764000
COPY_FROM 15 2 9 674.499000
COPY_FROM 15 2 10 671.073000
COPY_TO 15 3 1 1420.086000
COPY_TO 15 3 2 1448.308000
COPY_TO 15 3 3 1454.913000
COPY_TO 15 3 4 1439.704000
COPY_TO 15 3 5 1446.298000
COPY_TO 15 3 6 1470.852000
COPY_TO 15 3 7 1456.382000
COPY_TO 15 3 8 1458.444000
COPY_TO 15 3 9 1463.900000
COPY_TO 15 3 10 1461.039000
COPY_FROM 15 3 1 1007.392000
COPY_FROM 15 3 2 1001.408000
COPY_FROM 15 3 3 1010.905000
COPY_FROM 15 3 4 1004.810000
COPY_FROM 15 3 5 1000.949000
COPY_FROM 15 3 6 1011.789000
COPY_FROM 15 3 7 1009.358000
COPY_FROM 15 3 8 1010.479000
COPY_FROM 15 3 9 999.429000
COPY_FROM 15 3 10 1021.780000
COPY_TO 15 4 1 1933.560000
COPY_TO 15 4 2 1903.286000
COPY_TO 15 4 3 1931.942000
COPY_TO 15 4 4 1963.633000
COPY_TO 15 4 5 1936.343000
COPY_TO 15 4 6 1944.860000
COPY_TO 15 4 7 1924.043000
COPY_TO 15 4 8 1923.471000
COPY_TO 15 4 9 1890.255000
COPY_TO 15 4 10 1960.720000
COPY_FROM 15 4 1 1368.884000
COPY_FROM 15 4 2 1339.628000
COPY_FROM 15 4 3 1311.420000
COPY_FROM 15 4 4 1334.597000
COPY_FROM 15 4 5 1351.018000
COPY_FROM 15 4 6 1344.878000
COPY_FROM 15 4 7 1355.443000
COPY_FROM 15 4 8 1339.170000
COPY_FROM 15 4 9 1340.700000
COPY_FROM 15 4 10 1326.241000
COPY_TO 15 5 1 2318.844000
COPY_TO 15 5 2 2402.419000
COPY_TO 15 5 3 2396.781000
COPY_TO 15 5 4 2423.735000
COPY_TO 15 5 5 2409.499000
COPY_TO 15 5 6 2428.851000
COPY_TO 15 5 7 2348.598000
COPY_TO 15 5 8 2322.848000
COPY_TO 15 5 9 2352.590000
COPY_TO 15 5 10 2445.076000
COPY_FROM 15 5 1 1648.351000
COPY_FROM 15 5 2 1651.210000
COPY_FROM 15 5 3 1652.014000
COPY_FROM 15 5 4 1657.594000
COPY_FROM 15 5 5 1664.718000
COPY_FROM 15 5 6 1659.842000
COPY_FROM 15 5 7 1624.535000
COPY_FROM 15 5 8 1674.149000
COPY_FROM 15 5 9 1647.591000
COPY_FROM 15 5 10 1669.124000
COPY_TO 15 6 1 2881.853000
COPY_TO 15 6 2 2868.673000
COPY_TO 15 6 3 2965.197000
COPY_TO 15 6 4 2884.662000
COPY_TO 15 6 5 2838.135000
COPY_TO 15 6 6 2916.165000
COPY_TO 15 6 7 2886.197000
COPY_TO 15 6 8 2933.154000
COPY_TO 15 6 9 2928.349000
COPY_TO 15 6 10 2901.545000
COPY_FROM 15 6 1 2013.017000
COPY_FROM 15 6 2 1978.835000
COPY_FROM 15 6 3 2004.485000
COPY_FROM 15 6 4 1987.586000
COPY_FROM 15 6 5 1975.135000
COPY_FROM 15 6 6 1989.522000
COPY_FROM 15 6 7 1988.856000
COPY_FROM 15 6 8 1983.815000
COPY_FROM 15 6 9 2013.150000
COPY_FROM 15 6 10 1997.074000
COPY_TO 15 7 1 3335.493000
COPY_TO 15 7 2 3357.154000
COPY_TO 15 7 3 3347.085000
COPY_TO 15 7 4 3296.994000
COPY_TO 15 7 5 3376.383000
COPY_TO 15 7 6 3368.554000
COPY_TO 15 7 7 3401.287000
COPY_TO 15 7 8 3359.792000
COPY_TO 15 7 9 3351.542000
COPY_TO 15 7 10 3359.085000
COPY_FROM 15 7 1 2341.669000
COPY_FROM 15 7 2 2318.762000
COPY_FROM 15 7 3 2302.094000
COPY_FROM 15 7 4 2295.824000
COPY_FROM 15 7 5 2282.052000
COPY_FROM 15 7 6 2285.734000
COPY_FROM 15 7 7 2286.871000
COPY_FROM 15 7 8 2301.570000
COPY_FROM 15 7 9 2294.122000
COPY_FROM 15 7 10 2318.100000
COPY_TO 15 8 1 3838.944000
COPY_TO 15 8 2 3832.013000
COPY_TO 15 8 3 3794.855000
COPY_TO 15 8 4 3829.692000
COPY_TO 15 8 5 3902.267000
COPY_TO 15 8 6 3876.061000
COPY_TO 15 8 7 3844.652000
COPY_TO 15 8 8 3819.619000
COPY_TO 15 8 9 3891.511000
COPY_TO 15 8 10 3902.708000
COPY_FROM 15 8 1 2665.396000
COPY_FROM 15 8 2 2677.914000
COPY_FROM 15 8 3 2666.726000
COPY_FROM 15 8 4 2633.747000
COPY_FROM 15 8 5 2632.702000
COPY_FROM 15 8 6 2664.116000
COPY_FROM 15 8 7 2614.453000
COPY_FROM 15 8 8 2662.111000
COPY_FROM 15 8 9 2660.616000
COPY_FROM 15 8 10 2695.048000
COPY_TO 15 9 1 4341.815000
COPY_TO 15 9 2 4302.586000
COPY_TO 15 9 3 4281.296000
COPY_TO 15 9 4 4260.384000
COPY_TO 15 9 5 4354.295000
COPY_TO 15 9 6 4395.239000
COPY_TO 15 9 7 4294.927000
COPY_TO 15 9 8 4299.131000
COPY_TO 15 9 9 4324.381000
COPY_TO 15 9 10 4308.416000
COPY_FROM 15 9 1 2952.762000
COPY_FROM 15 9 2 2976.541000
COPY_FROM 15 9 3 2980.895000
COPY_FROM 15 9 4 2988.607000
COPY_FROM 15 9 5 2931.639000
COPY_FROM 15 9 6 2980.360000
COPY_FROM 15 9 7 2987.142000
COPY_FROM 15 9 8 2942.020000
COPY_FROM 15 9 9 2956.429000
COPY_FROM 15 9 10 2976.833000
COPY_TO 15 10 1 4908.128000
COPY_TO 15 10 2 4808.306000
COPY_TO 15 10 3 4884.962000
COPY_TO 15 10 4 4871.861000
COPY_TO 15 10 5 4793.649000
COPY_TO 15 10 6 4783.691000
COPY_TO 15 10 7 4953.107000
COPY_TO 15 10 8 4770.645000
COPY_TO 15 10 9 4830.319000
COPY_TO 15 10 10 4817.374000
COPY_FROM 15 10 1 3316.914000
COPY_FROM 15 10 2 3317.386000
COPY_FROM 15 10 3 3304.798000
COPY_FROM 15 10 4 3260.573000
COPY_FROM 15 10 5 3275.390000
COPY_FROM 15 10 6 3298.207000
COPY_FROM 15 10 7 3286.026000
COPY_FROM 15 10 8 3363.954000
COPY_FROM 15 10 9 3294.820000
COPY_FROM 15 10 10 3306.407000
COPY_TO 20 1 1 619.998000
COPY_TO 20 1 2 616.942000
COPY_TO 20 1 3 624.587000
COPY_TO 20 1 4 633.838000
COPY_TO 20 1 5 651.659000
COPY_TO 20 1 6 638.405000
COPY_TO 20 1 7 629.828000
COPY_TO 20 1 8 621.210000
COPY_TO 20 1 9 635.503000
COPY_TO 20 1 10 629.262000
COPY_FROM 20 1 1 433.467000
COPY_FROM 20 1 2 431.611000
COPY_FROM 20 1 3 438.673000
COPY_FROM 20 1 4 439.864000
COPY_FROM 20 1 5 436.883000
COPY_FROM 20 1 6 436.025000
COPY_FROM 20 1 7 447.105000
COPY_FROM 20 1 8 452.754000
COPY_FROM 20 1 9 434.757000
COPY_FROM 20 1 10 439.372000
COPY_TO 20 2 1 1215.557000
COPY_TO 20 2 2 1198.834000
COPY_TO 20 2 3 1248.734000
COPY_TO 20 2 4 1224.716000
COPY_TO 20 2 5 1221.355000
COPY_TO 20 2 6 1235.157000
COPY_TO 20 2 7 1213.212000
COPY_TO 20 2 8 1251.544000
COPY_TO 20 2 9 1211.466000
COPY_TO 20 2 10 1232.067000
COPY_FROM 20 2 1 853.265000
COPY_FROM 20 2 2 861.634000
COPY_FROM 20 2 3 875.109000
COPY_FROM 20 2 4 866.576000
COPY_FROM 20 2 5 869.608000
COPY_FROM 20 2 6 867.634000
COPY_FROM 20 2 7 868.359000
COPY_FROM 20 2 8 879.867000
COPY_FROM 20 2 9 856.513000
COPY_FROM 20 2 10 846.929000
COPY_TO 20 3 1 1853.167000
COPY_TO 20 3 2 1908.958000
COPY_TO 20 3 3 1854.300000
COPY_TO 20 3 4 1854.920000
COPY_TO 20 3 5 1908.171000
COPY_TO 20 3 6 1875.182000
COPY_TO 20 3 7 1858.945000
COPY_TO 20 3 8 1836.676000
COPY_TO 20 3 9 1892.760000
COPY_TO 20 3 10 1832.188000
COPY_FROM 20 3 1 1269.621000
COPY_FROM 20 3 2 1268.794000
COPY_FROM 20 3 3 1306.010000
COPY_FROM 20 3 4 1268.746000
COPY_FROM 20 3 5 1285.443000
COPY_FROM 20 3 6 1272.459000
COPY_FROM 20 3 7 1284.552000
COPY_FROM 20 3 8 1277.634000
COPY_FROM 20 3 9 1283.592000
COPY_FROM 20 3 10 1277.291000
COPY_TO 20 4 1 2366.791000
COPY_TO 20 4 2 2467.617000
COPY_TO 20 4 3 2503.922000
COPY_TO 20 4 4 2419.396000
COPY_TO 20 4 5 2362.517000
COPY_TO 20 4 6 2436.106000
COPY_TO 20 4 7 2515.537000
COPY_TO 20 4 8 2444.051000
COPY_TO 20 4 9 2368.470000
COPY_TO 20 4 10 2476.241000
COPY_FROM 20 4 1 1686.377000
COPY_FROM 20 4 2 1766.247000
COPY_FROM 20 4 3 1765.013000
COPY_FROM 20 4 4 1710.638000
COPY_FROM 20 4 5 1681.944000
COPY_FROM 20 4 6 1672.305000
COPY_FROM 20 4 7 1680.594000
COPY_FROM 20 4 8 1692.007000
COPY_FROM 20 4 9 1696.334000
COPY_FROM 20 4 10 1673.502000
COPY_TO 20 5 1 3044.926000
COPY_TO 20 5 2 2999.139000
COPY_TO 20 5 3 3012.201000
COPY_TO 20 5 4 3079.507000
COPY_TO 20 5 5 3084.210000
COPY_TO 20 5 6 3106.328000
COPY_TO 20 5 7 3107.643000
COPY_TO 20 5 8 3103.127000
COPY_TO 20 5 9 3098.074000
COPY_TO 20 5 10 3071.407000
COPY_FROM 20 5 1 2110.909000
COPY_FROM 20 5 2 2119.924000
COPY_FROM 20 5 3 2094.429000
COPY_FROM 20 5 4 2113.787000
COPY_FROM 20 5 5 2093.251000
COPY_FROM 20 5 6 2103.724000
COPY_FROM 20 5 7 2163.264000
COPY_FROM 20 5 8 2110.832000
COPY_FROM 20 5 9 2120.593000
COPY_FROM 20 5 10 2108.865000
COPY_TO 20 6 1 3778.026000
COPY_TO 20 6 2 3660.842000
COPY_TO 20 6 3 3586.255000
COPY_TO 20 6 4 3621.287000
COPY_TO 20 6 5 3765.054000
COPY_TO 20 6 6 3730.942000
COPY_TO 20 6 7 3700.704000
COPY_TO 20 6 8 3683.990000
COPY_TO 20 6 9 3654.364000
COPY_TO 20 6 10 3711.707000
COPY_FROM 20 6 1 2512.796000
COPY_FROM 20 6 2 2499.849000
COPY_FROM 20 6 3 2581.643000
COPY_FROM 20 6 4 2540.972000
COPY_FROM 20 6 5 2522.357000
COPY_FROM 20 6 6 2519.327000
COPY_FROM 20 6 7 2539.536000
COPY_FROM 20 6 8 2529.492000
COPY_FROM 20 6 9 2527.186000
COPY_FROM 20 6 10 2537.575000
COPY_TO 20 7 1 4302.273000
COPY_TO 20 7 2 4320.033000
COPY_TO 20 7 3 4234.169000
COPY_TO 20 7 4 4347.949000
COPY_TO 20 7 5 4297.509000
COPY_TO 20 7 6 4348.086000
COPY_TO 20 7 7 4302.051000
COPY_TO 20 7 8 4325.364000
COPY_TO 20 7 9 4322.654000
COPY_TO 20 7 10 4271.526000
COPY_FROM 20 7 1 2911.560000
COPY_FROM 20 7 2 2940.254000
COPY_FROM 20 7 3 2980.597000
COPY_FROM 20 7 4 2973.070000
COPY_FROM 20 7 5 2933.554000
COPY_FROM 20 7 6 2953.611000
COPY_FROM 20 7 7 2922.042000
COPY_FROM 20 7 8 2906.997000
COPY_FROM 20 7 9 2904.686000
COPY_FROM 20 7 10 2941.453000
COPY_TO 20 8 1 4764.222000
COPY_TO 20 8 2 4728.320000
COPY_TO 20 8 3 4795.743000
COPY_TO 20 8 4 4882.833000
COPY_TO 20 8 5 4815.518000
COPY_TO 20 8 6 4886.483000
COPY_TO 20 8 7 4924.319000
COPY_TO 20 8 8 4838.255000
COPY_TO 20 8 9 4863.534000
COPY_TO 20 8 10 4925.173000
COPY_FROM 20 8 1 3377.310000
COPY_FROM 20 8 2 3374.520000
COPY_FROM 20 8 3 3415.924000
COPY_FROM 20 8 4 3359.085000
COPY_FROM 20 8 5 3354.984000
COPY_FROM 20 8 6 3314.657000
COPY_FROM 20 8 7 3315.929000
COPY_FROM 20 8 8 3446.995000
COPY_FROM 20 8 9 3368.091000
COPY_FROM 20 8 10 3390.674000
COPY_TO 20 9 1 5463.960000
COPY_TO 20 9 2 5463.921000
COPY_TO 20 9 3 5378.138000
COPY_TO 20 9 4 5535.958000
COPY_TO 20 9 5 5503.000000
COPY_TO 20 9 6 5457.850000
COPY_TO 20 9 7 5435.157000
COPY_TO 20 9 8 5422.457000
COPY_TO 20 9 9 5482.427000
COPY_TO 20 9 10 5495.809000
COPY_FROM 20 9 1 3876.496000
COPY_FROM 20 9 2 3770.921000
COPY_FROM 20 9 3 3729.432000
COPY_FROM 20 9 4 3739.708000
COPY_FROM 20 9 5 3787.856000
COPY_FROM 20 9 6 3757.324000
COPY_FROM 20 9 7 3793.676000
COPY_FROM 20 9 8 3840.151000
COPY_FROM 20 9 9 3721.829000
COPY_FROM 20 9 10 3769.584000
COPY_TO 20 10 1 6021.466000
COPY_TO 20 10 2 6050.644000
COPY_TO 20 10 3 6035.796000
COPY_TO 20 10 4 5991.765000
COPY_TO 20 10 5 6095.925000
COPY_TO 20 10 6 6006.453000
COPY_TO 20 10 7 6043.915000
COPY_TO 20 10 8 6184.330000
COPY_TO 20 10 9 5997.352000
COPY_TO 20 10 10 6142.882000
COPY_FROM 20 10 1 4220.218000
COPY_FROM 20 10 2 4160.915000
COPY_FROM 20 10 3 4172.628000
COPY_FROM 20 10 4 4183.532000
COPY_FROM 20 10 5 4208.204000
COPY_FROM 20 10 6 4232.293000
COPY_FROM 20 10 7 4188.968000
COPY_FROM 20 10 8 4191.494000
COPY_FROM 20 10 9 4196.841000
COPY_FROM 20 10 10 4172.418000
COPY_TO 25 1 1 774.678000
COPY_TO 25 1 2 787.791000
COPY_TO 25 1 3 773.815000
COPY_TO 25 1 4 744.220000
COPY_TO 25 1 5 763.742000
COPY_TO 25 1 6 764.779000
COPY_TO 25 1 7 763.397000
COPY_TO 25 1 8 750.529000
COPY_TO 25 1 9 775.028000
COPY_TO 25 1 10 763.085000
COPY_FROM 25 1 1 524.445000
COPY_FROM 25 1 2 519.951000
COPY_FROM 25 1 3 516.212000
COPY_FROM 25 1 4 516.155000
COPY_FROM 25 1 5 519.686000
COPY_FROM 25 1 6 524.260000
COPY_FROM 25 1 7 521.384000
COPY_FROM 25 1 8 516.947000
COPY_FROM 25 1 9 516.268000
COPY_FROM 25 1 10 513.815000
COPY_TO 25 2 1 1513.097000
COPY_TO 25 2 2 1516.435000
COPY_TO 25 2 3 1514.322000
COPY_TO 25 2 4 1515.332000
COPY_TO 25 2 5 1539.159000
COPY_TO 25 2 6 1504.517000
COPY_TO 25 2 7 1551.701000
COPY_TO 25 2 8 1536.408000
COPY_TO 25 2 9 1506.469000
COPY_TO 25 2 10 1507.693000
COPY_FROM 25 2 1 1031.906000
COPY_FROM 25 2 2 1011.518000
COPY_FROM 25 2 3 1015.601000
COPY_FROM 25 2 4 1022.738000
COPY_FROM 25 2 5 1024.219000
COPY_FROM 25 2 6 1018.943000
COPY_FROM 25 2 7 1008.076000
COPY_FROM 25 2 8 1008.687000
COPY_FROM 25 2 9 1019.874000
COPY_FROM 25 2 10 1010.362000
COPY_TO 25 3 1 2275.840000
COPY_TO 25 3 2 2292.456000
COPY_TO 25 3 3 2304.261000
COPY_TO 25 3 4 2260.663000
COPY_TO 25 3 5 2274.911000
COPY_TO 25 3 6 2307.456000
COPY_TO 25 3 7 2304.885000
COPY_TO 25 3 8 2328.952000
COPY_TO 25 3 9 2205.891000
COPY_TO 25 3 10 2252.140000
COPY_FROM 25 3 1 1491.799000
COPY_FROM 25 3 2 1508.012000
COPY_FROM 25 3 3 1507.554000
COPY_FROM 25 3 4 1540.556000
COPY_FROM 25 3 5 1538.755000
COPY_FROM 25 3 6 1524.962000
COPY_FROM 25 3 7 1519.040000
COPY_FROM 25 3 8 1527.385000
COPY_FROM 25 3 9 1542.953000
COPY_FROM 25 3 10 1523.412000
COPY_TO 25 4 1 3052.605000
COPY_TO 25 4 2 2998.820000
COPY_TO 25 4 3 2984.156000
COPY_TO 25 4 4 3034.054000
COPY_TO 25 4 5 3035.638000
COPY_TO 25 4 6 3021.914000
COPY_TO 25 4 7 3086.029000
COPY_TO 25 4 8 3104.967000
COPY_TO 25 4 9 3084.419000
COPY_TO 25 4 10 3052.696000
COPY_FROM 25 4 1 2019.843000
COPY_FROM 25 4 2 2010.303000
COPY_FROM 25 4 3 2008.544000
COPY_FROM 25 4 4 2017.551000
COPY_FROM 25 4 5 1983.106000
COPY_FROM 25 4 6 1972.640000
COPY_FROM 25 4 7 1998.370000
COPY_FROM 25 4 8 1972.399000
COPY_FROM 25 4 9 2014.721000
COPY_FROM 25 4 10 1990.860000
COPY_TO 25 5 1 3803.703000
COPY_TO 25 5 2 3801.972000
COPY_TO 25 5 3 3732.563000
COPY_TO 25 5 4 3844.295000
COPY_TO 25 5 5 3843.996000
COPY_TO 25 5 6 3860.533000
COPY_TO 25 5 7 3885.893000
COPY_TO 25 5 8 3901.853000
COPY_TO 25 5 9 3811.751000
COPY_TO 25 5 10 3830.153000
COPY_FROM 25 5 1 2512.122000
COPY_FROM 25 5 2 2485.190000
COPY_FROM 25 5 3 2514.064000
COPY_FROM 25 5 4 2629.482000
COPY_FROM 25 5 5 2574.073000
COPY_FROM 25 5 6 2554.008000
COPY_FROM 25 5 7 2554.302000
COPY_FROM 25 5 8 2538.815000
COPY_FROM 25 5 9 2557.007000
COPY_FROM 25 5 10 2498.580000
COPY_TO 25 6 1 4623.929000
COPY_TO 25 6 2 4565.644000
COPY_TO 25 6 3 4579.721000
COPY_TO 25 6 4 4524.352000
COPY_TO 25 6 5 4470.642000
COPY_TO 25 6 6 4563.316000
COPY_TO 25 6 7 4576.716000
COPY_TO 25 6 8 4491.117000
COPY_TO 25 6 9 4544.761000
COPY_TO 25 6 10 4424.612000
COPY_FROM 25 6 1 3018.827000
COPY_FROM 25 6 2 2978.490000
COPY_FROM 25 6 3 2995.232000
COPY_FROM 25 6 4 2967.654000
COPY_FROM 25 6 5 3029.289000
COPY_FROM 25 6 6 2956.739000
COPY_FROM 25 6 7 2964.034000
COPY_FROM 25 6 8 2969.406000
COPY_FROM 25 6 9 2990.859000
COPY_FROM 25 6 10 3004.016000
COPY_TO 25 7 1 5388.767000
COPY_TO 25 7 2 5261.497000
COPY_TO 25 7 3 5266.503000
COPY_TO 25 7 4 5328.781000
COPY_TO 25 7 5 5331.428000
COPY_TO 25 7 6 5342.277000
COPY_TO 25 7 7 5309.748000
COPY_TO 25 7 8 5396.271000
COPY_TO 25 7 9 5242.006000
COPY_TO 25 7 10 5204.319000
COPY_FROM 25 7 1 3526.509000
COPY_FROM 25 7 2 3533.526000
COPY_FROM 25 7 3 3574.351000
COPY_FROM 25 7 4 3550.997000
COPY_FROM 25 7 5 3519.623000
COPY_FROM 25 7 6 3462.743000
COPY_FROM 25 7 7 3504.243000
COPY_FROM 25 7 8 3521.010000
COPY_FROM 25 7 9 3431.482000
COPY_FROM 25 7 10 3419.169000
COPY_TO 25 8 1 6097.554000
COPY_TO 25 8 2 5984.897000
COPY_TO 25 8 3 6040.903000
COPY_TO 25 8 4 6147.806000
COPY_TO 25 8 5 6037.164000
COPY_TO 25 8 6 5987.661000
COPY_TO 25 8 7 6096.899000
COPY_TO 25 8 8 6073.973000
COPY_TO 25 8 9 6105.735000
COPY_TO 25 8 10 5974.114000
COPY_FROM 25 8 1 3988.738000
COPY_FROM 25 8 2 4009.777000
COPY_FROM 25 8 3 4027.431000
COPY_FROM 25 8 4 3976.333000
COPY_FROM 25 8 5 3961.928000
COPY_FROM 25 8 6 3974.345000
COPY_FROM 25 8 7 4029.581000
COPY_FROM 25 8 8 4025.947000
COPY_FROM 25 8 9 3977.926000
COPY_FROM 25 8 10 4035.786000
COPY_TO 25 9 1 6753.774000
COPY_TO 25 9 2 6700.288000
COPY_TO 25 9 3 6880.717000
COPY_TO 25 9 4 6825.173000
COPY_TO 25 9 5 6697.153000
COPY_TO 25 9 6 6785.494000
COPY_TO 25 9 7 6879.979000
COPY_TO 25 9 8 6743.111000
COPY_TO 25 9 9 6850.346000
COPY_TO 25 9 10 6787.185000
COPY_FROM 25 9 1 4517.219000
COPY_FROM 25 9 2 4531.329000
COPY_FROM 25 9 3 4529.439000
COPY_FROM 25 9 4 4481.905000
COPY_FROM 25 9 5 4518.109000
COPY_FROM 25 9 6 4502.731000
COPY_FROM 25 9 7 4473.914000
COPY_FROM 25 9 8 4471.436000
COPY_FROM 25 9 9 4500.187000
COPY_FROM 25 9 10 4479.554000
COPY_TO 25 10 1 7557.810000
COPY_TO 25 10 2 7559.711000
COPY_TO 25 10 3 7542.392000
COPY_TO 25 10 4 7291.018000
COPY_TO 25 10 5 7504.865000
COPY_TO 25 10 6 7432.488000
COPY_TO 25 10 7 7432.530000
COPY_TO 25 10 8 7474.229000
COPY_TO 25 10 9 7384.188000
COPY_TO 25 10 10 7551.992000
COPY_FROM 25 10 1 4964.734000
COPY_FROM 25 10 2 5042.329000
COPY_FROM 25 10 3 5013.357000
COPY_FROM 25 10 4 4986.712000
COPY_FROM 25 10 5 4996.862000
COPY_FROM 25 10 6 4945.983000
COPY_FROM 25 10 7 4994.463000
COPY_FROM 25 10 8 4944.533000
COPY_FROM 25 10 9 5018.457000
COPY_FROM 25 10 10 4967.123000
COPY_TO 30 1 1 905.785000
COPY_TO 30 1 2 919.553000
COPY_TO 30 1 3 891.263000
COPY_TO 30 1 4 923.963000
COPY_TO 30 1 5 901.843000
COPY_TO 30 1 6 915.491000
COPY_TO 30 1 7 896.540000
COPY_TO 30 1 8 906.324000
COPY_TO 30 1 9 892.686000
COPY_TO 30 1 10 924.998000
COPY_FROM 30 1 1 587.472000
COPY_FROM 30 1 2 605.176000
COPY_FROM 30 1 3 591.641000
COPY_FROM 30 1 4 622.076000
COPY_FROM 30 1 5 604.110000
COPY_FROM 30 1 6 619.221000
COPY_FROM 30 1 7 612.524000
COPY_FROM 30 1 8 603.729000
COPY_FROM 30 1 9 595.670000
COPY_FROM 30 1 10 598.395000
COPY_TO 30 2 1 1799.114000
COPY_TO 30 2 2 1802.407000
COPY_TO 30 2 3 1813.957000
COPY_TO 30 2 4 1765.727000
COPY_TO 30 2 5 1798.418000
COPY_TO 30 2 6 1817.917000
COPY_TO 30 2 7 1780.496000
COPY_TO 30 2 8 1772.734000
COPY_TO 30 2 9 1771.637000
COPY_TO 30 2 10 1837.537000
COPY_FROM 30 2 1 1186.556000
COPY_FROM 30 2 2 1189.396000
COPY_FROM 30 2 3 1188.794000
COPY_FROM 30 2 4 1196.751000
COPY_FROM 30 2 5 1208.097000
COPY_FROM 30 2 6 1195.639000
COPY_FROM 30 2 7 1181.028000
COPY_FROM 30 2 8 1177.701000
COPY_FROM 30 2 9 1181.959000
COPY_FROM 30 2 10 1171.377000
COPY_TO 30 3 1 2668.510000
COPY_TO 30 3 2 2662.493000
COPY_TO 30 3 3 2659.467000
COPY_TO 30 3 4 2629.276000
COPY_TO 30 3 5 2630.829000
COPY_TO 30 3 6 2632.760000
COPY_TO 30 3 7 2642.559000
COPY_TO 30 3 8 2675.854000
COPY_TO 30 3 9 2686.168000
COPY_TO 30 3 10 2703.022000
COPY_FROM 30 3 1 1749.300000
COPY_FROM 30 3 2 1732.106000
COPY_FROM 30 3 3 1744.452000
COPY_FROM 30 3 4 1762.979000
COPY_FROM 30 3 5 1758.033000
COPY_FROM 30 3 6 1772.605000
COPY_FROM 30 3 7 1754.809000
COPY_FROM 30 3 8 1751.785000
COPY_FROM 30 3 9 1762.331000
COPY_FROM 30 3 10 1745.872000
COPY_TO 30 4 1 3575.638000
COPY_TO 30 4 2 3540.611000
COPY_TO 30 4 3 3555.631000
COPY_TO 30 4 4 3508.023000
COPY_TO 30 4 5 3548.267000
COPY_TO 30 4 6 3530.229000
COPY_TO 30 4 7 3624.151000
COPY_TO 30 4 8 3549.913000
COPY_TO 30 4 9 3579.071000
COPY_TO 30 4 10 3548.049000
COPY_FROM 30 4 1 2333.686000
COPY_FROM 30 4 2 2354.055000
COPY_FROM 30 4 3 2329.804000
COPY_FROM 30 4 4 2393.154000
COPY_FROM 30 4 5 2357.848000
COPY_FROM 30 4 6 2351.915000
COPY_FROM 30 4 7 2340.428000
COPY_FROM 30 4 8 2364.307000
COPY_FROM 30 4 9 2353.620000
COPY_FROM 30 4 10 2363.992000
COPY_TO 30 5 1 4324.843000
COPY_TO 30 5 2 4387.595000
COPY_TO 30 5 3 4416.761000
COPY_TO 30 5 4 4406.291000
COPY_TO 30 5 5 4418.657000
COPY_TO 30 5 6 4432.811000
COPY_TO 30 5 7 4422.989000
COPY_TO 30 5 8 4467.277000
COPY_TO 30 5 9 4474.720000
COPY_TO 30 5 10 4419.907000
COPY_FROM 30 5 1 2911.757000
COPY_FROM 30 5 2 2921.622000
COPY_FROM 30 5 3 2863.662000
COPY_FROM 30 5 4 3017.345000
COPY_FROM 30 5 5 2904.579000
COPY_FROM 30 5 6 2954.328000
COPY_FROM 30 5 7 2965.111000
COPY_FROM 30 5 8 2962.503000
COPY_FROM 30 5 9 2881.468000
COPY_FROM 30 5 10 2932.883000
COPY_TO 30 6 1 5324.111000
COPY_TO 30 6 2 5273.693000
COPY_TO 30 6 3 5477.630000
COPY_TO 30 6 4 5470.590000
COPY_TO 30 6 5 5330.046000
COPY_TO 30 6 6 5314.785000
COPY_TO 30 6 7 5280.238000
COPY_TO 30 6 8 5447.156000
COPY_TO 30 6 9 5470.025000
COPY_TO 30 6 10 5382.615000
COPY_FROM 30 6 1 3519.835000
COPY_FROM 30 6 2 3495.999000
COPY_FROM 30 6 3 3447.579000
COPY_FROM 30 6 4 3503.293000
COPY_FROM 30 6 5 3467.442000
COPY_FROM 30 6 6 3502.490000
COPY_FROM 30 6 7 3539.083000
COPY_FROM 30 6 8 3514.108000
COPY_FROM 30 6 9 3558.769000
COPY_FROM 30 6 10 3557.883000
COPY_TO 30 7 1 6270.765000
COPY_TO 30 7 2 6250.630000
COPY_TO 30 7 3 6291.501000
COPY_TO 30 7 4 6277.021000
COPY_TO 30 7 5 6197.067000
COPY_TO 30 7 6 6204.168000
COPY_TO 30 7 7 6326.866000
COPY_TO 30 7 8 6219.435000
COPY_TO 30 7 9 6229.165000
COPY_TO 30 7 10 6182.055000
COPY_FROM 30 7 1 4064.754000
COPY_FROM 30 7 2 4161.991000
COPY_FROM 30 7 3 4099.098000
COPY_FROM 30 7 4 4098.243000
COPY_FROM 30 7 5 4094.954000
COPY_FROM 30 7 6 4113.331000
COPY_FROM 30 7 7 4162.527000
COPY_FROM 30 7 8 4117.655000
COPY_FROM 30 7 9 4038.147000
COPY_FROM 30 7 10 4247.750000
COPY_TO 30 8 1 7036.335000
COPY_TO 30 8 2 7161.077000
COPY_TO 30 8 3 7198.475000
COPY_TO 30 8 4 7057.568000
COPY_TO 30 8 5 7068.777000
COPY_TO 30 8 6 7145.575000
COPY_TO 30 8 7 7164.393000
COPY_TO 30 8 8 7146.893000
COPY_TO 30 8 9 7263.004000
COPY_TO 30 8 10 7258.462000
COPY_FROM 30 8 1 4709.346000
COPY_FROM 30 8 2 4727.176000
COPY_FROM 30 8 3 4643.916000
COPY_FROM 30 8 4 4646.425000
COPY_FROM 30 8 5 4714.948000
COPY_FROM 30 8 6 4669.370000
COPY_FROM 30 8 7 4649.179000
COPY_FROM 30 8 8 4604.831000
COPY_FROM 30 8 9 4657.557000
COPY_FROM 30 8 10 4672.892000
COPY_TO 30 9 1 7908.138000
COPY_TO 30 9 2 8046.895000
COPY_TO 30 9 3 8140.333000
COPY_TO 30 9 4 8103.733000
COPY_TO 30 9 5 8007.650000
COPY_TO 30 9 6 7955.601000
COPY_TO 30 9 7 8044.544000
COPY_TO 30 9 8 8086.140000
COPY_TO 30 9 9 8062.369000
COPY_TO 30 9 10 7827.011000
COPY_FROM 30 9 1 5204.533000
COPY_FROM 30 9 2 5201.463000
COPY_FROM 30 9 3 5234.632000
COPY_FROM 30 9 4 5236.902000
COPY_FROM 30 9 5 5269.275000
COPY_FROM 30 9 6 5263.596000
COPY_FROM 30 9 7 5192.508000
COPY_FROM 30 9 8 5234.723000
COPY_FROM 30 9 9 5188.671000
COPY_FROM 30 9 10 5160.328000
COPY_TO 30 10 1 8859.946000
COPY_TO 30 10 2 8904.060000
COPY_TO 30 10 3 9075.677000
COPY_TO 30 10 4 8911.511000
COPY_TO 30 10 5 8923.505000
COPY_TO 30 10 6 8955.312000
COPY_TO 30 10 7 9014.532000
COPY_TO 30 10 8 9100.991000
COPY_TO 30 10 9 8978.536000
COPY_TO 30 10 10 8974.878000
COPY_FROM 30 10 1 5820.402000
COPY_FROM 30 10 2 5800.793000
COPY_FROM 30 10 3 5817.289000
COPY_FROM 30 10 4 5766.636000
COPY_FROM 30 10 5 5947.599000
COPY_FROM 30 10 6 5756.134000
COPY_FROM 30 10 7 5764.180000
COPY_FROM 30 10 8 5796.569000
COPY_FROM 30 10 9 5796.612000
COPY_FROM 30 10 10 5849.049000
COPY_TO 5 1 1 226.623000
COPY_TO 5 1 2 227.444000
COPY_TO 5 1 3 214.579000
COPY_TO 5 1 4 218.737000
COPY_TO 5 1 5 218.708000
COPY_TO 5 1 6 221.763000
COPY_TO 5 1 7 212.154000
COPY_TO 5 1 8 219.050000
COPY_TO 5 1 9 225.217000
COPY_TO 5 1 10 219.609000
COPY_FROM 5 1 1 190.806000
COPY_FROM 5 1 2 192.065000
COPY_FROM 5 1 3 192.423000
COPY_FROM 5 1 4 200.560000
COPY_FROM 5 1 5 190.027000
COPY_FROM 5 1 6 190.954000
COPY_FROM 5 1 7 190.775000
COPY_FROM 5 1 8 187.590000
COPY_FROM 5 1 9 194.545000
COPY_FROM 5 1 10 190.831000
COPY_TO 5 2 1 419.239000
COPY_TO 5 2 2 428.527000
COPY_TO 5 2 3 424.408000
COPY_TO 5 2 4 428.882000
COPY_TO 5 2 5 419.476000
COPY_TO 5 2 6 422.894000
COPY_TO 5 2 7 418.744000
COPY_TO 5 2 8 425.265000
COPY_TO 5 2 9 428.402000
COPY_TO 5 2 10 425.687000
COPY_FROM 5 2 1 368.581000
COPY_FROM 5 2 2 368.334000
COPY_FROM 5 2 3 379.807000
COPY_FROM 5 2 4 367.980000
COPY_FROM 5 2 5 364.015000
COPY_FROM 5 2 6 366.088000
COPY_FROM 5 2 7 360.102000
COPY_FROM 5 2 8 366.403000
COPY_FROM 5 2 9 363.637000
COPY_FROM 5 2 10 362.853000
COPY_TO 5 3 1 636.678000
COPY_TO 5 3 2 635.709000
COPY_TO 5 3 3 631.716000
COPY_TO 5 3 4 608.849000
COPY_TO 5 3 5 625.253000
COPY_TO 5 3 6 630.432000
COPY_TO 5 3 7 636.818000
COPY_TO 5 3 8 640.687000
COPY_TO 5 3 9 651.740000
COPY_TO 5 3 10 622.738000
COPY_FROM 5 3 1 541.713000
COPY_FROM 5 3 2 532.056000
COPY_FROM 5 3 3 539.630000
COPY_FROM 5 3 4 549.629000
COPY_FROM 5 3 5 548.109000
COPY_FROM 5 3 6 533.228000
COPY_FROM 5 3 7 532.981000
COPY_FROM 5 3 8 527.524000
COPY_FROM 5 3 9 566.548000
COPY_FROM 5 3 10 531.553000
COPY_TO 5 4 1 823.149000
COPY_TO 5 4 2 842.084000
COPY_TO 5 4 3 841.990000
COPY_TO 5 4 4 834.844000
COPY_TO 5 4 5 847.631000
COPY_TO 5 4 6 852.530000
COPY_TO 5 4 7 822.453000
COPY_TO 5 4 8 851.579000
COPY_TO 5 4 9 841.356000
COPY_TO 5 4 10 840.655000
COPY_FROM 5 4 1 715.727000
COPY_FROM 5 4 2 700.656000
COPY_FROM 5 4 3 714.135000
COPY_FROM 5 4 4 711.922000
COPY_FROM 5 4 5 703.007000
COPY_FROM 5 4 6 700.765000
COPY_FROM 5 4 7 705.071000
COPY_FROM 5 4 8 716.543000
COPY_FROM 5 4 9 702.448000
COPY_FROM 5 4 10 716.714000
COPY_TO 5 5 1 1044.045000
COPY_TO 5 5 2 1039.683000
COPY_TO 5 5 3 1010.508000
COPY_TO 5 5 4 1032.182000
COPY_TO 5 5 5 1056.995000
COPY_TO 5 5 6 1028.120000
COPY_TO 5 5 7 1035.610000
COPY_TO 5 5 8 1047.220000
COPY_TO 5 5 9 1056.572000
COPY_TO 5 5 10 1052.532000
COPY_FROM 5 5 1 880.451000
COPY_FROM 5 5 2 892.421000
COPY_FROM 5 5 3 926.924000
COPY_FROM 5 5 4 891.630000
COPY_FROM 5 5 5 931.319000
COPY_FROM 5 5 6 900.775000
COPY_FROM 5 5 7 894.377000
COPY_FROM 5 5 8 892.984000
COPY_FROM 5 5 9 882.452000
COPY_FROM 5 5 10 941.360000
COPY_TO 5 6 1 1258.759000
COPY_TO 5 6 2 1259.336000
COPY_TO 5 6 3 1268.761000
COPY_TO 5 6 4 1234.730000
COPY_TO 5 6 5 1272.013000
COPY_TO 5 6 6 1233.970000
COPY_TO 5 6 7 1281.098000
COPY_TO 5 6 8 1267.348000
COPY_TO 5 6 9 1259.674000
COPY_TO 5 6 10 1266.219000
COPY_FROM 5 6 1 1052.524000
COPY_FROM 5 6 2 1067.610000
COPY_FROM 5 6 3 1057.225000
COPY_FROM 5 6 4 1053.887000
COPY_FROM 5 6 5 1066.923000
COPY_FROM 5 6 6 1066.930000
COPY_FROM 5 6 7 1064.119000
COPY_FROM 5 6 8 1103.817000
COPY_FROM 5 6 9 1040.265000
COPY_FROM 5 6 10 1049.068000
COPY_TO 5 7 1 1492.215000
COPY_TO 5 7 2 1488.576000
COPY_TO 5 7 3 1467.710000
COPY_TO 5 7 4 1478.339000
COPY_TO 5 7 5 1501.272000
COPY_TO 5 7 6 1483.944000
COPY_TO 5 7 7 1479.922000
COPY_TO 5 7 8 1476.075000
COPY_TO 5 7 9 1470.403000
COPY_TO 5 7 10 1504.996000
COPY_FROM 5 7 1 1231.400000
COPY_FROM 5 7 2 1207.745000
COPY_FROM 5 7 3 1238.918000
COPY_FROM 5 7 4 1228.868000
COPY_FROM 5 7 5 1239.988000
COPY_FROM 5 7 6 1230.274000
COPY_FROM 5 7 7 1236.876000
COPY_FROM 5 7 8 1227.257000
COPY_FROM 5 7 9 1230.378000
COPY_FROM 5 7 10 1286.864000
COPY_TO 5 8 1 1739.946000
COPY_TO 5 8 2 1699.952000
COPY_TO 5 8 3 1679.076000
COPY_TO 5 8 4 1686.910000
COPY_TO 5 8 5 1688.083000
COPY_TO 5 8 6 1694.051000
COPY_TO 5 8 7 1678.831000
COPY_TO 5 8 8 1659.907000
COPY_TO 5 8 9 1641.518000
COPY_TO 5 8 10 1679.057000
COPY_FROM 5 8 1 1437.100000
COPY_FROM 5 8 2 1424.070000
COPY_FROM 5 8 3 1473.867000
COPY_FROM 5 8 4 1405.431000
COPY_FROM 5 8 5 1406.246000
COPY_FROM 5 8 6 1419.742000
COPY_FROM 5 8 7 1387.097000
COPY_FROM 5 8 8 1396.140000
COPY_FROM 5 8 9 1420.520000
COPY_FROM 5 8 10 1412.001000
COPY_TO 5 9 1 1858.925000
COPY_TO 5 9 2 1850.901000
COPY_TO 5 9 3 1873.408000
COPY_TO 5 9 4 1905.935000
COPY_TO 5 9 5 1910.295000
COPY_TO 5 9 6 1889.258000
COPY_TO 5 9 7 1865.899000
COPY_TO 5 9 8 1874.485000
COPY_TO 5 9 9 1906.459000
COPY_TO 5 9 10 1844.316000
COPY_FROM 5 9 1 1584.546000
COPY_FROM 5 9 2 1586.177000
COPY_FROM 5 9 3 1578.157000
COPY_FROM 5 9 4 1553.313000
COPY_FROM 5 9 5 1547.309000
COPY_FROM 5 9 6 1588.149000
COPY_FROM 5 9 7 1569.061000
COPY_FROM 5 9 8 1579.066000
COPY_FROM 5 9 9 1570.615000
COPY_FROM 5 9 10 1592.860000
COPY_TO 5 10 1 2095.472000
COPY_TO 5 10 2 2077.086000
COPY_TO 5 10 3 2082.011000
COPY_TO 5 10 4 2118.808000
COPY_TO 5 10 5 2122.738000
COPY_TO 5 10 6 2116.635000
COPY_TO 5 10 7 2065.169000
COPY_TO 5 10 8 2071.043000
COPY_TO 5 10 9 2104.322000
COPY_TO 5 10 10 2094.018000
COPY_FROM 5 10 1 1746.916000
COPY_FROM 5 10 2 1748.130000
COPY_FROM 5 10 3 1742.154000
COPY_FROM 5 10 4 1822.064000
COPY_FROM 5 10 5 1739.668000
COPY_FROM 5 10 6 1736.219000
COPY_FROM 5 10 7 1828.462000
COPY_FROM 5 10 8 1741.385000
COPY_FROM 5 10 9 1749.339000
COPY_FROM 5 10 10 1749.511000
COPY_TO 10 1 1 348.214000
COPY_TO 10 1 2 344.409000
COPY_TO 10 1 3 348.118000
COPY_TO 10 1 4 351.250000
COPY_TO 10 1 5 345.164000
COPY_TO 10 1 6 348.686000
COPY_TO 10 1 7 347.033000
COPY_TO 10 1 8 356.881000
COPY_TO 10 1 9 363.224000
COPY_TO 10 1 10 344.265000
COPY_FROM 10 1 1 309.644000
COPY_FROM 10 1 2 315.303000
COPY_FROM 10 1 3 309.133000
COPY_FROM 10 1 4 308.645000
COPY_FROM 10 1 5 307.657000
COPY_FROM 10 1 6 308.826000
COPY_FROM 10 1 7 306.510000
COPY_FROM 10 1 8 306.906000
COPY_FROM 10 1 9 312.354000
COPY_FROM 10 1 10 309.572000
COPY_TO 10 2 1 680.964000
COPY_TO 10 2 2 686.492000
COPY_TO 10 2 3 673.836000
COPY_TO 10 2 4 693.009000
COPY_TO 10 2 5 679.089000
COPY_TO 10 2 6 675.584000
COPY_TO 10 2 7 682.799000
COPY_TO 10 2 8 692.569000
COPY_TO 10 2 9 671.136000
COPY_TO 10 2 10 658.264000
COPY_FROM 10 2 1 596.359000
COPY_FROM 10 2 2 592.091000
COPY_FROM 10 2 3 593.862000
COPY_FROM 10 2 4 595.358000
COPY_FROM 10 2 5 595.974000
COPY_FROM 10 2 6 620.312000
COPY_FROM 10 2 7 596.066000
COPY_FROM 10 2 8 600.032000
COPY_FROM 10 2 9 600.454000
COPY_FROM 10 2 10 596.003000
COPY_TO 10 3 1 1019.610000
COPY_TO 10 3 2 1007.821000
COPY_TO 10 3 3 1014.551000
COPY_TO 10 3 4 1004.209000
COPY_TO 10 3 5 1037.550000
COPY_TO 10 3 6 1006.828000
COPY_TO 10 3 7 1018.162000
COPY_TO 10 3 8 992.985000
COPY_TO 10 3 9 1025.867000
COPY_TO 10 3 10 1028.286000
COPY_FROM 10 3 1 892.287000
COPY_FROM 10 3 2 887.084000
COPY_FROM 10 3 3 892.174000
COPY_FROM 10 3 4 899.172000
COPY_FROM 10 3 5 880.837000
COPY_FROM 10 3 6 885.155000
COPY_FROM 10 3 7 893.880000
COPY_FROM 10 3 8 870.693000
COPY_FROM 10 3 9 882.712000
COPY_FROM 10 3 10 878.129000
COPY_TO 10 4 1 1358.053000
COPY_TO 10 4 2 1360.787000
COPY_TO 10 4 3 1322.403000
COPY_TO 10 4 4 1388.729000
COPY_TO 10 4 5 1371.818000
COPY_TO 10 4 6 1349.647000
COPY_TO 10 4 7 1373.746000
COPY_TO 10 4 8 1426.870000
COPY_TO 10 4 9 1347.131000
COPY_TO 10 4 10 1336.103000
COPY_FROM 10 4 1 1187.230000
COPY_FROM 10 4 2 1175.994000
COPY_FROM 10 4 3 1186.190000
COPY_FROM 10 4 4 1186.168000
COPY_FROM 10 4 5 1182.343000
COPY_FROM 10 4 6 1179.842000
COPY_FROM 10 4 7 1175.598000
COPY_FROM 10 4 8 1189.885000
COPY_FROM 10 4 9 1166.391000
COPY_FROM 10 4 10 1175.445000
COPY_TO 10 5 1 1715.079000
COPY_TO 10 5 2 1685.962000
COPY_TO 10 5 3 1670.427000
COPY_TO 10 5 4 1662.272000
COPY_TO 10 5 5 1683.975000
COPY_TO 10 5 6 1662.505000
COPY_TO 10 5 7 1678.846000
COPY_TO 10 5 8 1659.495000
COPY_TO 10 5 9 1640.480000
COPY_TO 10 5 10 1666.186000
COPY_FROM 10 5 1 1469.628000
COPY_FROM 10 5 2 1482.367000
COPY_FROM 10 5 3 1467.941000
COPY_FROM 10 5 4 1443.202000
COPY_FROM 10 5 5 1438.004000
COPY_FROM 10 5 6 1435.447000
COPY_FROM 10 5 7 1440.746000
COPY_FROM 10 5 8 1440.996000
COPY_FROM 10 5 9 1516.147000
COPY_FROM 10 5 10 1523.081000
COPY_TO 10 6 1 2008.597000
COPY_TO 10 6 2 2002.489000
COPY_TO 10 6 3 2054.108000
COPY_TO 10 6 4 2020.325000
COPY_TO 10 6 5 2074.237000
COPY_TO 10 6 6 2029.803000
COPY_TO 10 6 7 2004.565000
COPY_TO 10 6 8 2027.488000
COPY_TO 10 6 9 2018.207000
COPY_TO 10 6 10 2043.407000
COPY_FROM 10 6 1 1732.059000
COPY_FROM 10 6 2 1698.740000
COPY_FROM 10 6 3 1744.397000
COPY_FROM 10 6 4 1752.396000
COPY_FROM 10 6 5 1738.473000
COPY_FROM 10 6 6 1750.763000
COPY_FROM 10 6 7 1760.260000
COPY_FROM 10 6 8 1734.506000
COPY_FROM 10 6 9 1752.343000
COPY_FROM 10 6 10 1800.136000
COPY_TO 10 7 1 2373.267000
COPY_TO 10 7 2 2359.455000
COPY_TO 10 7 3 2376.194000
COPY_TO 10 7 4 2369.713000
COPY_TO 10 7 5 2380.525000
COPY_TO 10 7 6 2355.680000
COPY_TO 10 7 7 2365.292000
COPY_TO 10 7 8 2387.594000
COPY_TO 10 7 9 2361.091000
COPY_TO 10 7 10 2399.029000
COPY_FROM 10 7 1 2040.625000
COPY_FROM 10 7 2 2031.799000
COPY_FROM 10 7 3 2054.883000
COPY_FROM 10 7 4 2020.964000
COPY_FROM 10 7 5 2085.711000
COPY_FROM 10 7 6 2056.172000
COPY_FROM 10 7 7 2053.141000
COPY_FROM 10 7 8 2017.080000
COPY_FROM 10 7 9 2036.249000
COPY_FROM 10 7 10 2055.574000
COPY_TO 10 8 1 2708.496000
COPY_TO 10 8 2 2670.277000
COPY_TO 10 8 3 2739.491000
COPY_TO 10 8 4 2670.203000
COPY_TO 10 8 5 2686.905000
COPY_TO 10 8 6 2715.423000
COPY_TO 10 8 7 2661.954000
COPY_TO 10 8 8 2679.533000
COPY_TO 10 8 9 2700.084000
COPY_TO 10 8 10 2692.732000
COPY_FROM 10 8 1 2332.678000
COPY_FROM 10 8 2 2327.148000
COPY_FROM 10 8 3 2365.272000
COPY_FROM 10 8 4 2323.775000
COPY_FROM 10 8 5 2327.727000
COPY_FROM 10 8 6 2328.340000
COPY_FROM 10 8 7 2351.656000
COPY_FROM 10 8 8 2359.587000
COPY_FROM 10 8 9 2315.807000
COPY_FROM 10 8 10 2323.951000
COPY_TO 10 9 1 3048.751000
COPY_TO 10 9 2 3047.431000
COPY_TO 10 9 3 3033.034000
COPY_TO 10 9 4 3024.685000
COPY_TO 10 9 5 3033.612000
COPY_TO 10 9 6 3071.925000
COPY_TO 10 9 7 3066.067000
COPY_TO 10 9 8 3061.065000
COPY_TO 10 9 9 3033.557000
COPY_TO 10 9 10 3139.233000
COPY_FROM 10 9 1 2637.134000
COPY_FROM 10 9 2 2648.296000
COPY_FROM 10 9 3 2595.698000
COPY_FROM 10 9 4 2684.115000
COPY_FROM 10 9 5 2640.266000
COPY_FROM 10 9 6 2647.282000
COPY_FROM 10 9 7 2626.573000
COPY_FROM 10 9 8 2597.198000
COPY_FROM 10 9 9 2590.305000
COPY_FROM 10 9 10 2607.834000
COPY_TO 10 10 1 3399.538000
COPY_TO 10 10 2 3395.112000
COPY_TO 10 10 3 3379.849000
COPY_TO 10 10 4 3447.512000
COPY_TO 10 10 5 3395.209000
COPY_TO 10 10 6 3372.455000
COPY_TO 10 10 7 3426.450000
COPY_TO 10 10 8 3406.147000
COPY_TO 10 10 9 3401.163000
COPY_TO 10 10 10 3398.863000
COPY_FROM 10 10 1 2918.524000
COPY_FROM 10 10 2 2946.519000
COPY_FROM 10 10 3 2897.459000
COPY_FROM 10 10 4 2949.553000
COPY_FROM 10 10 5 2924.340000
COPY_FROM 10 10 6 2880.430000
COPY_FROM 10 10 7 2943.481000
COPY_FROM 10 10 8 2924.866000
COPY_FROM 10 10 9 2882.415000
COPY_FROM 10 10 10 2939.448000
COPY_TO 15 1 1 481.490000
COPY_TO 15 1 2 480.802000
COPY_TO 15 1 3 505.153000
COPY_TO 15 1 4 480.755000
COPY_TO 15 1 5 487.445000
COPY_TO 15 1 6 478.630000
COPY_TO 15 1 7 471.924000
COPY_TO 15 1 8 484.494000
COPY_TO 15 1 9 475.958000
COPY_TO 15 1 10 476.259000
COPY_FROM 15 1 1 404.762000
COPY_FROM 15 1 2 411.539000
COPY_FROM 15 1 3 396.594000
COPY_FROM 15 1 4 402.033000
COPY_FROM 15 1 5 399.084000
COPY_FROM 15 1 6 402.425000
COPY_FROM 15 1 7 399.751000
COPY_FROM 15 1 8 396.732000
COPY_FROM 15 1 9 408.485000
COPY_FROM 15 1 10 401.768000
COPY_TO 15 2 1 948.346000
COPY_TO 15 2 2 960.359000
COPY_TO 15 2 3 945.425000
COPY_TO 15 2 4 942.055000
COPY_TO 15 2 5 946.342000
COPY_TO 15 2 6 974.876000
COPY_TO 15 2 7 935.041000
COPY_TO 15 2 8 962.795000
COPY_TO 15 2 9 934.524000
COPY_TO 15 2 10 944.476000
COPY_FROM 15 2 1 794.640000
COPY_FROM 15 2 2 764.601000
COPY_FROM 15 2 3 785.607000
COPY_FROM 15 2 4 768.691000
COPY_FROM 15 2 5 789.261000
COPY_FROM 15 2 6 766.484000
COPY_FROM 15 2 7 762.206000
COPY_FROM 15 2 8 777.008000
COPY_FROM 15 2 9 777.736000
COPY_FROM 15 2 10 768.562000
COPY_TO 15 3 1 1471.715000
COPY_TO 15 3 2 1425.784000
COPY_TO 15 3 3 1430.887000
COPY_TO 15 3 4 1411.350000
COPY_TO 15 3 5 1399.500000
COPY_TO 15 3 6 1414.848000
COPY_TO 15 3 7 1471.325000
COPY_TO 15 3 8 1424.225000
COPY_TO 15 3 9 1438.927000
COPY_TO 15 3 10 1383.432000
COPY_FROM 15 3 1 1157.842000
COPY_FROM 15 3 2 1148.168000
COPY_FROM 15 3 3 1170.290000
COPY_FROM 15 3 4 1163.281000
COPY_FROM 15 3 5 1164.792000
COPY_FROM 15 3 6 1170.901000
COPY_FROM 15 3 7 1167.411000
COPY_FROM 15 3 8 1136.925000
COPY_FROM 15 3 9 1163.268000
COPY_FROM 15 3 10 1167.786000
COPY_TO 15 4 1 1879.456000
COPY_TO 15 4 2 1851.491000
COPY_TO 15 4 3 1834.399000
COPY_TO 15 4 4 1909.106000
COPY_TO 15 4 5 1939.416000
COPY_TO 15 4 6 1856.175000
COPY_TO 15 4 7 1936.540000
COPY_TO 15 4 8 1872.650000
COPY_TO 15 4 9 1846.497000
COPY_TO 15 4 10 1851.336000
COPY_FROM 15 4 1 1527.390000
COPY_FROM 15 4 2 1559.869000
COPY_FROM 15 4 3 1549.983000
COPY_FROM 15 4 4 1519.352000
COPY_FROM 15 4 5 1534.720000
COPY_FROM 15 4 6 1531.672000
COPY_FROM 15 4 7 1514.365000
COPY_FROM 15 4 8 1524.385000
COPY_FROM 15 4 9 1519.783000
COPY_FROM 15 4 10 1518.088000
COPY_TO 15 5 1 2315.126000
COPY_TO 15 5 2 2403.590000
COPY_TO 15 5 3 2353.186000
COPY_TO 15 5 4 2362.140000
COPY_TO 15 5 5 2337.372000
COPY_TO 15 5 6 2369.436000
COPY_TO 15 5 7 2344.194000
COPY_TO 15 5 8 2345.627000
COPY_TO 15 5 9 2393.136000
COPY_TO 15 5 10 2390.355000
COPY_FROM 15 5 1 1904.628000
COPY_FROM 15 5 2 1910.340000
COPY_FROM 15 5 3 1918.427000
COPY_FROM 15 5 4 1912.737000
COPY_FROM 15 5 5 1955.806000
COPY_FROM 15 5 6 1892.326000
COPY_FROM 15 5 7 1915.079000
COPY_FROM 15 5 8 1920.116000
COPY_FROM 15 5 9 1914.245000
COPY_FROM 15 5 10 1887.371000
COPY_TO 15 6 1 2825.865000
COPY_TO 15 6 2 2834.549000
COPY_TO 15 6 3 2838.698000
COPY_TO 15 6 4 2769.660000
COPY_TO 15 6 5 2771.549000
COPY_TO 15 6 6 2824.433000
COPY_TO 15 6 7 2850.494000
COPY_TO 15 6 8 2873.406000
COPY_TO 15 6 9 2819.338000
COPY_TO 15 6 10 2800.095000
COPY_FROM 15 6 1 2312.919000
COPY_FROM 15 6 2 2280.861000
COPY_FROM 15 6 3 2276.382000
COPY_FROM 15 6 4 2328.440000
COPY_FROM 15 6 5 2306.146000
COPY_FROM 15 6 6 2290.642000
COPY_FROM 15 6 7 2318.425000
COPY_FROM 15 6 8 2319.431000
COPY_FROM 15 6 9 2271.906000
COPY_FROM 15 6 10 2307.933000
COPY_TO 15 7 1 3276.071000
COPY_TO 15 7 2 3302.277000
COPY_TO 15 7 3 3227.547000
COPY_TO 15 7 4 3205.014000
COPY_TO 15 7 5 3216.083000
COPY_TO 15 7 6 3288.328000
COPY_TO 15 7 7 3273.990000
COPY_TO 15 7 8 3269.459000
COPY_TO 15 7 9 3276.029000
COPY_TO 15 7 10 3257.944000
COPY_FROM 15 7 1 2650.110000
COPY_FROM 15 7 2 2687.294000
COPY_FROM 15 7 3 2679.115000
COPY_FROM 15 7 4 2642.092000
COPY_FROM 15 7 5 2754.050000
COPY_FROM 15 7 6 2670.190000
COPY_FROM 15 7 7 2659.509000
COPY_FROM 15 7 8 2680.944000
COPY_FROM 15 7 9 2702.110000
COPY_FROM 15 7 10 2714.737000
COPY_TO 15 8 1 3743.204000
COPY_TO 15 8 2 3728.396000
COPY_TO 15 8 3 3694.741000
COPY_TO 15 8 4 3792.445000
COPY_TO 15 8 5 3774.482000
COPY_TO 15 8 6 3741.767000
COPY_TO 15 8 7 3763.394000
COPY_TO 15 8 8 3760.802000
COPY_TO 15 8 9 3778.522000
COPY_TO 15 8 10 3719.006000
COPY_FROM 15 8 1 3036.831000
COPY_FROM 15 8 2 3053.770000
COPY_FROM 15 8 3 3084.446000
COPY_FROM 15 8 4 3083.344000
COPY_FROM 15 8 5 3063.590000
COPY_FROM 15 8 6 2994.047000
COPY_FROM 15 8 7 2997.051000
COPY_FROM 15 8 8 3028.378000
COPY_FROM 15 8 9 2985.766000
COPY_FROM 15 8 10 3048.673000
COPY_TO 15 9 1 4209.894000
COPY_TO 15 9 2 4166.767000
COPY_TO 15 9 3 4381.260000
COPY_TO 15 9 4 4166.933000
COPY_TO 15 9 5 4208.765000
COPY_TO 15 9 6 4221.622000
COPY_TO 15 9 7 4243.140000
COPY_TO 15 9 8 4221.371000
COPY_TO 15 9 9 4206.701000
COPY_TO 15 9 10 4173.130000
COPY_FROM 15 9 1 3509.903000
COPY_FROM 15 9 2 3429.411000
COPY_FROM 15 9 3 3476.601000
COPY_FROM 15 9 4 3532.142000
COPY_FROM 15 9 5 3482.214000
COPY_FROM 15 9 6 3481.428000
COPY_FROM 15 9 7 3512.648000
COPY_FROM 15 9 8 3429.037000
COPY_FROM 15 9 9 3481.796000
COPY_FROM 15 9 10 3405.251000
COPY_TO 15 10 1 4783.435000
COPY_TO 15 10 2 4644.036000
COPY_TO 15 10 3 4681.145000
COPY_TO 15 10 4 4649.832000
COPY_TO 15 10 5 4695.900000
COPY_TO 15 10 6 4741.715000
COPY_TO 15 10 7 4645.073000
COPY_TO 15 10 8 4685.227000
COPY_TO 15 10 9 4667.401000
COPY_TO 15 10 10 4674.533000
COPY_FROM 15 10 1 3791.541000
COPY_FROM 15 10 2 3769.122000
COPY_FROM 15 10 3 3875.967000
COPY_FROM 15 10 4 3898.442000
COPY_FROM 15 10 5 3757.463000
COPY_FROM 15 10 6 3803.542000
COPY_FROM 15 10 7 3898.392000
COPY_FROM 15 10 8 3853.286000
COPY_FROM 15 10 9 3796.088000
COPY_FROM 15 10 10 3775.505000
COPY_TO 20 1 1 588.371000
COPY_TO 20 1 2 588.534000
COPY_TO 20 1 3 604.939000
COPY_TO 20 1 4 609.174000
COPY_TO 20 1 5 599.870000
COPY_TO 20 1 6 606.180000
COPY_TO 20 1 7 593.885000
COPY_TO 20 1 8 619.903000
COPY_TO 20 1 9 613.359000
COPY_TO 20 1 10 579.570000
COPY_FROM 20 1 1 512.783000
COPY_FROM 20 1 2 520.086000
COPY_FROM 20 1 3 526.413000
COPY_FROM 20 1 4 503.333000
COPY_FROM 20 1 5 499.622000
COPY_FROM 20 1 6 504.114000
COPY_FROM 20 1 7 522.611000
COPY_FROM 20 1 8 504.078000
COPY_FROM 20 1 9 519.839000
COPY_FROM 20 1 10 521.364000
COPY_TO 20 2 1 1192.504000
COPY_TO 20 2 2 1204.486000
COPY_TO 20 2 3 1184.451000
COPY_TO 20 2 4 1224.046000
COPY_TO 20 2 5 1168.863000
COPY_TO 20 2 6 1185.104000
COPY_TO 20 2 7 1213.106000
COPY_TO 20 2 8 1219.240000
COPY_TO 20 2 9 1239.428000
COPY_TO 20 2 10 1202.900000
COPY_FROM 20 2 1 970.190000
COPY_FROM 20 2 2 968.745000
COPY_FROM 20 2 3 968.316000
COPY_FROM 20 2 4 963.391000
COPY_FROM 20 2 5 977.050000
COPY_FROM 20 2 6 977.689000
COPY_FROM 20 2 7 986.514000
COPY_FROM 20 2 8 996.876000
COPY_FROM 20 2 9 988.527000
COPY_FROM 20 2 10 973.651000
COPY_TO 20 3 1 1830.128000
COPY_TO 20 3 2 1783.896000
COPY_TO 20 3 3 1824.977000
COPY_TO 20 3 4 1808.527000
COPY_TO 20 3 5 1831.361000
COPY_TO 20 3 6 1793.910000
COPY_TO 20 3 7 1790.156000
COPY_TO 20 3 8 1901.378000
COPY_TO 20 3 9 1808.887000
COPY_TO 20 3 10 1820.944000
COPY_FROM 20 3 1 1463.492000
COPY_FROM 20 3 2 1473.649000
COPY_FROM 20 3 3 1470.260000
COPY_FROM 20 3 4 1460.868000
COPY_FROM 20 3 5 1472.357000
COPY_FROM 20 3 6 1467.554000
COPY_FROM 20 3 7 1447.413000
COPY_FROM 20 3 8 1477.952000
COPY_FROM 20 3 9 1428.962000
COPY_FROM 20 3 10 1488.391000
COPY_TO 20 4 1 2425.563000
COPY_TO 20 4 2 2389.778000
COPY_TO 20 4 3 2423.539000
COPY_TO 20 4 4 2402.541000
COPY_TO 20 4 5 2435.064000
COPY_TO 20 4 6 2375.980000
COPY_TO 20 4 7 2405.196000
COPY_TO 20 4 8 2516.577000
COPY_TO 20 4 9 2420.785000
COPY_TO 20 4 10 2396.486000
COPY_FROM 20 4 1 1982.045000
COPY_FROM 20 4 2 1936.064000
COPY_FROM 20 4 3 1949.428000
COPY_FROM 20 4 4 1962.934000
COPY_FROM 20 4 5 1970.196000
COPY_FROM 20 4 6 2011.610000
COPY_FROM 20 4 7 1990.147000
COPY_FROM 20 4 8 1932.480000
COPY_FROM 20 4 9 1988.822000
COPY_FROM 20 4 10 1971.847000
COPY_TO 20 5 1 2981.284000
COPY_TO 20 5 2 2976.269000
COPY_TO 20 5 3 3098.910000
COPY_TO 20 5 4 2959.358000
COPY_TO 20 5 5 3077.661000
COPY_TO 20 5 6 2964.573000
COPY_TO 20 5 7 3000.287000
COPY_TO 20 5 8 3028.061000
COPY_TO 20 5 9 2981.529000
COPY_TO 20 5 10 2996.331000
COPY_FROM 20 5 1 2432.749000
COPY_FROM 20 5 2 2431.906000
COPY_FROM 20 5 3 2439.713000
COPY_FROM 20 5 4 2412.802000
COPY_FROM 20 5 5 2473.731000
COPY_FROM 20 5 6 2481.380000
COPY_FROM 20 5 7 2434.936000
COPY_FROM 20 5 8 2435.429000
COPY_FROM 20 5 9 2424.148000
COPY_FROM 20 5 10 2435.032000
COPY_TO 20 6 1 3515.496000
COPY_TO 20 6 2 3600.580000
COPY_TO 20 6 3 3661.207000
COPY_TO 20 6 4 3560.876000
COPY_TO 20 6 5 3560.392000
COPY_TO 20 6 6 3610.652000
COPY_TO 20 6 7 3592.674000
COPY_TO 20 6 8 3572.048000
COPY_TO 20 6 9 3552.225000
COPY_TO 20 6 10 3544.525000
COPY_FROM 20 6 1 3029.925000
COPY_FROM 20 6 2 2868.143000
COPY_FROM 20 6 3 3003.149000
COPY_FROM 20 6 4 3004.859000
COPY_FROM 20 6 5 2975.037000
COPY_FROM 20 6 6 2941.359000
COPY_FROM 20 6 7 2958.822000
COPY_FROM 20 6 8 2929.398000
COPY_FROM 20 6 9 2942.849000
COPY_FROM 20 6 10 2980.994000
COPY_TO 20 7 1 4116.153000
COPY_TO 20 7 2 4151.977000
COPY_TO 20 7 3 4191.156000
COPY_TO 20 7 4 4122.951000
COPY_TO 20 7 5 4132.277000
COPY_TO 20 7 6 4117.195000
COPY_TO 20 7 7 4102.429000
COPY_TO 20 7 8 4167.083000
COPY_TO 20 7 9 4186.748000
COPY_TO 20 7 10 4176.378000
COPY_FROM 20 7 1 3385.308000
COPY_FROM 20 7 2 3424.239000
COPY_FROM 20 7 3 3469.448000
COPY_FROM 20 7 4 3427.053000
COPY_FROM 20 7 5 3393.027000
COPY_FROM 20 7 6 3390.030000
COPY_FROM 20 7 7 3412.631000
COPY_FROM 20 7 8 3416.462000
COPY_FROM 20 7 9 3440.032000
COPY_FROM 20 7 10 3406.263000
COPY_TO 20 8 1 4763.503000
COPY_TO 20 8 2 4747.748000
COPY_TO 20 8 3 4701.871000
COPY_TO 20 8 4 4792.958000
COPY_TO 20 8 5 4802.671000
COPY_TO 20 8 6 4700.079000
COPY_TO 20 8 7 4735.823000
COPY_TO 20 8 8 4697.905000
COPY_TO 20 8 9 4792.274000
COPY_TO 20 8 10 4824.296000
COPY_FROM 20 8 1 3911.984000
COPY_FROM 20 8 2 3964.766000
COPY_FROM 20 8 3 3868.331000
COPY_FROM 20 8 4 3864.961000
COPY_FROM 20 8 5 3901.993000
COPY_FROM 20 8 6 3913.048000
COPY_FROM 20 8 7 3913.909000
COPY_FROM 20 8 8 3911.791000
COPY_FROM 20 8 9 3922.104000
COPY_FROM 20 8 10 3865.939000
COPY_TO 20 9 1 5345.060000
COPY_TO 20 9 2 5332.009000
COPY_TO 20 9 3 5396.687000
COPY_TO 20 9 4 5465.947000
COPY_TO 20 9 5 5338.527000
COPY_TO 20 9 6 5336.702000
COPY_TO 20 9 7 5312.314000
COPY_TO 20 9 8 5355.565000
COPY_TO 20 9 9 5299.061000
COPY_TO 20 9 10 5444.467000
COPY_FROM 20 9 1 4348.639000
COPY_FROM 20 9 2 4324.543000
COPY_FROM 20 9 3 4405.188000
COPY_FROM 20 9 4 4371.563000
COPY_FROM 20 9 5 4362.497000
COPY_FROM 20 9 6 4399.904000
COPY_FROM 20 9 7 4360.325000
COPY_FROM 20 9 8 4319.934000
COPY_FROM 20 9 9 4374.741000
COPY_FROM 20 9 10 4312.770000
COPY_TO 20 10 1 5898.919000
COPY_TO 20 10 2 5937.106000
COPY_TO 20 10 3 5982.895000
COPY_TO 20 10 4 6006.050000
COPY_TO 20 10 5 6017.783000
COPY_TO 20 10 6 6027.734000
COPY_TO 20 10 7 5980.353000
COPY_TO 20 10 8 5880.622000
COPY_TO 20 10 9 6004.364000
COPY_TO 20 10 10 5991.804000
COPY_FROM 20 10 1 4898.844000
COPY_FROM 20 10 2 4803.747000
COPY_FROM 20 10 3 4811.804000
COPY_FROM 20 10 4 4876.269000
COPY_FROM 20 10 5 4880.505000
COPY_FROM 20 10 6 4911.774000
COPY_FROM 20 10 7 4798.363000
COPY_FROM 20 10 8 4992.114000
COPY_FROM 20 10 9 4979.897000
COPY_FROM 20 10 10 4991.956000
COPY_TO 25 1 1 734.355000
COPY_TO 25 1 2 751.684000
COPY_TO 25 1 3 741.651000
COPY_TO 25 1 4 741.016000
COPY_TO 25 1 5 758.336000
COPY_TO 25 1 6 742.133000
COPY_TO 25 1 7 754.167000
COPY_TO 25 1 8 756.050000
COPY_TO 25 1 9 734.336000
COPY_TO 25 1 10 755.085000
COPY_FROM 25 1 1 596.991000
COPY_FROM 25 1 2 595.280000
COPY_FROM 25 1 3 592.821000
COPY_FROM 25 1 4 598.301000
COPY_FROM 25 1 5 592.075000
COPY_FROM 25 1 6 598.580000
COPY_FROM 25 1 7 612.489000
COPY_FROM 25 1 8 594.139000
COPY_FROM 25 1 9 606.733000
COPY_FROM 25 1 10 599.884000
COPY_TO 25 2 1 1454.990000
COPY_TO 25 2 2 1464.926000
COPY_TO 25 2 3 1442.608000
COPY_TO 25 2 4 1467.204000
COPY_TO 25 2 5 1469.535000
COPY_TO 25 2 6 1441.349000
COPY_TO 25 2 7 1465.709000
COPY_TO 25 2 8 1477.000000
COPY_TO 25 2 9 1466.388000
COPY_TO 25 2 10 1491.142000
COPY_FROM 25 2 1 1186.171000
COPY_FROM 25 2 2 1197.788000
COPY_FROM 25 2 3 1189.075000
COPY_FROM 25 2 4 1168.622000
COPY_FROM 25 2 5 1194.059000
COPY_FROM 25 2 6 1175.976000
COPY_FROM 25 2 7 1180.774000
COPY_FROM 25 2 8 1240.536000
COPY_FROM 25 2 9 1216.618000
COPY_FROM 25 2 10 1162.956000
COPY_TO 25 3 1 2124.055000
COPY_TO 25 3 2 2208.083000
COPY_TO 25 3 3 2171.897000
COPY_TO 25 3 4 2200.009000
COPY_TO 25 3 5 2157.130000
COPY_TO 25 3 6 2178.217000
COPY_TO 25 3 7 2170.618000
COPY_TO 25 3 8 2159.246000
COPY_TO 25 3 9 2180.852000
COPY_TO 25 3 10 2162.987000
COPY_FROM 25 3 1 1814.018000
COPY_FROM 25 3 2 1777.425000
COPY_FROM 25 3 3 1804.940000
COPY_FROM 25 3 4 1777.249000
COPY_FROM 25 3 5 1755.835000
COPY_FROM 25 3 6 1762.743000
COPY_FROM 25 3 7 1791.698000
COPY_FROM 25 3 8 1812.374000
COPY_FROM 25 3 9 1807.720000
COPY_FROM 25 3 10 1779.357000
COPY_TO 25 4 1 2880.647000
COPY_TO 25 4 2 2914.773000
COPY_TO 25 4 3 2935.268000
COPY_TO 25 4 4 2899.907000
COPY_TO 25 4 5 2973.967000
COPY_TO 25 4 6 2957.416000
COPY_TO 25 4 7 2988.516000
COPY_TO 25 4 8 2960.672000
COPY_TO 25 4 9 2995.815000
COPY_TO 25 4 10 2938.285000
COPY_FROM 25 4 1 2390.532000
COPY_FROM 25 4 2 2402.782000
COPY_FROM 25 4 3 2380.246000
COPY_FROM 25 4 4 2313.968000
COPY_FROM 25 4 5 2336.368000
COPY_FROM 25 4 6 2333.098000
COPY_FROM 25 4 7 2341.465000
COPY_FROM 25 4 8 2364.110000
COPY_FROM 25 4 9 2347.709000
COPY_FROM 25 4 10 2440.437000
COPY_TO 25 5 1 3615.292000
COPY_TO 25 5 2 3706.462000
COPY_TO 25 5 3 3629.459000
COPY_TO 25 5 4 3614.923000
COPY_TO 25 5 5 3646.021000
COPY_TO 25 5 6 3622.527000
COPY_TO 25 5 7 3614.309000
COPY_TO 25 5 8 3590.665000
COPY_TO 25 5 9 3570.947000
COPY_TO 25 5 10 3616.614000
COPY_FROM 25 5 1 3025.729000
COPY_FROM 25 5 2 2890.083000
COPY_FROM 25 5 3 2832.641000
COPY_FROM 25 5 4 2896.295000
COPY_FROM 25 5 5 2906.869000
COPY_FROM 25 5 6 2964.634000
COPY_FROM 25 5 7 2981.976000
COPY_FROM 25 5 8 2899.424000
COPY_FROM 25 5 9 2952.928000
COPY_FROM 25 5 10 2956.315000
COPY_TO 25 6 1 4335.410000
COPY_TO 25 6 2 4352.629000
COPY_TO 25 6 3 4328.721000
COPY_TO 25 6 4 4329.229000
COPY_TO 25 6 5 4326.110000
COPY_TO 25 6 6 4350.785000
COPY_TO 25 6 7 4422.414000
COPY_TO 25 6 8 4327.600000
COPY_TO 25 6 9 4399.015000
COPY_TO 25 6 10 4344.686000
COPY_FROM 25 6 1 3546.817000
COPY_FROM 25 6 2 3583.911000
COPY_FROM 25 6 3 3577.654000
COPY_FROM 25 6 4 3545.870000
COPY_FROM 25 6 5 3517.331000
COPY_FROM 25 6 6 3621.318000
COPY_FROM 25 6 7 3518.093000
COPY_FROM 25 6 8 3487.595000
COPY_FROM 25 6 9 3502.635000
COPY_FROM 25 6 10 3442.832000
COPY_TO 25 7 1 5127.114000
COPY_TO 25 7 2 5147.491000
COPY_TO 25 7 3 5030.220000
COPY_TO 25 7 4 5039.242000
COPY_TO 25 7 5 5024.293000
COPY_TO 25 7 6 5177.402000
COPY_TO 25 7 7 5091.543000
COPY_TO 25 7 8 5047.738000
COPY_TO 25 7 9 5032.130000
COPY_TO 25 7 10 5125.969000
COPY_FROM 25 7 1 4236.840000
COPY_FROM 25 7 2 4166.021000
COPY_FROM 25 7 3 4072.998000
COPY_FROM 25 7 4 4044.735000
COPY_FROM 25 7 5 4095.923000
COPY_FROM 25 7 6 4100.569000
COPY_FROM 25 7 7 4065.397000
COPY_FROM 25 7 8 4038.183000
COPY_FROM 25 7 9 4051.760000
COPY_FROM 25 7 10 4100.604000
COPY_TO 25 8 1 5755.047000
COPY_TO 25 8 2 5882.932000
COPY_TO 25 8 3 5711.378000
COPY_TO 25 8 4 5750.234000
COPY_TO 25 8 5 5813.714000
COPY_TO 25 8 6 5818.114000
COPY_TO 25 8 7 5869.900000
COPY_TO 25 8 8 5792.470000
COPY_TO 25 8 9 5842.988000
COPY_TO 25 8 10 5826.206000
COPY_FROM 25 8 1 4633.711000
COPY_FROM 25 8 2 4604.926000
COPY_FROM 25 8 3 4843.423000
COPY_FROM 25 8 4 4623.654000
COPY_FROM 25 8 5 4764.668000
COPY_FROM 25 8 6 4665.590000
COPY_FROM 25 8 7 4743.428000
COPY_FROM 25 8 8 4684.806000
COPY_FROM 25 8 9 4625.929000
COPY_FROM 25 8 10 4796.581000
COPY_TO 25 9 1 6537.881000
COPY_TO 25 9 2 6467.843000
COPY_TO 25 9 3 6485.727000
COPY_TO 25 9 4 6419.503000
COPY_TO 25 9 5 6547.430000
COPY_TO 25 9 6 6647.516000
COPY_TO 25 9 7 6545.266000
COPY_TO 25 9 8 6483.089000
COPY_TO 25 9 9 6488.061000
COPY_TO 25 9 10 6501.837000
COPY_FROM 25 9 1 5218.724000
COPY_FROM 25 9 2 5322.326000
COPY_FROM 25 9 3 5134.734000
COPY_FROM 25 9 4 5181.265000
COPY_FROM 25 9 5 5236.810000
COPY_FROM 25 9 6 5394.804000
COPY_FROM 25 9 7 5216.023000
COPY_FROM 25 9 8 5228.834000
COPY_FROM 25 9 9 5233.673000
COPY_FROM 25 9 10 5370.429000
COPY_TO 25 10 1 7187.403000
COPY_TO 25 10 2 7237.522000
COPY_TO 25 10 3 7215.015000
COPY_TO 25 10 4 7310.549000
COPY_TO 25 10 5 7204.966000
COPY_TO 25 10 6 7395.831000
COPY_TO 25 10 7 7235.840000
COPY_TO 25 10 8 7335.155000
COPY_TO 25 10 9 7320.366000
COPY_TO 25 10 10 7399.998000
COPY_FROM 25 10 1 5766.539000
COPY_FROM 25 10 2 5868.594000
COPY_FROM 25 10 3 5717.698000
COPY_FROM 25 10 4 5824.276000
COPY_FROM 25 10 5 5865.970000
COPY_FROM 25 10 6 5943.953000
COPY_FROM 25 10 7 5730.136000
COPY_FROM 25 10 8 5856.029000
COPY_FROM 25 10 9 5782.006000
COPY_FROM 25 10 10 5872.458000
COPY_TO 30 1 1 855.345000
COPY_TO 30 1 2 871.867000
COPY_TO 30 1 3 855.503000
COPY_TO 30 1 4 852.658000
COPY_TO 30 1 5 872.763000
COPY_TO 30 1 6 840.381000
COPY_TO 30 1 7 858.728000
COPY_TO 30 1 8 854.579000
COPY_TO 30 1 9 849.983000
COPY_TO 30 1 10 865.011000
COPY_FROM 30 1 1 722.491000
COPY_FROM 30 1 2 697.416000
COPY_FROM 30 1 3 724.112000
COPY_FROM 30 1 4 723.154000
COPY_FROM 30 1 5 742.389000
COPY_FROM 30 1 6 736.040000
COPY_FROM 30 1 7 706.634000
COPY_FROM 30 1 8 703.661000
COPY_FROM 30 1 9 711.113000
COPY_FROM 30 1 10 698.277000
COPY_TO 30 2 1 1724.559000
COPY_TO 30 2 2 1719.146000
COPY_TO 30 2 3 1713.379000
COPY_TO 30 2 4 1715.448000
COPY_TO 30 2 5 1731.645000
COPY_TO 30 2 6 1715.003000
COPY_TO 30 2 7 1694.419000
COPY_TO 30 2 8 1692.365000
COPY_TO 30 2 9 1734.901000
COPY_TO 30 2 10 1753.928000
COPY_FROM 30 2 1 1381.412000
COPY_FROM 30 2 2 1393.055000
COPY_FROM 30 2 3 1429.064000
COPY_FROM 30 2 4 1400.549000
COPY_FROM 30 2 5 1390.625000
COPY_FROM 30 2 6 1399.524000
COPY_FROM 30 2 7 1428.245000
COPY_FROM 30 2 8 1396.228000
COPY_FROM 30 2 9 1394.769000
COPY_FROM 30 2 10 1376.140000
COPY_TO 30 3 1 2549.615000
COPY_TO 30 3 2 2549.549000
COPY_TO 30 3 3 2554.699000
COPY_TO 30 3 4 2620.901000
COPY_TO 30 3 5 2542.416000
COPY_TO 30 3 6 2463.919000
COPY_TO 30 3 7 2514.404000
COPY_TO 30 3 8 2606.338000
COPY_TO 30 3 9 2549.300000
COPY_TO 30 3 10 2614.069000
COPY_FROM 30 3 1 2061.723000
COPY_FROM 30 3 2 2054.595000
COPY_FROM 30 3 3 2064.499000
COPY_FROM 30 3 4 2029.387000
COPY_FROM 30 3 5 2060.673000
COPY_FROM 30 3 6 2071.234000
COPY_FROM 30 3 7 2039.847000
COPY_FROM 30 3 8 2034.512000
COPY_FROM 30 3 9 2048.970000
COPY_FROM 30 3 10 2070.192000
COPY_TO 30 4 1 3439.779000
COPY_TO 30 4 2 3415.562000
COPY_TO 30 4 3 3472.690000
COPY_TO 30 4 4 3426.406000
COPY_TO 30 4 5 3417.655000
COPY_TO 30 4 6 3420.833000
COPY_TO 30 4 7 3380.506000
COPY_TO 30 4 8 3462.000000
COPY_TO 30 4 9 3402.428000
COPY_TO 30 4 10 3428.111000
COPY_FROM 30 4 1 2733.262000
COPY_FROM 30 4 2 2683.878000
COPY_FROM 30 4 3 2821.240000
COPY_FROM 30 4 4 2768.113000
COPY_FROM 30 4 5 2867.414000
COPY_FROM 30 4 6 2759.740000
COPY_FROM 30 4 7 2796.335000
COPY_FROM 30 4 8 2688.241000
COPY_FROM 30 4 9 2693.820000
COPY_FROM 30 4 10 2731.140000
COPY_TO 30 5 1 4242.226000
COPY_TO 30 5 2 4337.764000
COPY_TO 30 5 3 4201.378000
COPY_TO 30 5 4 4276.924000
COPY_TO 30 5 5 4195.586000
COPY_TO 30 5 6 4147.869000
COPY_TO 30 5 7 4262.615000
COPY_TO 30 5 8 4283.672000
COPY_TO 30 5 9 4316.076000
COPY_TO 30 5 10 4265.417000
COPY_FROM 30 5 1 3414.952000
COPY_FROM 30 5 2 3484.110000
COPY_FROM 30 5 3 3410.230000
COPY_FROM 30 5 4 3456.846000
COPY_FROM 30 5 5 3383.937000
COPY_FROM 30 5 6 3430.556000
COPY_FROM 30 5 7 3430.628000
COPY_FROM 30 5 8 3428.378000
COPY_FROM 30 5 9 3396.417000
COPY_FROM 30 5 10 3432.408000
COPY_TO 30 6 1 5074.778000
COPY_TO 30 6 2 5101.994000
COPY_TO 30 6 3 5069.600000
COPY_TO 30 6 4 5222.574000
COPY_TO 30 6 5 5071.946000
COPY_TO 30 6 6 5076.127000
COPY_TO 30 6 7 5080.155000
COPY_TO 30 6 8 5189.124000
COPY_TO 30 6 9 5172.174000
COPY_TO 30 6 10 5100.780000
COPY_FROM 30 6 1 4211.710000
COPY_FROM 30 6 2 4088.827000
COPY_FROM 30 6 3 4140.018000
COPY_FROM 30 6 4 4200.005000
COPY_FROM 30 6 5 4083.156000
COPY_FROM 30 6 6 4142.306000
COPY_FROM 30 6 7 4302.596000
COPY_FROM 30 6 8 4166.638000
COPY_FROM 30 6 9 4063.275000
COPY_FROM 30 6 10 3989.077000
COPY_TO 30 7 1 5985.682000
COPY_TO 30 7 2 5944.822000
COPY_TO 30 7 3 5909.677000
COPY_TO 30 7 4 5959.397000
COPY_TO 30 7 5 5973.909000
COPY_TO 30 7 6 5971.125000
COPY_TO 30 7 7 5970.800000
COPY_TO 30 7 8 5928.120000
COPY_TO 30 7 9 6065.392000
COPY_TO 30 7 10 5967.311000
COPY_FROM 30 7 1 4832.597000
COPY_FROM 30 7 2 4763.587000
COPY_FROM 30 7 3 5007.212000
COPY_FROM 30 7 4 4831.589000
COPY_FROM 30 7 5 4761.464000
COPY_FROM 30 7 6 4964.790000
COPY_FROM 30 7 7 4911.089000
COPY_FROM 30 7 8 4804.915000
COPY_FROM 30 7 9 4830.199000
COPY_FROM 30 7 10 4821.159000
COPY_TO 30 8 1 6780.338000
COPY_TO 30 8 2 6780.465000
COPY_TO 30 8 3 6891.504000
COPY_TO 30 8 4 6924.545000
COPY_TO 30 8 5 6887.753000
COPY_TO 30 8 6 6667.140000
COPY_TO 30 8 7 6766.440000
COPY_TO 30 8 8 6847.607000
COPY_TO 30 8 9 6949.330000
COPY_TO 30 8 10 6807.099000
COPY_FROM 30 8 1 5408.566000
COPY_FROM 30 8 2 5430.909000
COPY_FROM 30 8 3 5413.220000
COPY_FROM 30 8 4 5426.873000
COPY_FROM 30 8 5 5471.004000
COPY_FROM 30 8 6 5454.879000
COPY_FROM 30 8 7 5467.374000
COPY_FROM 30 8 8 5463.669000
COPY_FROM 30 8 9 5382.302000
COPY_FROM 30 8 10 5430.827000
COPY_TO 30 9 1 7646.096000
COPY_TO 30 9 2 7663.106000
COPY_TO 30 9 3 7649.568000
COPY_TO 30 9 4 7582.509000
COPY_TO 30 9 5 7677.910000
COPY_TO 30 9 6 7649.933000
COPY_TO 30 9 7 7639.381000
COPY_TO 30 9 8 7628.082000
COPY_TO 30 9 9 7742.443000
COPY_TO 30 9 10 7749.198000
COPY_FROM 30 9 1 6254.021000
COPY_FROM 30 9 2 6189.310000
COPY_FROM 30 9 3 6080.114000
COPY_FROM 30 9 4 6117.857000
COPY_FROM 30 9 5 6120.318000
COPY_FROM 30 9 6 6131.465000
COPY_FROM 30 9 7 6119.603000
COPY_FROM 30 9 8 6132.356000
COPY_FROM 30 9 9 6217.884000
COPY_FROM 30 9 10 6169.986000
COPY_TO 30 10 1 8511.796000
COPY_TO 30 10 2 8541.021000
COPY_TO 30 10 3 8470.991000
COPY_TO 30 10 4 8429.901000
COPY_TO 30 10 5 8399.581000
COPY_TO 30 10 6 8449.127000
COPY_TO 30 10 7 8421.535000
COPY_TO 30 10 8 8409.578000
COPY_TO 30 10 9 8588.901000
COPY_TO 30 10 10 8615.748000
COPY_FROM 30 10 1 6838.794000
COPY_FROM 30 10 2 6835.900000
COPY_FROM 30 10 3 6685.443000
COPY_FROM 30 10 4 6878.933000
COPY_FROM 30 10 5 6862.674000
COPY_FROM 30 10 6 6709.240000
COPY_FROM 30 10 7 6805.730000
COPY_FROM 30 10 8 6793.489000
COPY_FROM 30 10 9 6638.819000
COPY_FROM 30 10 10 6852.015000
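
For readers who want to check the summary rows that follow against the raw
runs above, here is a minimal sketch in Python. It is not the script that
produced the numbers. It assumes that each raw line is "direction, two
integer benchmark parameters, run number, elapsed time", that a summary row
reports the upper median of the ten runs (the sixth value after sorting),
and that the percentage is the last column divided by the column before it.
These assumptions are inferred from the data itself; the names parse_runs,
summary_row and SAMPLE are made up for this illustration.

```python
from collections import defaultdict
from statistics import median_high

# Ten raw runs copied from the listing above (COPY_TO, parameters 5 and 7).
SAMPLE = """\
COPY_TO 5 7 1 1492.215000
COPY_TO 5 7 2 1488.576000
COPY_TO 5 7 3 1467.710000
COPY_TO 5 7 4 1478.339000
COPY_TO 5 7 5 1501.272000
COPY_TO 5 7 6 1483.944000
COPY_TO 5 7 7 1479.922000
COPY_TO 5 7 8 1476.075000
COPY_TO 5 7 9 1470.403000
COPY_TO 5 7 10 1504.996000
"""


def parse_runs(lines):
    """Group elapsed times by (direction, param_a, param_b).

    Lines that do not have exactly five whitespace-separated fields
    (for example summary rows) are skipped.
    """
    runs = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue
        direction, param_a, param_b, _run, elapsed = fields
        runs[(direction, int(param_a), int(param_b))].append(float(elapsed))
    return runs


def summary_row(key, baseline, values):
    """Format one summary row: ratio, baseline, upper median of values."""
    direction, param_a, param_b = key
    candidate = median_high(values)  # sixth of ten values after sorting
    ratio = candidate / baseline * 100
    label = direction.replace("COPY_", "")
    return (f"{label}\t{param_a}\t{param_b}\t{ratio:.2f}%"
            f"\t{baseline:.6f}\t{candidate:.6f}")


if __name__ == "__main__":
    runs = parse_runs(SAMPLE.splitlines())
    key = ("COPY_TO", 5, 7)
    # 1474.119 is the other build's value, taken from the "TO 5 7" summary row.
    print(summary_row(key, 1474.119, runs[key]))
```

With the ten COPY_TO 5 7 runs this gives an upper median of 1483.944 and a
ratio of 100.67%, which agrees with the "TO 5 7" row in the summary.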
TO    5    1    100.56%    218.376000    219.609000
FROM    5    1    113.33%    168.493000    190.954000
TO    5    2    100.92%    421.387000    425.265000
FROM    5    2    115.55%    317.101000    366.403000
TO    5    3    101.80%    624.457000    635.709000
FROM    5    3    115.15%    468.651000    539.630000
TO    5    4    99.53%    845.936000    841.990000
FROM    5    4    115.26%    617.653000    711.922000
TO    5    5    100.60%    1037.773000    1044.045000
FROM    5    5    116.46%    767.966000    894.377000
TO    5    6    100.93%    1254.507000    1266.219000
FROM    5    6    115.60%    920.494000    1064.119000
TO    5    7    100.67%    1474.119000    1483.944000
FROM    5    7    114.04%    1079.762000    1231.400000
TO    5    8    99.77%    1690.789000    1686.910000
FROM    5    8    114.03%    1245.100000    1419.742000
TO    5    9    100.40%    1866.939000    1874.485000
FROM    5    9    115.12%    1371.727000    1579.066000
TO    5    10    100.15%    2092.245000    2095.472000
FROM    5    10    115.91%    1508.160000    1748.130000
TO    10    1    98.62%    353.087000    348.214000
FROM    10    1    118.65%    260.551000    309.133000
TO    10    2    97.77%    696.468000    680.964000
FROM    10    2    117.55%    507.076000    596.066000
TO    10    3    98.57%    1034.388000    1019.610000
FROM    10    3    118.70%    747.307000    887.084000
TO    10    4    97.77%    1391.879000    1360.787000
FROM    10    4    119.64%    988.250000    1182.343000
TO    10    5    96.89%    1724.061000    1670.427000
FROM    10    5    119.92%    1224.098000    1467.941000
TO    10    6    98.43%    2059.930000    2027.488000
FROM    10    6    119.10%    1470.005000    1750.763000
TO    10    7    98.50%    2409.333000    2373.267000
FROM    10    7    119.12%    1723.536000    2053.141000
TO    10    8    97.51%    2761.445000    2692.732000
FROM    10    8    118.76%    1960.546000    2328.340000
TO    10    9    98.34%    3100.206000    3048.751000
FROM    10    9    119.07%    2214.820000    2637.134000
TO    10    10    98.70%    3444.291000    3399.538000
FROM    10    10    118.79%    2462.314000    2924.866000
TO    15    1    97.71%    492.082000    480.802000
FROM    15    1    115.59%    347.820000    402.033000
TO    15    2    98.20%    963.658000    946.342000
FROM    15    2    115.79%    671.073000    777.008000
TO    15    3    97.90%    1456.382000    1425.784000
FROM    15    3    115.27%    1010.479000    1164.792000
TO    15    4    96.85%    1933.560000    1872.650000
FROM    15    4    113.92%    1340.700000    1527.390000
TO    15    5    98.32%    2402.419000    2362.140000
FROM    15    5    115.48%    1657.594000    1914.245000
TO    15    6    97.39%    2901.545000    2825.865000
FROM    15    6    116.00%    1989.522000    2307.933000
TO    15    7    97.47%    3359.085000    3273.990000
FROM    15    7    116.48%    2301.570000    2680.944000
TO    15    8    97.82%    3844.652000    3760.802000
FROM    15    8    114.43%    2664.116000    3048.673000
TO    15    9    97.71%    4308.416000    4209.894000
FROM    15    9    116.96%    2976.833000    3481.796000
TO    15    10    96.91%    4830.319000    4681.145000
FROM    15    10    115.09%    3304.798000    3803.542000
TO    20    1    96.05%    629.828000    604.939000
FROM    20    1    118.50%    438.673000    519.839000
TO    20    2    98.35%    1224.716000    1204.486000
FROM    20    2    112.61%    867.634000    977.050000
TO    20    3    97.96%    1858.945000    1820.944000
FROM    20    3    115.08%    1277.634000    1470.260000
TO    20    4    99.05%    2444.051000    2420.785000
FROM    20    4    116.54%    1692.007000    1971.847000
TO    20    5    97.15%    3084.210000    2996.331000
FROM    20    5    115.35%    2110.909000    2435.032000
TO    20    6    96.52%    3700.704000    3572.048000
FROM    20    6    117.61%    2529.492000    2975.037000
TO    20    7    96.11%    4320.033000    4151.977000
FROM    20    7    116.20%    2940.254000    3416.462000
TO    20    8    97.94%    4863.534000    4763.503000
FROM    20    8    115.93%    3374.520000    3911.984000
TO    20    9    97.82%    5463.960000    5345.060000
FROM    20    9    115.69%    3770.921000    4362.497000
TO    20    10    99.14%    6043.915000    5991.804000
FROM    20    10    116.88%    4191.494000    4898.844000
TO    25    1    98.29%    764.779000    751.684000
FROM    25    1    115.13%    519.686000    598.301000
TO    25    2    96.77%    1515.332000    1466.388000
FROM    25    2    116.70%    1018.943000    1189.075000
TO    25    3    94.74%    2292.456000    2171.897000
FROM    25    3    117.49%    1524.962000    1791.698000
TO    25    4    96.88%    3052.605000    2957.416000
FROM    25    4    117.70%    2008.544000    2364.110000
TO    25    5    94.08%    3843.996000    3616.614000
FROM    25    5    115.62%    2554.008000    2952.928000
TO    25    6    95.21%    4563.316000    4344.686000
FROM    25    6    118.56%    2990.859000    3545.870000
TO    25    7    95.55%    5328.781000    5091.543000
FROM    25    7    116.33%    3521.010000    4095.923000
TO    25    8    95.79%    6073.973000    5818.114000
FROM    25    8    116.83%    4009.777000    4684.806000
TO    25    9    95.80%    6787.185000    6501.837000
FROM    25    9    116.23%    4502.731000    5233.673000
TO    25    10    97.41%    7504.865000    7310.549000
FROM    25    10    117.25%    4994.463000    5856.029000
TO    30    1    94.39%    906.324000    855.503000
FROM    30    1    119.60%    604.110000    722.491000
TO    30    2    95.56%    1799.114000    1719.146000
FROM    30    2    117.45%    1188.794000    1396.228000
TO    30    3    95.76%    2662.493000    2549.615000
FROM    30    3    117.43%    1754.809000    2060.673000
TO    30    4    96.52%    3549.913000    3426.406000
FROM    30    4    117.23%    2354.055000    2759.740000
TO    30    5    96.50%    4419.907000    4265.417000
FROM    30    5    116.97%    2932.883000    3430.556000
TO    30    6    94.76%    5382.615000    5100.780000
FROM    30    6    117.88%    3514.108000    4142.306000
TO    30    7    95.52%    6250.630000    5970.800000
FROM    30    7    117.46%    4113.331000    4831.589000
TO    30    8    95.62%    7161.077000    6847.607000
FROM    30    8    116.31%    4669.370000    5430.909000
TO    30    9    95.07%    8046.895000    7649.933000
FROM    30    9    117.15%    5234.632000    6132.356000
TO    30    10    94.39%    8974.878000    8470.991000
FROM    30    10    117.84%    5800.793000    6835.900000
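#!/usr/bin/env ruby
# Compare two benchmark result files (ARGV[0] and ARGV[1]) line by line.
# Each input line has four whitespace-separated fields:
#   type, n_columns, n_rows, runtime
# Output columns: type, n_columns, n_rows, ARGV[1]/ARGV[0] runtime ratio (%),
# ARGV[0] runtime, ARGV[1] runtime.
from = File.open(ARGV[0])
to = File.open(ARGV[1])
loop do
  from_line = from.gets
  to_line = to.gets
  # Stop at the end of either file.
  break if from_line.nil?
  break if to_line.nil?
  from_type, from_n_columns, from_n_rows, from_runtime = from_line.split
  to_type, to_n_columns, to_n_rows, to_runtime = to_line.split
  # Stop if the two files no longer describe the same benchmark case.
  break if from_type != to_type
  break if from_n_columns != to_n_columns
  break if from_n_rows != to_n_rows
  # Despite the name, this is a ratio: values above 1.0 mean ARGV[1] is slower.
  runtime_diff = to_runtime.to_f / from_runtime.to_f
  puts("%s\t%s\t%s\t%5.2f%%\t%s\t%s" % [
         from_type[5..-1], # drop the 5-character prefix of the type label
         from_n_columns,
         from_n_rows,
         runtime_diff * 100,
         from_runtime,
         to_runtime,
       ])
end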
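#!/usr/bin/env ruby
# Print the median runtime for each (type, n_columns, n_rows) case.
# Each input line has five whitespace-separated fields:
#   type, n_columns, n_rows, round, runtime
statistics = {}
ARGF.each_line do |line|
  type, n_columns, n_rows, _round, runtime = line.split
  statistics[[type, n_columns, n_rows]] ||= []
  statistics[[type, n_columns, n_rows]] << runtime
end
statistics.each do |(type, n_columns, n_rows), runtimes|
  # Runtimes are still strings here, so sort them by numeric value;
  # a plain string sort would put "1000.0" before "999.0".
  runtime_median = runtimes.sort_by(&:to_f)[runtimes.size / 2]
  puts("#{type}\t#{n_columns}\t#{n_rows}\t#{runtime_median}")
end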
From 15505e1e2ed59644a567e40a7e8987823b49978d Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 4 Mar 2024 13:52:34 +0900
Subject: [PATCH v19 1/5] Add CopyFromRoutine/CopyToRoutine
They are for implementing custom COPY FROM/TO formats, but they are not
enough to implement one yet. We'll export some APIs to receive/send
data and add a "format" option to COPY FROM/TO later.
The existing text/csv/binary format implementations don't use
CopyFromRoutine/CopyToRoutine for now. We have a patch for that but we
defer it, because there are some unexplained profile results even
though we get faster runtimes. See [1] for details.
[1] https://www.postgresql.org/message-id/ZdbtQJ-p5H1_EDwE%40paquier.xyz
Note that this doesn't change existing text/csv/binary format
implementations.
---
 src/backend/commands/copyfrom.c          |  24 +++++-
 src/backend/commands/copyfromparse.c     |   5 ++
 src/backend/commands/copyto.c            |  31 ++++++-
 src/include/commands/copyapi.h           | 101 +++++++++++++++++++++++
 src/include/commands/copyfrom_internal.h |   4 +
 src/tools/pgindent/typedefs.list         |   2 +
 6 files changed, 159 insertions(+), 8 deletions(-)
 create mode 100644 src/include/commands/copyapi.h
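The callback table that this patch introduces can be pictured with a small standalone sketch. It does not use PostgreSQL headers: `DemoToState`, `DemoToRoutine`, `DemoRoutineList` and `demo_copy_to()` are hypothetical names invented for illustration, and the row type is reduced to a plain `int`. It only shows the shape of the idea, a struct of function pointers that the driver calls at start, once per row, and at end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct DemoToRoutine;

/* Hypothetical stand-in for CopyToState; not the PostgreSQL struct. */
typedef struct DemoToState
{
    const struct DemoToRoutine *routine;    /* format routine */
    char        buf[64];                    /* output collected so far */
    size_t      len;                        /* bytes used in buf */
} DemoToState;

/* Hypothetical stand-in for CopyToRoutine: one callback per phase. */
typedef struct DemoToRoutine
{
    void        (*start) (DemoToState *state);
    void        (*one_row) (DemoToState *state, int value);
    void        (*end) (DemoToState *state);
} DemoToRoutine;

/* Append a NUL-terminated string to the state's buffer. */
static void
demo_append(DemoToState *state, const char *s)
{
    size_t      n = strlen(s);

    assert(state->len + n < sizeof(state->buf));
    memcpy(state->buf + state->len, s, n + 1);
    state->len += n;
}

/* A toy "format" that writes rows as a bracketed list. */
static void
list_start(DemoToState *state)
{
    demo_append(state, "[");
}

static void
list_one_row(DemoToState *state, int value)
{
    char        tmp[16];

    snprintf(tmp, sizeof(tmp), "%d,", value);
    demo_append(state, tmp);
}

static void
list_end(DemoToState *state)
{
    demo_append(state, "]");
}

/* Allocated statically, so it lives as long as the process. */
static const DemoToRoutine DemoRoutineList = {
    .start = list_start,
    .one_row = list_one_row,
    .end = list_end,
};

/* Driver: calls the callbacks without knowing which format is in use. */
static void
demo_copy_to(DemoToState *state, const int *rows, size_t nrows)
{
    state->len = 0;
    state->buf[0] = '\0';
    state->routine->start(state);
    for (size_t i = 0; i < nrows; i++)
        state->routine->one_row(state, rows[i]);
    state->routine->end(state);
}
```

A caller would point `state.routine` at `&DemoRoutineList` and pass an array of rows to `demo_copy_to()`. Supporting another toy format then only means defining another `static const DemoToRoutine`.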
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index ce4d62e707c..ff13b3e3592 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1618,12 +1618,22 @@ BeginCopyFrom(ParseState *pstate,
 
         /* Fetch the input function and typioparam info */
         if (cstate->opts.binary)
+        {
             getTypeBinaryInputInfo(att->atttypid,
                                    &in_func_oid, &typioparams[attnum - 1]);
+            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+        }
+        else if (cstate->routine)
+            cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+                                            &in_functions[attnum - 1],
+                                            &typioparams[attnum - 1]);
+
         else
+        {
             getTypeInputInfo(att->atttypid,
                              &in_func_oid, &typioparams[attnum - 1]);
-        fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+        }
 
         /* Get default info if available */
         defexprs[attnum - 1] = NULL;
@@ -1763,10 +1773,13 @@ BeginCopyFrom(ParseState *pstate,
         /* Read and verify binary header */
         ReceiveCopyBinaryHeader(cstate);
     }
-
-    /* create workspace for CopyReadAttributes results */
-    if (!cstate->opts.binary)
+    else if (cstate->routine)
     {
+        cstate->routine->CopyFromStart(cstate, tupDesc);
+    }
+    else
+    {
+        /* create workspace for CopyReadAttributes results */
         AttrNumber    attr_count = list_length(cstate->attnumlist);
 
         cstate->max_fields = attr_count;
@@ -1784,6 +1797,9 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+    if (cstate->routine)
+        cstate->routine->CopyFromEnd(cstate);
+
     /* No COPY FROM related resources except memory. */
     if (cstate->is_program)
     {
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 7efcb891598..92b8d5e72d5 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1012,6 +1012,11 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 
         Assert(fieldno == attr_count);
     }
+    else if (cstate->routine)
+    {
+        if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+            return false;
+    }
     else
     {
         /* binary */
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ae8b2e36d72..ff19c457abf 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -20,6 +20,7 @@
 
 #include "access/tableam.h"
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -64,6 +65,9 @@ typedef enum CopyDest
  */
 typedef struct CopyToStateData
 {
+    /* format routine */
+    const CopyToRoutine *routine;
+
     /* low-level state data */
     CopyDest    copy_dest;        /* type of copy source/destination */
     FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
@@ -771,14 +775,22 @@ DoCopyTo(CopyToState cstate)
         Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
         if (cstate->opts.binary)
+        {
             getTypeBinaryOutputInfo(attr->atttypid,
                                     &out_func_oid,
                                     &isvarlena);
+            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+        }
+        else if (cstate->routine)
+            cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
+                                           &cstate->out_functions[attnum - 1]);
         else
+        {
             getTypeOutputInfo(attr->atttypid,
                               &out_func_oid,
                               &isvarlena);
-        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+        }
     }
 
     /*
@@ -805,6 +817,8 @@ DoCopyTo(CopyToState cstate)
         tmp = 0;
         CopySendInt32(cstate, tmp);
     }
+    else if (cstate->routine)
+        cstate->routine->CopyToStart(cstate, tupDesc);
     else
     {
         /*
@@ -886,6 +900,8 @@ DoCopyTo(CopyToState cstate)
         /* Need to flush out the trailer */
         CopySendEndOfRow(cstate);
     }
+    else if (cstate->routine)
+        cstate->routine->CopyToEnd(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -910,15 +926,22 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
     MemoryContextReset(cstate->rowcontext);
     oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
+    /* Make sure the tuple is fully deconstructed */
+    slot_getallattrs(slot);
+
+    if (cstate->routine)
+    {
+        cstate->routine->CopyToOneRow(cstate, slot);
+        MemoryContextSwitchTo(oldcontext);
+        return;
+    }
+
     if (cstate->opts.binary)
     {
         /* Binary per-tuple header */
         CopySendInt16(cstate, list_length(cstate->attnumlist));
     }
 
-    /* Make sure the tuple is fully deconstructed */
-    slot_getallattrs(slot);
-
     foreach(cur, cstate->attnumlist)
     {
         int            attnum = lfirst_int(cur);
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 00000000000..d1289424c67
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,101 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *      API for COPY TO/FROM handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/* These are private in commands/copy[from|to].c */
+typedef struct CopyFromStateData *CopyFromState;
+typedef struct CopyToStateData *CopyToState;
+
+/*
+ * API structure for a COPY FROM format implementation.  Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyFromRoutine
+{
+    /*
+     * Called when COPY FROM is started to set up the input functions
+     * associated with the relation's attributes written to.  `finfo` can be
+     * optionally filled to provide the catalog information of the input
+     * function.  `typioparam` can be optionally filled to define the OID of
+     * the type to pass to the input function.  `atttypid` is the OID of data
+     * type used by the relation's attribute.
+     */
+    void        (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
+                                   FmgrInfo *finfo, Oid *typioparam);
+
+    /*
+     * Called when COPY FROM is started.
+     *
+     * `tupDesc` is the tuple descriptor of the relation where the data needs
+     * to be copied.  This can be used for any initialization steps required
+     * by a format.
+     */
+    void        (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
+
+    /*
+     * Copy one row to a set of `values` and `nulls` of size tupDesc->natts.
+     *
+     * 'econtext' is used to evaluate default expression for each column that
+     * is either not read from the file or is using the DEFAULT option of COPY
+     * FROM.  It is NULL if no default values are used.
+     *
+     * Returns false if there are no more tuples to copy.
+     */
+    bool        (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
+                                   Datum *values, bool *nulls);
+
+    /* Called when COPY FROM has ended. */
+    void        (*CopyFromEnd) (CopyFromState cstate);
+} CopyFromRoutine;
+
+/*
+ * API structure for a COPY TO format implementation.   Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyToRoutine
+{
+    /*
+     * Called when COPY TO is started to set up the output functions
+     * associated with the relation's attributes read from.  `finfo` can be
+     * optionally filled to provide the catalog information of the output
+     * function.  `atttypid` is the OID of data type used by the relation's
+     * attribute.
+     */
+    void        (*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
+                                  FmgrInfo *finfo);
+
+    /*
+     * Called when COPY TO is started.
+     *
+     * `tupDesc` is the tuple descriptor of the relation from where the data
+     * is read.
+     */
+    void        (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
+
+    /*
+     * Copy one row for COPY TO.
+     *
+     * `slot` is the tuple slot where the data is emitted.
+     */
+    void        (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
+
+    /* Called when COPY TO has ended */
+    void        (*CopyToEnd) (CopyToState cstate);
+} CopyToRoutine;
+
+#endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc783..509b9e92a18 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -15,6 +15,7 @@
 #define COPYFROM_INTERNAL_H
 
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
@@ -58,6 +59,9 @@ typedef enum CopyInsertMethod
  */
 typedef struct CopyFromStateData
 {
+    /* format routine */
+    const CopyFromRoutine *routine;
+
     /* low-level state data */
     CopySource    copy_src;        /* type of copy source */
     FILE       *copy_file;        /* used if copy_src == COPY_FILE */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b4d7f9217ce..3ce855c8f17 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -490,6 +490,7 @@ ConvertRowtypeExpr
 CookedConstraint
 CopyDest
 CopyFormatOptions
+CopyFromRoutine
 CopyFromState
 CopyFromStateData
 CopyHeaderChoice
@@ -501,6 +502,7 @@ CopyMultiInsertInfo
 CopyOnErrorChoice
 CopySource
 CopyStmt
+CopyToRoutine
 CopyToState
 CopyToStateData
 Cost
-- 
2.45.2
From b26d39f09dc78b3338aa4527d43cd0468f5ddf69 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jul 2024 16:44:44 +0900
Subject: [PATCH v19 2/5] Use CopyFromRoutine/CopyToRoutine for the existing
 formats
The existing formats are text, csv and binary. If we find any
performance regression caused by this, we will not merge this to master.
This increases indirect function call costs but reduces the runtime
costs of the "if (cstate->opts.binary)" and "if (cstate->opts.csv_mode)"
branches.
This uses an optimization based on a static inline function that is
called with a constant argument in place of cstate->opts.csv_mode. For
example, CopyFromTextLikeOneRow() uses this optimization. It accepts a
"bool is_csv" argument instead of reading cstate->opts.csv_mode
itself. CopyFromTextOneRow() calls CopyFromTextLikeOneRow() with
false (constant) for "bool is_csv", so the compiler can remove the
"if (is_csv)" branches in it.
This doesn't change the existing logic. It just moves existing code.
---
 src/backend/commands/copyfrom.c          | 215 ++++++---
 src/backend/commands/copyfromparse.c     | 556 +++++++++++++----------
 src/backend/commands/copyto.c            | 480 ++++++++++++-------
 src/include/commands/copy.h              |   2 -
 src/include/commands/copyfrom_internal.h |   8 +
 5 files changed, 813 insertions(+), 448 deletions(-)
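The constant-argument optimization described in the commit message above can be sketched in isolation. The functions below are hypothetical (`demo_count_fields_like()` and its two wrappers are not part of the patch) and only count delimiter-separated fields. Whether the compiler actually drops the `if (is_csv)` test in each wrapper depends on inlining and on the optimization level, so treat this as an illustration of the pattern, not a guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Shared workhorse.  is_csv is an ordinary parameter here, but each
 * wrapper below passes a compile-time constant, so once this function is
 * inlined the compiler can fold the "is_csv &&" test away.
 */
static inline size_t
demo_count_fields_like(const char *line, char delim, bool is_csv)
{
    size_t      nfields = 1;
    bool        in_quote = false;

    assert(line != NULL);
    for (const char *p = line; *p != '\0'; p++)
    {
        if (is_csv && *p == '"')
            in_quote = !in_quote;   /* quotes only matter in CSV mode */
        else if (*p == delim && !in_quote)
            nfields++;
    }
    return nfields;
}

/* Text-like wrapper: tab-delimited, quotes are ordinary characters. */
size_t
demo_count_fields_text(const char *line)
{
    return demo_count_fields_like(line, '\t', false);
}

/* CSV-like wrapper: comma-delimited, delimiters inside quotes are ignored. */
size_t
demo_count_fields_csv(const char *line)
{
    return demo_count_fields_like(line, ',', true);
}
```

The two wrappers play the role that CopyFromTextOneRow() and CopyFromCSVOneRow() play in the patch: they are the functions that end up in the per-format callback table, while the shared logic stays in one place.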
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index ff13b3e3592..1a59202f5ab 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -103,6 +103,157 @@ typedef struct CopyMultiInsertInfo
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
 
+
+/*
+ * CopyFromRoutine implementations for text and CSV.
+ */
+
+/*
+ * CopyFromTextLikeInFunc
+ *
+ * Assign input function data for a relation's attribute in text/CSV format.
+ */
+static void
+CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid,
+                       FmgrInfo *finfo, Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyFromTextLikeStart
+ *
+ * Start of COPY FROM for text/CSV format.
+ */
+static void
+CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    AttrNumber    attr_count;
+
+    /*
+     * If encoding conversion is needed, we need another buffer to hold the
+     * converted input data.  Otherwise, we can just point input_buf to the
+     * same buffer as raw_buf.
+     */
+    if (cstate->need_transcoding)
+    {
+        cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
+        cstate->input_buf_index = cstate->input_buf_len = 0;
+    }
+    else
+        cstate->input_buf = cstate->raw_buf;
+    cstate->input_reached_eof = false;
+
+    initStringInfo(&cstate->line_buf);
+
+    /*
+     * Create workspace for CopyReadAttributes results; used by CSV and text
+     * format.
+     */
+    attr_count = list_length(cstate->attnumlist);
+    cstate->max_fields = attr_count;
+    cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+}
+
+/*
+ * CopyFromTextLikeEnd
+ *
+ * End of COPY FROM for text/CSV format.
+ */
+static void
+CopyFromTextLikeEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
+/*
+ * CopyFromRoutine implementation for "binary".
+ */
+
+/*
+ * CopyFromBinaryInFunc
+ *
+ * Assign input function data for a relation's attribute in binary format.
+ */
+static void
+CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+                     FmgrInfo *finfo, Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeBinaryInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyFromBinaryStart
+ *
+ * Start of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    /* Read and verify binary header */
+    ReceiveCopyBinaryHeader(cstate);
+}
+
+/*
+ * CopyFromBinaryEnd
+ *
+ * End of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
+/*
+ * Routines assigned to each format.
+ *
+ * CSV and text share the same implementation, with the exception of the
+ * per-row callback.
+ */
+static const CopyFromRoutine CopyFromRoutineText = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromTextOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+static const CopyFromRoutine CopyFromRoutineCSV = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromCSVOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+static const CopyFromRoutine CopyFromRoutineBinary = {
+    .CopyFromInFunc = CopyFromBinaryInFunc,
+    .CopyFromStart = CopyFromBinaryStart,
+    .CopyFromOneRow = CopyFromBinaryOneRow,
+    .CopyFromEnd = CopyFromBinaryEnd,
+};
+
+/*
+ * Define the COPY FROM routines to use for a format.
+ */
+static const CopyFromRoutine *
+CopyFromGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyFromRoutineCSV;
+    else if (opts.binary)
+        return &CopyFromRoutineBinary;
+
+    /* default is text */
+    return &CopyFromRoutineText;
+}
+
+
 /*
  * error context callback for COPY FROM
  *
@@ -1381,7 +1532,6 @@ BeginCopyFrom(ParseState *pstate,
                 num_defaults;
     FmgrInfo   *in_functions;
     Oid           *typioparams;
-    Oid            in_func_oid;
     int           *defmap;
     ExprState **defexprs;
     MemoryContext oldcontext;
@@ -1413,6 +1563,9 @@ BeginCopyFrom(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
+    /* Set format routine */
+    cstate->routine = CopyFromGetRoutine(cstate->opts);
+
     /* Process the target relation */
     cstate->rel = rel;
 
@@ -1566,25 +1719,6 @@ BeginCopyFrom(ParseState *pstate,
     cstate->raw_buf_index = cstate->raw_buf_len = 0;
     cstate->raw_reached_eof = false;
 
-    if (!cstate->opts.binary)
-    {
-        /*
-         * If encoding conversion is needed, we need another buffer to hold
-         * the converted input data.  Otherwise, we can just point input_buf
-         * to the same buffer as raw_buf.
-         */
-        if (cstate->need_transcoding)
-        {
-            cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
-            cstate->input_buf_index = cstate->input_buf_len = 0;
-        }
-        else
-            cstate->input_buf = cstate->raw_buf;
-        cstate->input_reached_eof = false;
-
-        initStringInfo(&cstate->line_buf);
-    }
-
     initStringInfo(&cstate->attribute_buf);
 
     /* Assign range table and rteperminfos, we'll need them in CopyFrom. */
@@ -1617,23 +1751,9 @@ BeginCopyFrom(ParseState *pstate,
             continue;
 
         /* Fetch the input function and typioparam info */
-        if (cstate->opts.binary)
-        {
-            getTypeBinaryInputInfo(att->atttypid,
-                                   &in_func_oid, &typioparams[attnum - 1]);
-            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
-        }
-        else if (cstate->routine)
-            cstate->routine->CopyFromInFunc(cstate, att->atttypid,
-                                            &in_functions[attnum - 1],
-                                            &typioparams[attnum - 1]);
-
-        else
-        {
-            getTypeInputInfo(att->atttypid,
-                             &in_func_oid, &typioparams[attnum - 1]);
-            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
-        }
+        cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+                                        &in_functions[attnum - 1],
+                                        &typioparams[attnum - 1]);
 
         /* Get default info if available */
         defexprs[attnum - 1] = NULL;
@@ -1768,23 +1888,7 @@ BeginCopyFrom(ParseState *pstate,
 
     pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-    if (cstate->opts.binary)
-    {
-        /* Read and verify binary header */
-        ReceiveCopyBinaryHeader(cstate);
-    }
-    else if (cstate->routine)
-    {
-        cstate->routine->CopyFromStart(cstate, tupDesc);
-    }
-    else
-    {
-        /* create workspace for CopyReadAttributes results */
-        AttrNumber    attr_count = list_length(cstate->attnumlist);
-
-        cstate->max_fields = attr_count;
-        cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-    }
+    cstate->routine->CopyFromStart(cstate, tupDesc);
 
     MemoryContextSwitchTo(oldcontext);
 
@@ -1797,8 +1901,7 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
-    if (cstate->routine)
-        cstate->routine->CopyFromEnd(cstate);
+    cstate->routine->CopyFromEnd(cstate);
 
     /* No COPY FROM related resources except memory. */
     if (cstate->is_program)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 92b8d5e72d5..90824b47785 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -149,10 +149,10 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
-static int    CopyReadAttributesText(CopyFromState cstate);
-static int    CopyReadAttributesCSV(CopyFromState cstate);
+static inline bool CopyReadLine(CopyFromState cstate, bool is_csv);
+static inline bool CopyReadLineText(CopyFromState cstate, bool is_csv);
+static inline int CopyReadAttributesText(CopyFromState cstate);
+static inline int CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
                                      Oid typioparam, int32 typmod,
                                      bool *isnull);
@@ -750,8 +750,8 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  *
  * NOTE: force_not_null option are not applied to the returned fields.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static inline bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
 {
     int            fldct;
     bool        done;
@@ -768,13 +768,17 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
         tupDesc = RelationGetDescr(cstate->rel);
 
         cstate->cur_lineno++;
-        done = CopyReadLine(cstate);
+        done = CopyReadLine(cstate, is_csv);
 
         if (cstate->opts.header_line == COPY_HEADER_MATCH)
         {
             int            fldnum;
 
-            if (cstate->opts.csv_mode)
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
                 fldct = CopyReadAttributesCSV(cstate);
             else
                 fldct = CopyReadAttributesText(cstate);
@@ -818,7 +822,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     cstate->cur_lineno++;
 
     /* Actually read the line into memory here */
-    done = CopyReadLine(cstate);
+    done = CopyReadLine(cstate, is_csv);
 
     /*
      * EOF at start of line means we're done.  If we see EOF after some
@@ -828,8 +832,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     if (done && cstate->line_buf.len == 0)
         return false;
 
-    /* Parse the line into de-escaped field values */
-    if (cstate->opts.csv_mode)
+    /*
+     * Parse the line into de-escaped field values
+     *
+     * is_csv will be optimized away by compiler, as argument is constant at
+     * caller.
+     */
+    if (is_csv)
         fldct = CopyReadAttributesCSV(cstate);
     else
         fldct = CopyReadAttributesText(cstate);
@@ -839,6 +848,267 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     return true;
 }
 
+/*
+ * CopyFromTextLikeOneRow
+ *
+ * Copy one row to a set of `values` and `nulls` for the text and CSV
+ * formats.
+ *
+ * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
+ */
+static inline bool
+CopyFromTextLikeOneRow(CopyFromState cstate,
+                       ExprContext *econtext,
+                       Datum *values,
+                       bool *nulls,
+                       bool is_csv)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    ExprState **defexprs = cstate->defexprs;
+    char      **field_strings;
+    ListCell   *cur;
+    int            fldct;
+    int            fieldno;
+    char       *string;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    /* read raw fields in the next line */
+    if (!NextCopyFromRawFields(cstate, &field_strings, &fldct, is_csv))
+        return false;
+
+    /* check for overflowing fields */
+    if (attr_count > 0 && fldct > attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("extra data after last expected column")));
+
+    fieldno = 0;
+
+    /* Loop to read the user attributes on the line. */
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        if (fieldno >= fldct)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("missing data for column \"%s\"",
+                            NameStr(att->attname))));
+        string = field_strings[fieldno++];
+
+        if (cstate->convert_select_flags &&
+            !cstate->convert_select_flags[m])
+        {
+            /* ignore input field, leaving column as NULL */
+            continue;
+        }
+
+        if (is_csv)
+        {
+            if (string == NULL &&
+                cstate->opts.force_notnull_flags[m])
+            {
+                /*
+                 * FORCE_NOT_NULL option is set and column is NULL - convert
+                 * it to the NULL string.
+                 */
+                string = cstate->opts.null_print;
+            }
+            else if (string != NULL && cstate->opts.force_null_flags[m]
+                     && strcmp(string, cstate->opts.null_print) == 0)
+            {
+                /*
+                 * FORCE_NULL option is set and column matches the NULL
+                 * string. It must have been quoted, or otherwise the string
+                 * would already have been set to NULL. Convert it to NULL as
+                 * specified.
+                 */
+                string = NULL;
+            }
+        }
+
+        cstate->cur_attname = NameStr(att->attname);
+        cstate->cur_attval = string;
+
+        if (string != NULL)
+            nulls[m] = false;
+
+        if (cstate->defaults[m])
+        {
+            /*
+             * The caller must supply econtext and have switched into the
+             * per-tuple memory context in it.
+             */
+            Assert(econtext != NULL);
+            Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
+
+            values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
+        }
+
+        /*
+         * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+         */
+        else if (!InputFunctionCallSafe(&in_functions[m],
+                                        string,
+                                        typioparams[m],
+                                        att->atttypmod,
+                                        (Node *) cstate->escontext,
+                                        &values[m]))
+        {
+            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+            cstate->num_errors++;
+
+            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+            {
+                /*
+                 * Since we emit line number and column info in the below
+                 * notice message, we suppress error context information other
+                 * than the relation name.
+                 */
+                Assert(!cstate->relname_only);
+                cstate->relname_only = true;
+
+                if (cstate->cur_attval)
+                {
+                    char       *attval;
+
+                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column %s: \"%s\"",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname,
+                                   attval));
+                    pfree(attval);
+                }
+                else
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column %s: null input",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname));
+
+                /* reset relname_only */
+                cstate->relname_only = false;
+            }
+
+            return true;
+        }
+
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+    }
+
+    Assert(fieldno == attr_count);
+
+    return true;
+}
+
+
+/*
+ * CopyFromTextOneRow
+ *
+ * Per-row callback for COPY FROM with text format.
+ */
+bool
+CopyFromTextOneRow(CopyFromState cstate,
+                   ExprContext *econtext,
+                   Datum *values,
+                   bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false);
+}
+
+/*
+ * CopyFromCSVOneRow
+ *
+ * Per-row callback for COPY FROM with CSV format.
+ */
+bool
+CopyFromCSVOneRow(CopyFromState cstate,
+                  ExprContext *econtext,
+                  Datum *values,
+                  bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
+}
+
+/*
+ * CopyFromBinaryOneRow
+ *
+ * Copy one row to a set of `values` and `nulls` for the binary format.
+ */
+bool
+CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+                     Datum *values, bool *nulls)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    int16        fld_count;
+    ListCell   *cur;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    cstate->cur_lineno++;
+
+    if (!CopyGetInt16(cstate, &fld_count))
+    {
+        /* EOF detected (end of file, or protocol-level EOF) */
+        return false;
+    }
+
+    if (fld_count == -1)
+    {
+        /*
+         * Received EOF marker.  Wait for the protocol-level EOF, and complain
+         * if it doesn't come immediately.  In COPY FROM STDIN, this ensures
+         * that we correctly handle CopyFail, if client chooses to send that
+         * now.  When copying from file, we could ignore the rest of the file
+         * like in text mode, but we choose to be consistent with the COPY
+         * FROM STDIN case.
+         */
+        char        dummy;
+
+        if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("received copy data after EOF marker")));
+        return false;
+    }
+
+    if (fld_count != attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("row field count is %d, expected %d",
+                        (int) fld_count, attr_count)));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        cstate->cur_attname = NameStr(att->attname);
+        values[m] = CopyReadBinaryAttribute(cstate,
+                                            &in_functions[m],
+                                            typioparams[m],
+                                            att->atttypmod,
+                                            &nulls[m]);
+        cstate->cur_attname = NULL;
+    }
+
+    return true;
+}
+
 /*
  * Read next tuple from file for COPY FROM. Return false if no more tuples.
  *
@@ -856,221 +1126,21 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 {
     TupleDesc    tupDesc;
     AttrNumber    num_phys_attrs,
-                attr_count,
                 num_defaults = cstate->num_defaults;
-    FmgrInfo   *in_functions = cstate->in_functions;
-    Oid           *typioparams = cstate->typioparams;
     int            i;
     int           *defmap = cstate->defmap;
     ExprState **defexprs = cstate->defexprs;
 
     tupDesc = RelationGetDescr(cstate->rel);
     num_phys_attrs = tupDesc->natts;
-    attr_count = list_length(cstate->attnumlist);
 
     /* Initialize all values for row to NULL */
     MemSet(values, 0, num_phys_attrs * sizeof(Datum));
     MemSet(nulls, true, num_phys_attrs * sizeof(bool));
     MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-    if (!cstate->opts.binary)
-    {
-        char      **field_strings;
-        ListCell   *cur;
-        int            fldct;
-        int            fieldno;
-        char       *string;
-
-        /* read raw fields in the next line */
-        if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
-            return false;
-
-        /* check for overflowing fields */
-        if (attr_count > 0 && fldct > attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("extra data after last expected column")));
-
-        fieldno = 0;
-
-        /* Loop to read the user attributes on the line. */
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            if (fieldno >= fldct)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("missing data for column \"%s\"",
-                                NameStr(att->attname))));
-            string = field_strings[fieldno++];
-
-            if (cstate->convert_select_flags &&
-                !cstate->convert_select_flags[m])
-            {
-                /* ignore input field, leaving column as NULL */
-                continue;
-            }
-
-            if (cstate->opts.csv_mode)
-            {
-                if (string == NULL &&
-                    cstate->opts.force_notnull_flags[m])
-                {
-                    /*
-                     * FORCE_NOT_NULL option is set and column is NULL -
-                     * convert it to the NULL string.
-                     */
-                    string = cstate->opts.null_print;
-                }
-                else if (string != NULL && cstate->opts.force_null_flags[m]
-                         && strcmp(string, cstate->opts.null_print) == 0)
-                {
-                    /*
-                     * FORCE_NULL option is set and column matches the NULL
-                     * string. It must have been quoted, or otherwise the
-                     * string would already have been set to NULL. Convert it
-                     * to NULL as specified.
-                     */
-                    string = NULL;
-                }
-            }
-
-            cstate->cur_attname = NameStr(att->attname);
-            cstate->cur_attval = string;
-
-            if (string != NULL)
-                nulls[m] = false;
-
-            if (cstate->defaults[m])
-            {
-                /*
-                 * The caller must supply econtext and have switched into the
-                 * per-tuple memory context in it.
-                 */
-                Assert(econtext != NULL);
-                Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
-
-                values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
-            }
-
-            /*
-             * If ON_ERROR is specified with IGNORE, skip rows with soft
-             * errors
-             */
-            else if (!InputFunctionCallSafe(&in_functions[m],
-                                            string,
-                                            typioparams[m],
-                                            att->atttypmod,
-                                            (Node *) cstate->escontext,
-                                            &values[m]))
-            {
-                Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-                cstate->num_errors++;
-
-                if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-                {
-                    /*
-                     * Since we emit line number and column info in the below
-                     * notice message, we suppress error context information
-                     * other than the relation name.
-                     */
-                    Assert(!cstate->relname_only);
-                    cstate->relname_only = true;
-
-                    if (cstate->cur_attval)
-                    {
-                        char       *attval;
-
-                        attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column %s: \"%s\"",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname,
-                                       attval));
-                        pfree(attval);
-                    }
-                    else
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column %s: null input",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname));
-
-                    /* reset relname_only */
-                    cstate->relname_only = false;
-                }
-
-                return true;
-            }
-
-            cstate->cur_attname = NULL;
-            cstate->cur_attval = NULL;
-        }
-
-        Assert(fieldno == attr_count);
-    }
-    else if (cstate->routine)
-    {
-        if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
-            return false;
-    }
-    else
-    {
-        /* binary */
-        int16        fld_count;
-        ListCell   *cur;
-
-        cstate->cur_lineno++;
-
-        if (!CopyGetInt16(cstate, &fld_count))
-        {
-            /* EOF detected (end of file, or protocol-level EOF) */
-            return false;
-        }
-
-        if (fld_count == -1)
-        {
-            /*
-             * Received EOF marker.  Wait for the protocol-level EOF, and
-             * complain if it doesn't come immediately.  In COPY FROM STDIN,
-             * this ensures that we correctly handle CopyFail, if client
-             * chooses to send that now.  When copying from file, we could
-             * ignore the rest of the file like in text mode, but we choose to
-             * be consistent with the COPY FROM STDIN case.
-             */
-            char        dummy;
-
-            if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("received copy data after EOF marker")));
-            return false;
-        }
-
-        if (fld_count != attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("row field count is %d, expected %d",
-                            (int) fld_count, attr_count)));
-
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            cstate->cur_attname = NameStr(att->attname);
-            values[m] = CopyReadBinaryAttribute(cstate,
-                                                &in_functions[m],
-                                                typioparams[m],
-                                                att->atttypmod,
-                                                &nulls[m]);
-            cstate->cur_attname = NULL;
-        }
-    }
+    if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+        return false;
 
     /*
      * Now compute and insert any defaults available for the columns not
@@ -1100,8 +1170,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * by newline.  The terminating newline or EOF marker is not included
  * in the final value of line_buf.
  */
-static bool
-CopyReadLine(CopyFromState cstate)
+static inline bool
+CopyReadLine(CopyFromState cstate, bool is_csv)
 {
     bool        result;
 
@@ -1109,7 +1179,7 @@ CopyReadLine(CopyFromState cstate)
     cstate->line_buf_valid = false;
 
     /* Parse data and transfer into line_buf */
-    result = CopyReadLineText(cstate);
+    result = CopyReadLineText(cstate, is_csv);
 
     if (result)
     {
@@ -1176,8 +1246,8 @@ CopyReadLine(CopyFromState cstate)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate)
+static inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
     char       *copy_input_buf;
     int            input_buf_ptr;
@@ -1193,7 +1263,11 @@ CopyReadLineText(CopyFromState cstate)
     char        quotec = '\0';
     char        escapec = '\0';
 
-    if (cstate->opts.csv_mode)
+    /*
+     * is_csv will be optimized away by compiler, as argument is constant at
+     * caller.
+     */
+    if (is_csv)
     {
         quotec = cstate->opts.quote[0];
         escapec = cstate->opts.escape[0];
@@ -1270,7 +1344,11 @@ CopyReadLineText(CopyFromState cstate)
         prev_raw_ptr = input_buf_ptr;
         c = copy_input_buf[input_buf_ptr++];
 
-        if (cstate->opts.csv_mode)
+        /*
+         * is_csv will be optimized away by compiler, as argument is constant
+         * at caller.
+         */
+        if (is_csv)
         {
             /*
              * If character is '\\' or '\r', we may need to look ahead below.
@@ -1309,7 +1387,7 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \r */
-        if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\r' && (!is_csv || !in_quote))
         {
             /* Check for \r\n on first line, _and_ handle \r\n. */
             if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1337,10 +1415,10 @@ CopyReadLineText(CopyFromState cstate)
                     if (cstate->eol_type == EOL_CRNL)
                         ereport(ERROR,
                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errmsg("literal carriage return found in data") :
                                  errmsg("unquoted carriage return found in data"),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errhint("Use \"\\r\" to represent carriage return.") :
                                  errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1354,10 +1432,10 @@ CopyReadLineText(CopyFromState cstate)
             else if (cstate->eol_type == EOL_NL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal carriage return found in data") :
                          errmsg("unquoted carriage return found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\r\" to represent carriage return.") :
                          errhint("Use quoted CSV field to represent carriage return.")));
             /* If reach here, we have found the line terminator */
@@ -1365,15 +1443,15 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \n */
-        if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\n' && (!is_csv || !in_quote))
         {
             if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal newline found in data") :
                          errmsg("unquoted newline found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\n\" to represent newline.") :
                          errhint("Use quoted CSV field to represent newline.")));
             cstate->eol_type = EOL_NL;    /* in case not set yet */
@@ -1385,7 +1463,7 @@ CopyReadLineText(CopyFromState cstate)
          * In CSV mode, we only recognize \. alone on a line.  This is because
          * \. is a valid CSV data value.
          */
-        if (c == '\\' && (!cstate->opts.csv_mode || first_char_in_line))
+        if (c == '\\' && (!is_csv || first_char_in_line))
         {
             char        c2;
 
@@ -1418,7 +1496,11 @@ CopyReadLineText(CopyFromState cstate)
 
                     if (c2 == '\n')
                     {
-                        if (!cstate->opts.csv_mode)
+                        /*
+                         * is_csv will be optimized away by compiler, as
+                         * argument is constant at caller.
+                         */
+                        if (!is_csv)
                             ereport(ERROR,
                                     (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                      errmsg("end-of-copy marker does not match previous newline style")));
@@ -1427,7 +1509,11 @@ CopyReadLineText(CopyFromState cstate)
                     }
                     else if (c2 != '\r')
                     {
-                        if (!cstate->opts.csv_mode)
+                        /*
+                         * is_csv will be optimized away by compiler, as
+                         * argument is constant at caller.
+                         */
+                        if (!is_csv)
                             ereport(ERROR,
                                     (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                      errmsg("end-of-copy marker corrupt")));
@@ -1443,7 +1529,11 @@ CopyReadLineText(CopyFromState cstate)
 
                 if (c2 != '\r' && c2 != '\n')
                 {
-                    if (!cstate->opts.csv_mode)
+                    /*
+                     * is_csv will be optimized away by compiler, as argument
+                     * is constant at caller.
+                     */
+                    if (!is_csv)
                         ereport(ERROR,
                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                  errmsg("end-of-copy marker corrupt")));
@@ -1472,7 +1562,7 @@ CopyReadLineText(CopyFromState cstate)
                 result = true;    /* report EOF */
                 break;
             }
-            else if (!cstate->opts.csv_mode)
+            else if (!is_csv)
             {
                 /*
                  * If we are here, it means we found a backslash followed by
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ff19c457abf..c7f69ba606d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -128,6 +128,321 @@ static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
+/*
+ * CopyToRoutine implementations.
+ */
+
+/*
+ * CopyToTextLikeSendEndOfRow
+ *
+ * Apply line terminations for a line sent in text or CSV format depending
+ * on the destination, then send the end of a row.
+ */
+static inline void
+CopyToTextLikeSendEndOfRow(CopyToState cstate)
+{
+    switch (cstate->copy_dest)
+    {
+        case COPY_FILE:
+            /* Default line termination depends on platform */
+#ifndef WIN32
+            CopySendChar(cstate, '\n');
+#else
+            CopySendString(cstate, "\r\n");
+#endif
+            break;
+        case COPY_FRONTEND:
+            /* The FE/BE protocol uses \n as newline for all platforms */
+            CopySendChar(cstate, '\n');
+            break;
+        default:
+            break;
+    }
+
+    /* Now take the actions related to the end of a row */
+    CopySendEndOfRow(cstate);
+}
+
+/*
+ * CopyToTextLikeStart
+ *
+ * Start of COPY TO for text and CSV format.
+ */
+static void
+CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    /*
+     * For non-binary copy, we need to convert null_print to file encoding,
+     * because it will be sent directly with CopySendString.
+     */
+    if (cstate->need_transcoding)
+        cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+                                                          cstate->opts.null_print_len,
+                                                          cstate->file_encoding);
+
+    /* if a header has been requested send the line */
+    if (cstate->opts.header_line)
+    {
+        ListCell   *cur;
+        bool        hdr_delim = false;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            char       *colname;
+
+            if (hdr_delim)
+                CopySendChar(cstate, cstate->opts.delim[0]);
+            hdr_delim = true;
+
+            colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, colname, false);
+            else
+                CopyAttributeOutText(cstate, colname);
+        }
+
+        CopyToTextLikeSendEndOfRow(cstate);
+    }
+}
+
+/*
+ * CopyToTextLikeOutFunc
+ *
+ * Assign output function data for a relation's attribute in text/CSV format.
+ */
+static void
+CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+
+/*
+ * CopyToTextLikeOneRow
+ *
+ * Process one row for text/CSV format.
+ *
+ * Workhorse for CopyToTextOneRow() and CopyToCSVOneRow().
+ */
+static inline void
+CopyToTextLikeOneRow(CopyToState cstate,
+                     TupleTableSlot *slot,
+                     bool is_csv)
+{
+    bool        need_delim = false;
+    FmgrInfo   *out_functions = cstate->out_functions;
+    ListCell   *cur;
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (need_delim)
+            CopySendChar(cstate, cstate->opts.delim[0]);
+        need_delim = true;
+
+        if (isnull)
+        {
+            CopySendString(cstate, cstate->opts.null_print_client);
+        }
+        else
+        {
+            char       *string;
+
+            string = OutputFunctionCall(&out_functions[attnum - 1],
+                                        value);
+
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
+                CopyAttributeOutCSV(cstate, string,
+                                    cstate->opts.force_quote_flags[attnum - 1]);
+            else
+                CopyAttributeOutText(cstate, string);
+        }
+    }
+
+    CopyToTextLikeSendEndOfRow(cstate);
+}
+
+/*
+ * CopyToTextOneRow
+ *
+ * Per-row callback for COPY TO with text format.
+ */
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, false);
+}
+
+/*
+ * CopyToCSVOneRow
+ *
+ * Per-row callback for COPY TO with CSV format.
+ */
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, true);
+}
+
+/*
+ * CopyToTextLikeEnd
+ *
+ * End of COPY TO for text/CSV format.
+ */
+static void
+CopyToTextLikeEnd(CopyToState cstate)
+{
+    /* Nothing to do here */
+}
+
+/*
+ * CopyToRoutine implementation for "binary".
+ */
+
+/*
+ * CopyToBinaryStart
+ *
+ * Start of COPY TO for binary format.
+ */
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    /* Generate header for a binary copy */
+    int32        tmp;
+
+    /* Signature */
+    CopySendData(cstate, BinarySignature, 11);
+    /* Flags field */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+    /* No header extension */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+}
+
+/*
+ * CopyToBinaryOutFunc
+ *
+ * Assign output function data for a relation's attribute in binary format.
+ */
+static void
+CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeBinaryOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyToBinaryOneRow
+ *
+ * Process one row for binary format.
+ */
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    FmgrInfo   *out_functions = cstate->out_functions;
+    ListCell   *cur;
+
+    /* Binary per-tuple header */
+    CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (isnull)
+        {
+            CopySendInt32(cstate, -1);
+        }
+        else
+        {
+            bytea       *outputbytes;
+
+            outputbytes = SendFunctionCall(&out_functions[attnum - 1],
+                                           value);
+            CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+            CopySendData(cstate, VARDATA(outputbytes),
+                         VARSIZE(outputbytes) - VARHDRSZ);
+        }
+    }
+
+    CopySendEndOfRow(cstate);
+}
+
+/*
+ * CopyToBinaryEnd
+ *
+ * End of COPY TO for binary format.
+ */
+static void
+CopyToBinaryEnd(CopyToState cstate)
+{
+    /* Generate trailer for a binary copy */
+    CopySendInt16(cstate, -1);
+    /* Need to flush out the trailer */
+    CopySendEndOfRow(cstate);
+}
+
+/*
+ * CSV and text share the same implementation, with the exception of the
+ * output representation and per-row callbacks.
+ */
+static const CopyToRoutine CopyToRoutineText = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToTextOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+static const CopyToRoutine CopyToRoutineCSV = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToCSVOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+static const CopyToRoutine CopyToRoutineBinary = {
+    .CopyToStart = CopyToBinaryStart,
+    .CopyToOutFunc = CopyToBinaryOutFunc,
+    .CopyToOneRow = CopyToBinaryOneRow,
+    .CopyToEnd = CopyToBinaryEnd,
+};
+
+/*
+ * Define the COPY TO routines to use for a format.  This should be called
+ * after options are parsed.
+ */
+static const CopyToRoutine *
+CopyToGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyToRoutineCSV;
+    else if (opts.binary)
+        return &CopyToRoutineBinary;
+
+    /* default is text */
+    return &CopyToRoutineText;
+}
 
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
@@ -195,16 +510,6 @@ CopySendEndOfRow(CopyToState cstate)
     switch (cstate->copy_dest)
     {
         case COPY_FILE:
-            if (!cstate->opts.binary)
-            {
-                /* Default line termination depends on platform */
-#ifndef WIN32
-                CopySendChar(cstate, '\n');
-#else
-                CopySendString(cstate, "\r\n");
-#endif
-            }
-
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -239,10 +544,6 @@ CopySendEndOfRow(CopyToState cstate)
             }
             break;
         case COPY_FRONTEND:
-            /* The FE/BE protocol uses \n as newline for all platforms */
-            if (!cstate->opts.binary)
-                CopySendChar(cstate, '\n');
-
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
@@ -430,6 +731,9 @@ BeginCopyTo(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
 
+    /* Set format routine */
+    cstate->routine = CopyToGetRoutine(cstate->opts);
+
     /* Process the source/target relation or query */
     if (rel)
     {
@@ -770,27 +1074,10 @@ DoCopyTo(CopyToState cstate)
     foreach(cur, cstate->attnumlist)
     {
         int            attnum = lfirst_int(cur);
-        Oid            out_func_oid;
-        bool        isvarlena;
         Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-        if (cstate->opts.binary)
-        {
-            getTypeBinaryOutputInfo(attr->atttypid,
-                                    &out_func_oid,
-                                    &isvarlena);
-            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
-        }
-        else if (cstate->routine)
-            cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
-                                           &cstate->out_functions[attnum - 1]);
-        else
-        {
-            getTypeOutputInfo(attr->atttypid,
-                              &out_func_oid,
-                              &isvarlena);
-            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
-        }
+        cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
+                                       &cstate->out_functions[attnum - 1]);
     }
 
     /*
@@ -803,58 +1090,7 @@ DoCopyTo(CopyToState cstate)
                                                "COPY TO",
                                                ALLOCSET_DEFAULT_SIZES);
 
-    if (cstate->opts.binary)
-    {
-        /* Generate header for a binary copy */
-        int32        tmp;
-
-        /* Signature */
-        CopySendData(cstate, BinarySignature, 11);
-        /* Flags field */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-        /* No header extension */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-    }
-    else if (cstate->routine)
-        cstate->routine->CopyToStart(cstate, tupDesc);
-    else
-    {
-        /*
-         * For non-binary copy, we need to convert null_print to file
-         * encoding, because it will be sent directly with CopySendString.
-         */
-        if (cstate->need_transcoding)
-            cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
-                                                              cstate->opts.null_print_len,
-                                                              cstate->file_encoding);
-
-        /* if a header has been requested send the line */
-        if (cstate->opts.header_line)
-        {
-            bool        hdr_delim = false;
-
-            foreach(cur, cstate->attnumlist)
-            {
-                int            attnum = lfirst_int(cur);
-                char       *colname;
-
-                if (hdr_delim)
-                    CopySendChar(cstate, cstate->opts.delim[0]);
-                hdr_delim = true;
-
-                colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, colname, false);
-                else
-                    CopyAttributeOutText(cstate, colname);
-            }
-
-            CopySendEndOfRow(cstate);
-        }
-    }
+    cstate->routine->CopyToStart(cstate, tupDesc);
 
     if (cstate->rel)
     {
@@ -893,15 +1129,7 @@ DoCopyTo(CopyToState cstate)
         processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
     }
 
-    if (cstate->opts.binary)
-    {
-        /* Generate trailer for a binary copy */
-        CopySendInt16(cstate, -1);
-        /* Need to flush out the trailer */
-        CopySendEndOfRow(cstate);
-    }
-    else if (cstate->routine)
-        cstate->routine->CopyToEnd(cstate);
+    cstate->routine->CopyToEnd(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -917,11 +1145,7 @@ DoCopyTo(CopyToState cstate)
 static void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-    bool        need_delim = false;
-    FmgrInfo   *out_functions = cstate->out_functions;
     MemoryContext oldcontext;
-    ListCell   *cur;
-    char       *string;
 
     MemoryContextReset(cstate->rowcontext);
     oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
@@ -929,65 +1153,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
     /* Make sure the tuple is fully deconstructed */
     slot_getallattrs(slot);
 
-    if (cstate->routine)
-    {
-        cstate->routine->CopyToOneRow(cstate, slot);
-        MemoryContextSwitchTo(oldcontext);
-        return;
-    }
-
-    if (cstate->opts.binary)
-    {
-        /* Binary per-tuple header */
-        CopySendInt16(cstate, list_length(cstate->attnumlist));
-    }
-
-    foreach(cur, cstate->attnumlist)
-    {
-        int            attnum = lfirst_int(cur);
-        Datum        value = slot->tts_values[attnum - 1];
-        bool        isnull = slot->tts_isnull[attnum - 1];
-
-        if (!cstate->opts.binary)
-        {
-            if (need_delim)
-                CopySendChar(cstate, cstate->opts.delim[0]);
-            need_delim = true;
-        }
-
-        if (isnull)
-        {
-            if (!cstate->opts.binary)
-                CopySendString(cstate, cstate->opts.null_print_client);
-            else
-                CopySendInt32(cstate, -1);
-        }
-        else
-        {
-            if (!cstate->opts.binary)
-            {
-                string = OutputFunctionCall(&out_functions[attnum - 1],
-                                            value);
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, string,
-                                        cstate->opts.force_quote_flags[attnum - 1]);
-                else
-                    CopyAttributeOutText(cstate, string);
-            }
-            else
-            {
-                bytea       *outputbytes;
-
-                outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-                                               value);
-                CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-                CopySendData(cstate, VARDATA(outputbytes),
-                             VARSIZE(outputbytes) - VARHDRSZ);
-            }
-        }
-    }
-
-    CopySendEndOfRow(cstate);
+    cstate->routine->CopyToOneRow(cstate, slot);
 
     MemoryContextSwitchTo(oldcontext);
 }
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc10..ccfbdf0ee01 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -104,8 +104,6 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
 extern void EndCopyFrom(CopyFromState cstate);
 extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
                          Datum *values, bool *nulls);
-extern bool NextCopyFromRawFields(CopyFromState cstate,
-                                  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 extern char *CopyLimitPrintoutLength(const char *str);
 
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 509b9e92a18..c11b5ff3cc0 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -187,4 +187,12 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
+/* Callbacks for CopyFromRoutine->CopyFromOneRow */
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+                               Datum *values, bool *nulls);
+extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
+                              Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+                                 Datum *values, bool *nulls);
+
 #endif                            /* COPYFROM_INTERNAL_H */
-- 
2.45.2
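For readers following the refactoring in the patch above: the per-format "if (cstate->opts.binary) ... else ..." branches in DoCopyTo() and CopyOneRowTo() are replaced by calls through a callback table that is chosen once in BeginCopyTo(). The block below is a minimal standalone sketch of that pattern, not PostgreSQL code. SketchState, SketchRoutine, SketchOptions, sketch_get_routine() and sketch_copy_to() are hypothetical names made up for illustration, and the "framed" format with its BEGIN/END lines is only a toy stand-in for a format that needs a header and a trailer. Only the overall shape (start / one-row / end callbacks, with csv_mode checked before binary and text as the fallback) mirrors the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct SketchState SketchState;

/* Hypothetical stand-in for CopyToRoutine: one callback per phase. */
typedef struct SketchRoutine
{
    void        (*start) (SketchState *st);
    void        (*one_row) (SketchState *st, const int *values, int natts);
    void        (*end) (SketchState *st);
} SketchRoutine;

/* Hypothetical stand-in for CopyToStateData. */
struct SketchState
{
    const SketchRoutine *routine;   /* chosen once, like cstate->routine */
    char        out[256];           /* toy output buffer */
};

/* Hypothetical stand-in for the format flags in CopyFormatOptions. */
typedef struct SketchOptions
{
    bool        csv_mode;
    bool        binary;
} SketchOptions;

/* Append a string to the toy buffer, truncating if it is full. */
static void
append(SketchState *st, const char *s)
{
    strncat(st->out, s, sizeof(st->out) - strlen(st->out) - 1);
}

/* Emit one row of integers separated by delim and ended by a newline. */
static void
delimited_row(SketchState *st, const int *values, int natts, const char *delim)
{
    char        buf[32];

    for (int i = 0; i < natts; i++)
    {
        snprintf(buf, sizeof(buf), "%s%d", i > 0 ? delim : "", values[i]);
        append(st, buf);
    }
    append(st, "\n");
}

static void no_op(SketchState *st) { (void) st; }
static void text_row(SketchState *st, const int *v, int n) { delimited_row(st, v, n, "\t"); }
static void csv_row(SketchState *st, const int *v, int n) { delimited_row(st, v, n, ","); }
static void framed_start(SketchState *st) { append(st, "BEGIN\n"); }
static void framed_row(SketchState *st, const int *v, int n) { delimited_row(st, v, n, "|"); }
static void framed_end(SketchState *st) { append(st, "END\n"); }

static const SketchRoutine routine_text = {no_op, text_row, no_op};
static const SketchRoutine routine_csv = {no_op, csv_row, no_op};
static const SketchRoutine routine_framed = {framed_start, framed_row, framed_end};

/* Same selection order as the getter in the diff: csv, then binary, else text. */
static const SketchRoutine *
sketch_get_routine(SketchOptions opts)
{
    if (opts.csv_mode)
        return &routine_csv;
    else if (opts.binary)
        return &routine_framed;
    return &routine_text;
}

/* The driver no longer branches on the format; it only calls callbacks. */
static void
sketch_copy_to(SketchState *st, SketchOptions opts,
               const int *rows, int nrows, int natts)
{
    assert(st != NULL && rows != NULL);

    st->out[0] = '\0';
    st->routine = sketch_get_routine(opts);

    st->routine->start(st);
    for (int i = 0; i < nrows; i++)
        st->routine->one_row(st, rows + i * natts, natts);
    st->routine->end(st);
}
```

Calling sketch_copy_to() with a flat array of row values and a SketchOptions value fills st.out with the formatted rows. The point of the design is that adding a fourth format means adding one more SketchRoutine, without touching the driver.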
From 273918b50a2c64659376e4a9b8e42d2a6480abfb Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jul 2024 17:39:41 +0900
Subject: [PATCH v19 3/5] Add support for adding custom COPY TO format
This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a COPY TO handler returns a CopyToRoutine and a COPY FROM handler
returns a CopyFromRoutine.
This uses the same handler for COPY TO and COPY FROM. PostgreSQL calls a
COPY TO/FROM handler with an "is_from" argument. It's true for COPY FROM
and false for COPY TO:
    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine
This also adds a test module for custom COPY TO/FROM handlers.
---
 src/backend/commands/copy.c                   |  96 ++++++++++++++---
 src/backend/commands/copyfrom.c               |   4 +-
 src/backend/commands/copyto.c                 |   4 +-
 src/backend/nodes/Makefile                    |   1 +
 src/backend/nodes/gen_node_support.pl         |   2 +
 src/backend/utils/adt/pseudotypes.c           |   1 +
 src/include/catalog/pg_proc.dat               |   6 ++
 src/include/catalog/pg_type.dat               |   6 ++
 src/include/commands/copy.h                   |   2 +
 src/include/commands/copyapi.h                |   4 +
 src/include/nodes/meson.build                 |   1 +
 src/test/modules/Makefile                     |   1 +
 src/test/modules/meson.build                  |   1 +
 src/test/modules/test_copy_format/.gitignore  |   4 +
 src/test/modules/test_copy_format/Makefile    |  23 ++++
 .../expected/test_copy_format.out             |  21 ++++
 src/test/modules/test_copy_format/meson.build |  33 ++++++
 .../test_copy_format/sql/test_copy_format.sql |   6 ++
 .../test_copy_format--1.0.sql                 |   8 ++
 .../test_copy_format/test_copy_format.c       | 100 ++++++++++++++++++
 .../test_copy_format/test_copy_format.control |   4 +
 21 files changed, 313 insertions(+), 15 deletions(-)
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index df7a4a21c94..e5137e7bb3d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,6 +32,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -439,6 +440,87 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
     return COPY_LOG_VERBOSITY_DEFAULT;    /* keep compiler quiet */
 }
 
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY
+ * format handler is found, this function reports an error.
+ */
+static void
+ProcessCopyOptionFormat(ParseState *pstate,
+                        CopyFormatOptions *opts_out,
+                        bool is_from,
+                        DefElem *defel)
+{
+    char       *format;
+    Oid            funcargtypes[1];
+    Oid            handlerOid = InvalidOid;
+    Datum        datum;
+    Node       *routine;
+
+    format = defGetString(defel);
+
+    /* built-in formats */
+    if (strcmp(format, "text") == 0)
+         /* default format */ return;
+    else if (strcmp(format, "csv") == 0)
+    {
+        opts_out->csv_mode = true;
+        return;
+    }
+    else if (strcmp(format, "binary") == 0)
+    {
+        opts_out->binary = true;
+        return;
+    }
+
+    /* custom format */
+    funcargtypes[0] = INTERNALOID;
+    handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                funcargtypes, true);
+    if (!OidIsValid(handlerOid))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY format \"%s\" not recognized", format),
+                 parser_errposition(pstate, defel->location)));
+
+    datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
+    routine = (Node *) DatumGetPointer(datum);
+    if (is_from)
+    {
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%s(%u) did not return a "
+                            "CopyFromRoutine struct",
+                            format, handlerOid),
+                     parser_errposition(
+                                        pstate, defel->location)));
+    }
+    else
+    {
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%s(%u) did not return a "
+                            "CopyToRoutine struct",
+                            format, handlerOid),
+                     parser_errposition(
+                                        pstate, defel->location)));
+    }
+
+    opts_out->routine = routine;
+}
+
 /*
  * Process the statement option list for COPY.
  *
@@ -481,22 +563,10 @@ ProcessCopyOptions(ParseState *pstate,
 
         if (strcmp(defel->defname, "format") == 0)
         {
-            char       *fmt = defGetString(defel);
-
             if (format_specified)
                 errorConflictingDefElem(defel, pstate);
             format_specified = true;
-            if (strcmp(fmt, "text") == 0)
-                 /* default format */ ;
-            else if (strcmp(fmt, "csv") == 0)
-                opts_out->csv_mode = true;
-            else if (strcmp(fmt, "binary") == 0)
-                opts_out->binary = true;
-            else
-                ereport(ERROR,
-                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                         errmsg("COPY format \"%s\" not recognized", fmt),
-                         parser_errposition(pstate, defel->location)));
+            ProcessCopyOptionFormat(pstate, opts_out, is_from, defel);
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1a59202f5ab..2b48c825a0a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -244,7 +244,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(CopyFormatOptions opts)
 {
-    if (opts.csv_mode)
+    if (opts.routine)
+        return (const CopyFromRoutine *) opts.routine;
+    else if (opts.csv_mode)
         return &CopyFromRoutineCSV;
     else if (opts.binary)
         return &CopyFromRoutineBinary;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index c7f69ba606d..a9e923467dc 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -435,7 +435,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(CopyFormatOptions opts)
 {
-    if (opts.csv_mode)
+    if (opts.routine)
+        return (const CopyToRoutine *) opts.routine;
+    else if (opts.csv_mode)
         return &CopyToRoutineCSV;
     else if (opts.binary)
         return &CopyToRoutineBinary;
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 66bbad8e6e0..173ee11811c 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -49,6 +49,7 @@ node_headers = \
     access/sdir.h \
     access/tableam.h \
     access/tsmapi.h \
+    commands/copyapi.h \
     commands/event_trigger.h \
     commands/trigger.h \
     executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 81df3bdf95f..428ab4f0d93 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -61,6 +61,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -85,6 +86,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index e189e9b79d2..25f24ab95d2 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 73d9cf85826..126254473e6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7644,6 +7644,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index ceff66ccde1..14c6c1ea486 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -633,6 +633,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a copy to/from method function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   descr => 'pseudo-type for the result of a table AM handler function',
   typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index ccfbdf0ee01..79bd4fb9151 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -84,6 +84,8 @@ typedef struct CopyFormatOptions
     CopyOnErrorChoice on_error; /* what to do when error happened */
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     List       *convert_select; /* list of column names (can be NIL) */
+    Node       *routine;        /* CopyToRoutine or CopyFromRoutine (can be
+                                 * NULL) */
 } CopyFormatOptions;
 
 /* These are private in commands/copy[from|to].c */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index d1289424c67..e049a45a4b1 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -27,6 +27,8 @@ typedef struct CopyToStateData *CopyToState;
  */
 typedef struct CopyFromRoutine
 {
+    NodeTag        type;
+
     /*
      * Called when COPY FROM is started to set up the input functions
      * associated with the relation's attributes writing to.  `finfo` can be
@@ -69,6 +71,8 @@ typedef struct CopyFromRoutine
  */
 typedef struct CopyToRoutine
 {
+    NodeTag        type;
+
     /*
      * Called when COPY TO is started to set up the output functions
      * associated with the relation's attributes reading from.  `finfo` can be
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index b665e55b657..103df1a7873 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -11,6 +11,7 @@ node_support_input_i = [
   'access/sdir.h',
   'access/tableam.h',
   'access/tsmapi.h',
+  'commands/copyapi.h',
   'commands/event_trigger.h',
   'commands/trigger.h',
   'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 256799f520a..b7b46928a19 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
           spgist_name_ops \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index d8fe059d236..c42b4b2b31f 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -14,6 +14,7 @@ subdir('spgist_name_ops')
 subdir('ssl_passphrase_callback')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 00000000000..8497f91624d
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 00000000000..4ed7c0b12db
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,21 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (format 'test_copy_format');
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
+COPY public.test TO stdout WITH (format 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: atttypid=21
+NOTICE:  CopyToOutFunc: atttypid=23
+NOTICE:  CopyToOutFunc: atttypid=20
+NOTICE:  CopyToStart: natts=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 00000000000..4cefe7b709a
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+  'test_copy_format.control',
+  'test_copy_format--1.0.sql',
+)
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'test_copy_format',
+    ],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 00000000000..e805f7cb011
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,6 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (format 'test_copy_format');
+\.
+COPY public.test TO stdout WITH (format 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 00000000000..d24ea03ce99
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,8 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+CREATE FUNCTION test_copy_format(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..f6b105659ab
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,100 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+
+PG_MODULE_MAGIC;
+
+static void
+CopyFromInFunc(CopyFromState cstate, Oid atttypid,
+               FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts)));
+}
+
+static bool
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+CopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .type = T_CopyFromRoutine,
+    .CopyFromInFunc = CopyFromInFunc,
+    .CopyFromStart = CopyFromStart,
+    .CopyFromOneRow = CopyFromOneRow,
+    .CopyFromEnd = CopyFromEnd,
+};
+
+static void
+CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts)));
+}
+
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%d", slot->tts_nvalid)));
+}
+
+static void
+CopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .type = T_CopyToRoutine,
+    .CopyToOutFunc = CopyToOutFunc,
+    .CopyToStart = CopyToStart,
+    .CopyToOneRow = CopyToOneRow,
+    .CopyToEnd = CopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    ereport(NOTICE,
+            (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 00000000000..f05a6362358
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.45.2
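The handler protocol added by the patch above can be sketched in isolation: one handler function receives is_from and returns a pointer to a struct whose first field is a type tag, and the caller checks that tag before trusting the pointer, as ProcessCopyOptionFormat() does with IsA(). The block below is a standalone illustration and not PostgreSQL code. SketchTag, SketchCopyFromRoutine, SketchCopyToRoutine, sketch_handler(), sketch_bad_handler() and sketch_resolve() are hypothetical names, a plain enum stands in for NodeTag, and returning NULL stands in for the ereport(ERROR) the patch raises.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for NodeTag. */
typedef enum SketchTag
{
    SKETCH_T_INVALID = 0,
    SKETCH_T_COPY_FROM_ROUTINE,
    SKETCH_T_COPY_TO_ROUTINE
} SketchTag;

/* Both routine structs start with the tag, like "NodeTag type" in the diff. */
typedef struct SketchCopyFromRoutine
{
    SketchTag   type;
    const char *name;
} SketchCopyFromRoutine;

typedef struct SketchCopyToRoutine
{
    SketchTag   type;
    const char *name;
} SketchCopyToRoutine;

static const SketchCopyFromRoutine from_routine = {SKETCH_T_COPY_FROM_ROUTINE, "from"};
static const SketchCopyToRoutine to_routine = {SKETCH_T_COPY_TO_ROUTINE, "to"};

typedef const void *(*SketchHandler) (bool is_from);

/* A well-behaved handler: true selects COPY FROM, false selects COPY TO. */
static const void *
sketch_handler(bool is_from)
{
    if (is_from)
        return &from_routine;
    return &to_routine;
}

/* A buggy handler that ignores is_from and always returns the TO routine. */
static const void *
sketch_bad_handler(bool is_from)
{
    (void) is_from;
    return &to_routine;
}

/*
 * Read the tag through a pointer to the first member. A pointer to a
 * struct, suitably converted, points to its initial member, so this is
 * valid for either routine type.
 */
static SketchTag
sketch_tag_of(const void *ptr)
{
    return *(const SketchTag *) ptr;
}

/*
 * Call the handler and validate what it returned. Returns the routine if
 * its tag matches the requested direction, otherwise NULL.
 */
static const void *
sketch_resolve(SketchHandler handler, bool is_from)
{
    const void *routine = handler(is_from);
    SketchTag   expected = is_from ? SKETCH_T_COPY_FROM_ROUTINE
                                   : SKETCH_T_COPY_TO_ROUTINE;

    if (routine == NULL || sketch_tag_of(routine) != expected)
        return NULL;
    return routine;
}
```

With the well-behaved handler, sketch_resolve() returns the FROM routine for is_from == true and the TO routine for is_from == false. With the buggy handler, the COPY FROM request is rejected because the tag does not match, which is the situation the "did not return a CopyFromRoutine struct" error in the patch guards against.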
From a6670ca32945b19f5c2a35b6f4094fa8286ed327 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jan 2024 14:54:10 +0900
Subject: [PATCH v19 4/5] Export CopyToStateData and CopyFromStateData
This is for custom COPY TO/FROM format handlers implemented as extensions.
This just moves code. It doesn't change code except for the
CopyDest/CopySource enum values. CopyDest and CopySource enum values such
as COPY_FILE conflict with each other. So the COPY_DEST_ prefix is used
instead of the COPY_ prefix for CopyDest enum values, and the COPY_SOURCE_
prefix is used instead of the COPY_ prefix for CopySource enum values. For
example, COPY_FILE in CopyDest is renamed to COPY_DEST_FILE and COPY_FILE
in CopySource is renamed to COPY_SOURCE_FILE.
Note that this isn't enough to implement custom COPY TO/FROM format
handlers as extensions. We'll do the following in a subsequent commit:
For custom COPY TO format handlers:
1. Add an opaque space for custom COPY TO format handlers
2. Export CopySendEndOfRow() to flush the buffer
For custom COPY FROM format handlers:
1. Add an opaque space for custom COPY FROM format handlers
2. Export CopyReadBinaryData() to read the next data
---
 src/backend/commands/copyfrom.c          |   4 +-
 src/backend/commands/copyfromparse.c     |  10 +-
 src/backend/commands/copyto.c            |  77 +-----
 src/include/commands/copy.h              |  78 +-----
 src/include/commands/copyapi.h           | 306 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h | 165 ------------
 6 files changed, 320 insertions(+), 320 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2b48c825a0a..5902172b8df 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1699,7 +1699,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;    /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;    /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1827,7 +1827,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 90824b47785..74844103228 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -180,7 +180,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows it can send. */
     pq_flush();
@@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
     switch (cstate->copy_src)
     {
-        case COPY_FILE:
+        case COPY_SOURCE_FILE:
             bytesread = fread(databuf, 1, maxread, cstate->copy_file);
             if (ferror(cstate->copy_file))
                 ereport(ERROR,
@@ -257,7 +257,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
             if (bytesread == 0)
                 cstate->raw_reached_eof = true;
             break;
-        case COPY_FRONTEND:
+        case COPY_SOURCE_FRONTEND:
             while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
             {
                 int            avail;
@@ -340,7 +340,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
                 bytesread += avail;
             }
             break;
-        case COPY_CALLBACK:
+        case COPY_SOURCE_CALLBACK:
             bytesread = cstate->data_source_cb(databuf, minread, maxread);
             break;
     }
@@ -1188,7 +1188,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
          * after \. up to the protocol end of copy data.  (XXX maybe better
          * not to treat \. as special?)
          */
-        if (cstate->copy_src == COPY_FRONTEND)
+        if (cstate->copy_src == COPY_SOURCE_FRONTEND)
         {
             int            inbytes;
 
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index a9e923467dc..54aa6cdecaf 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -37,67 +37,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                    /* to file (or a piped program) */
-    COPY_FRONTEND,                /* to frontend */
-    COPY_CALLBACK,                /* to callback function */
-} CopyDest;
-
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-    /* format routine */
-    const CopyToRoutine *routine;
-
-    /* low-level state data */
-    CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
-
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
-    MemoryContext rowcontext;    /* per-row evaluation context */
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -143,7 +82,7 @@ CopyToTextLikeSendEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -151,7 +90,7 @@ CopyToTextLikeSendEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -464,7 +403,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -511,7 +450,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -545,11 +484,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -928,12 +867,12 @@ BeginCopyTo(ParseState *pstate,
     /* See Multibyte encoding comment above */
     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 79bd4fb9151..e2411848e9f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,87 +14,11 @@
 #ifndef COPY_H
 #define COPY_H
 
-#include "nodes/execnodes.h"
+#include "commands/copyapi.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
 #include "tcop/dest.h"
 
-/*
- * Represents whether a header line should be present, and whether it must
- * match the actual names (which implies "true").
- */
-typedef enum CopyHeaderChoice
-{
-    COPY_HEADER_FALSE = 0,
-    COPY_HEADER_TRUE,
-    COPY_HEADER_MATCH,
-} CopyHeaderChoice;
-
-/*
- * Represents where to save input processing errors.  More values to be added
- * in the future.
- */
-typedef enum CopyOnErrorChoice
-{
-    COPY_ON_ERROR_STOP = 0,        /* immediately throw errors, default */
-    COPY_ON_ERROR_IGNORE,        /* ignore errors */
-} CopyOnErrorChoice;
-
-/*
- * Represents verbosity of logged messages by COPY command.
- */
-typedef enum CopyLogVerbosityChoice
-{
-    COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
-    COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
-} CopyLogVerbosityChoice;
-
-/*
- * A struct to hold COPY options, in a parsed form. All of these are related
- * to formatting, except for 'freeze', which doesn't really belong here, but
- * it's expedient to parse it along with all the other options.
- */
-typedef struct CopyFormatOptions
-{
-    /* parameters from the COPY command */
-    int            file_encoding;    /* file or remote side's character encoding,
-                                 * -1 if not specified */
-    bool        binary;            /* binary format? */
-    bool        freeze;            /* freeze rows on loading? */
-    bool        csv_mode;        /* Comma Separated Value format? */
-    CopyHeaderChoice header_line;    /* header line? */
-    char       *null_print;        /* NULL marker string (server encoding!) */
-    int            null_print_len; /* length of same */
-    char       *null_print_client;    /* same converted to file encoding */
-    char       *default_print;    /* DEFAULT marker string */
-    int            default_print_len;    /* length of same */
-    char       *delim;            /* column delimiter (must be 1 byte) */
-    char       *quote;            /* CSV quote char (must be 1 byte) */
-    char       *escape;            /* CSV escape char (must be 1 byte) */
-    List       *force_quote;    /* list of column names */
-    bool        force_quote_all;    /* FORCE_QUOTE *? */
-    bool       *force_quote_flags;    /* per-column CSV FQ flags */
-    List       *force_notnull;    /* list of column names */
-    bool        force_notnull_all;    /* FORCE_NOT_NULL *? */
-    bool       *force_notnull_flags;    /* per-column CSV FNN flags */
-    List       *force_null;        /* list of column names */
-    bool        force_null_all; /* FORCE_NULL *? */
-    bool       *force_null_flags;    /* per-column CSV FN flags */
-    bool        convert_selectively;    /* do selective binary conversion? */
-    CopyOnErrorChoice on_error; /* what to do when error happened */
-    CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
-    List       *convert_select; /* list of column names (can be NIL) */
-    Node       *routine;        /* CopyToRoutine or CopyFromRoutine (can be
-                                 * NULL) */
-} CopyFormatOptions;
-
-/* These are private in commands/copy[from|to].c */
-typedef struct CopyFromStateData *CopyFromState;
-typedef struct CopyToStateData *CopyToState;
-
-typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
-typedef void (*copy_data_dest_cb) (void *data, int len);
-
 extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
                    int stmt_location, int stmt_len,
                    uint64 *processed);
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index e049a45a4b1..e298b19860c 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -14,12 +14,83 @@
 #ifndef COPYAPI_H
 #define COPYAPI_H
 
+#include "commands/trigger.h"
+#include "executor/execdesc.h"
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
 
-/* These are private in commands/copy[from|to].c */
+/*
+ * Represents whether a header line should be present, and whether it must
+ * match the actual names (which implies "true").
+ */
+typedef enum CopyHeaderChoice
+{
+    COPY_HEADER_FALSE = 0,
+    COPY_HEADER_TRUE,
+    COPY_HEADER_MATCH,
+} CopyHeaderChoice;
+
+/*
+ * Represents where to save input processing errors.  More values to be added
+ * in the future.
+ */
+typedef enum CopyOnErrorChoice
+{
+    COPY_ON_ERROR_STOP = 0,        /* immediately throw errors, default */
+    COPY_ON_ERROR_IGNORE,        /* ignore errors */
+} CopyOnErrorChoice;
+
+/*
+ * Represents verbosity of logged messages by COPY command.
+ */
+typedef enum CopyLogVerbosityChoice
+{
+    COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
+    COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
+} CopyLogVerbosityChoice;
+
+/*
+ * A struct to hold COPY options, in a parsed form. All of these are related
+ * to formatting, except for 'freeze', which doesn't really belong here, but
+ * it's expedient to parse it along with all the other options.
+ */
+typedef struct CopyFormatOptions
+{
+    /* parameters from the COPY command */
+    int            file_encoding;    /* file or remote side's character encoding,
+                                 * -1 if not specified */
+    bool        binary;            /* binary format? */
+    bool        freeze;            /* freeze rows on loading? */
+    bool        csv_mode;        /* Comma Separated Value format? */
+    CopyHeaderChoice header_line;    /* header line? */
+    char       *null_print;        /* NULL marker string (server encoding!) */
+    int            null_print_len; /* length of same */
+    char       *null_print_client;    /* same converted to file encoding */
+    char       *default_print;    /* DEFAULT marker string */
+    int            default_print_len;    /* length of same */
+    char       *delim;            /* column delimiter (must be 1 byte) */
+    char       *quote;            /* CSV quote char (must be 1 byte) */
+    char       *escape;            /* CSV escape char (must be 1 byte) */
+    List       *force_quote;    /* list of column names */
+    bool        force_quote_all;    /* FORCE_QUOTE *? */
+    bool       *force_quote_flags;    /* per-column CSV FQ flags */
+    List       *force_notnull;    /* list of column names */
+    bool        force_notnull_all;    /* FORCE_NOT_NULL *? */
+    bool       *force_notnull_flags;    /* per-column CSV FNN flags */
+    List       *force_null;        /* list of column names */
+    bool        force_null_all; /* FORCE_NULL *? */
+    bool       *force_null_flags;    /* per-column CSV FN flags */
+    bool        convert_selectively;    /* do selective binary conversion? */
+    CopyOnErrorChoice on_error; /* what to do when error happened */
+    CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
+    List       *convert_select; /* list of column names (can be NIL) */
+    Node       *routine;        /* CopyToRoutine or CopyFromRoutine (can be
+                                 * NULL) */
+} CopyFormatOptions;
+
+typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+
 typedef struct CopyFromStateData *CopyFromState;
-typedef struct CopyToStateData *CopyToState;
 
 /*
  * API structure for a COPY FROM format implementation.  Note this must be
@@ -65,6 +136,174 @@ typedef struct CopyFromRoutine
     void        (*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
+/*
+ * Represents the different source cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopySource
+{
+    COPY_SOURCE_FILE,            /* from file (or a piped program) */
+    COPY_SOURCE_FRONTEND,        /* from frontend */
+    COPY_SOURCE_CALLBACK,        /* from callback function */
+} CopySource;
+
+/*
+ * Represents the end-of-line terminator type of the input
+ */
+typedef enum EolType
+{
+    EOL_UNKNOWN,
+    EOL_NL,
+    EOL_CR,
+    EOL_CRNL,
+} EolType;
+
+/*
+ * Represents the insert method to be used during COPY FROM.
+ */
+typedef enum CopyInsertMethod
+{
+    CIM_SINGLE,                    /* use table_tuple_insert or ExecForeignInsert */
+    CIM_MULTI,                    /* always use table_multi_insert or
+                                 * ExecForeignBatchInsert */
+    CIM_MULTI_CONDITIONAL,        /* use table_multi_insert or
+                                 * ExecForeignBatchInsert only if valid */
+} CopyInsertMethod;
+
+/*
+ * This struct contains all the state variables used throughout a COPY FROM
+ * operation.
+ */
+typedef struct CopyFromStateData
+{
+    /* format routine */
+    const CopyFromRoutine *routine;
+
+    /* low-level state data */
+    CopySource    copy_src;        /* type of copy source */
+    FILE       *copy_file;        /* used if copy_src == COPY_SOURCE_FILE */
+    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_SOURCE_FRONTEND */
+
+    EolType        eol_type;        /* EOL type of input */
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    Oid            conversion_proc;    /* encoding conversion function */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDIN */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_source_cb data_source_cb; /* function for reading data */
+
+    CopyFormatOptions opts;
+    bool       *convert_select_flags;    /* per-column CSV/TEXT CS flags */
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /* these are just for error messages, see CopyFromErrorCallback */
+    const char *cur_relname;    /* table name for error messages */
+    uint64        cur_lineno;        /* line number for error messages */
+    const char *cur_attname;    /* current att for error messages */
+    const char *cur_attval;        /* current att value for error messages */
+    bool        relname_only;    /* don't output line number, att, etc. */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    AttrNumber    num_defaults;    /* count of att that are missing and have
+                                 * default value */
+    FmgrInfo   *in_functions;    /* array of input functions for each attrs */
+    Oid           *typioparams;    /* array of element types for in_functions */
+    ErrorSaveContext *escontext;    /* soft error trapper during in_functions
+                                     * execution */
+    uint64        num_errors;        /* total number of rows which contained soft
+                                 * errors */
+    int           *defmap;            /* array of default att numbers related to
+                                 * missing att */
+    ExprState **defexprs;        /* array of default att expressions for all
+                                 * att */
+    bool       *defaults;        /* if DEFAULT marker was found for
+                                 * corresponding att */
+    bool        volatile_defexprs;    /* is any of defexprs volatile? */
+    List       *range_table;    /* single element list of RangeTblEntry */
+    List       *rteperminfos;    /* single element list of RTEPermissionInfo */
+    ExprState  *qualexpr;
+
+    TransitionCaptureState *transition_capture;
+
+    /*
+     * These variables are used to reduce overhead in COPY FROM.
+     *
+     * attribute_buf holds the separated, de-escaped text for each field of
+     * the current line.  The CopyReadAttributes functions return arrays of
+     * pointers into this buffer.  We avoid palloc/pfree overhead by re-using
+     * the buffer on each cycle.
+     *
+     * In binary COPY FROM, attribute_buf holds the binary data for the
+     * current field, but the usage is otherwise similar.
+     */
+    StringInfoData attribute_buf;
+
+    /* field raw data pointers found by COPY FROM */
+
+    int            max_fields;
+    char      **raw_fields;
+
+    /*
+     * Similarly, line_buf holds the whole input line being processed. The
+     * input cycle is first to read the whole line into line_buf, and then
+     * extract the individual attribute fields into attribute_buf.  line_buf
+     * is preserved unmodified so that we can display it in error messages if
+     * appropriate.  (In binary mode, line_buf is not used.)
+     */
+    StringInfoData line_buf;
+    bool        line_buf_valid; /* contains the row being processed? */
+
+    /*
+     * input_buf holds input data, already converted to database encoding.
+     *
+     * In text mode, CopyReadLine parses this data sufficiently to locate line
+     * boundaries, then transfers the data to line_buf. We guarantee that
+     * there is a \0 at input_buf[input_buf_len] at all times.  (In binary
+     * mode, input_buf is not used.)
+     *
+     * If encoding conversion is not required, input_buf is not a separate
+     * buffer but points directly to raw_buf.  In that case, input_buf_len
+     * tracks the number of bytes that have been verified as valid in the
+     * database encoding, and raw_buf_len is the total number of bytes stored
+     * in the buffer.
+     */
+#define INPUT_BUF_SIZE 65536    /* we palloc INPUT_BUF_SIZE+1 bytes */
+    char       *input_buf;
+    int            input_buf_index;    /* next byte to process */
+    int            input_buf_len;    /* total # of bytes stored */
+    bool        input_reached_eof;    /* true if we reached EOF */
+    bool        input_reached_error;    /* true if a conversion error happened */
+    /* Shorthand for number of unconsumed bytes available in input_buf */
+#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index)
+
+    /*
+     * raw_buf holds raw input data read from the data source (file or client
+     * connection), not yet converted to the database encoding.  Like with
+     * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len].
+     */
+#define RAW_BUF_SIZE 65536        /* we palloc RAW_BUF_SIZE+1 bytes */
+    char       *raw_buf;
+    int            raw_buf_index;    /* next byte to process */
+    int            raw_buf_len;    /* total # of bytes stored */
+    bool        raw_reached_eof;    /* true if we reached EOF */
+
+    /* Shorthand for number of unconsumed bytes available in raw_buf */
+#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
+
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyFromStateData;
+
+
+typedef struct CopyToStateData *CopyToState;
+
 /*
  * API structure for a COPY TO format implementation.   Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
@@ -102,4 +341,67 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+    COPY_DEST_FILE,                /* to file (or a piped program) */
+    COPY_DEST_FRONTEND,            /* to frontend */
+    COPY_DEST_CALLBACK,            /* to callback function */
+} CopyDest;
+
+typedef void (*copy_data_dest_cb) (void *data, int len);
+
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+    /* format routine */
+    const CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;        /* type of copy destination */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
+    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
+
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+    MemoryContext rowcontext;    /* per-row evaluation context */
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c11b5ff3cc0..3863d26d5b7 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -19,171 +19,6 @@
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
-/*
- * Represents the different source cases we need to worry about at
- * the bottom level
- */
-typedef enum CopySource
-{
-    COPY_FILE,                    /* from file (or a piped program) */
-    COPY_FRONTEND,                /* from frontend */
-    COPY_CALLBACK,                /* from callback function */
-} CopySource;
-
-/*
- *    Represents the end-of-line terminator type of the input
- */
-typedef enum EolType
-{
-    EOL_UNKNOWN,
-    EOL_NL,
-    EOL_CR,
-    EOL_CRNL,
-} EolType;
-
-/*
- * Represents the insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-    CIM_SINGLE,                    /* use table_tuple_insert or ExecForeignInsert */
-    CIM_MULTI,                    /* always use table_multi_insert or
-                                 * ExecForeignBatchInsert */
-    CIM_MULTI_CONDITIONAL,        /* use table_multi_insert or
-                                 * ExecForeignBatchInsert only if valid */
-} CopyInsertMethod;
-
-/*
- * This struct contains all the state variables used throughout a COPY FROM
- * operation.
- */
-typedef struct CopyFromStateData
-{
-    /* format routine */
-    const CopyFromRoutine *routine;
-
-    /* low-level state data */
-    CopySource    copy_src;        /* type of copy source */
-    FILE       *copy_file;        /* used if copy_src == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_FRONTEND */
-
-    EolType        eol_type;        /* EOL type of input */
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    Oid            conversion_proc;    /* encoding conversion function */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDIN */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_source_cb data_source_cb; /* function for reading data */
-
-    CopyFormatOptions opts;
-    bool       *convert_select_flags;    /* per-column CSV/TEXT CS flags */
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /* these are just for error messages, see CopyFromErrorCallback */
-    const char *cur_relname;    /* table name for error messages */
-    uint64        cur_lineno;        /* line number for error messages */
-    const char *cur_attname;    /* current att for error messages */
-    const char *cur_attval;        /* current att value for error messages */
-    bool        relname_only;    /* don't output line number, att, etc. */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    AttrNumber    num_defaults;    /* count of att that are missing and have
-                                 * default value */
-    FmgrInfo   *in_functions;    /* array of input functions for each attrs */
-    Oid           *typioparams;    /* array of element types for in_functions */
-    ErrorSaveContext *escontext;    /* soft error trapper during in_functions
-                                     * execution */
-    uint64        num_errors;        /* total number of rows which contained soft
-                                 * errors */
-    int           *defmap;            /* array of default att numbers related to
-                                 * missing att */
-    ExprState **defexprs;        /* array of default att expressions for all
-                                 * att */
-    bool       *defaults;        /* if DEFAULT marker was found for
-                                 * corresponding att */
-    bool        volatile_defexprs;    /* is any of defexprs volatile? */
-    List       *range_table;    /* single element list of RangeTblEntry */
-    List       *rteperminfos;    /* single element list of RTEPermissionInfo */
-    ExprState  *qualexpr;
-
-    TransitionCaptureState *transition_capture;
-
-    /*
-     * These variables are used to reduce overhead in COPY FROM.
-     *
-     * attribute_buf holds the separated, de-escaped text for each field of
-     * the current line.  The CopyReadAttributes functions return arrays of
-     * pointers into this buffer.  We avoid palloc/pfree overhead by re-using
-     * the buffer on each cycle.
-     *
-     * In binary COPY FROM, attribute_buf holds the binary data for the
-     * current field, but the usage is otherwise similar.
-     */
-    StringInfoData attribute_buf;
-
-    /* field raw data pointers found by COPY FROM */
-
-    int            max_fields;
-    char      **raw_fields;
-
-    /*
-     * Similarly, line_buf holds the whole input line being processed. The
-     * input cycle is first to read the whole line into line_buf, and then
-     * extract the individual attribute fields into attribute_buf.  line_buf
-     * is preserved unmodified so that we can display it in error messages if
-     * appropriate.  (In binary mode, line_buf is not used.)
-     */
-    StringInfoData line_buf;
-    bool        line_buf_valid; /* contains the row being processed? */
-
-    /*
-     * input_buf holds input data, already converted to database encoding.
-     *
-     * In text mode, CopyReadLine parses this data sufficiently to locate line
-     * boundaries, then transfers the data to line_buf. We guarantee that
-     * there is a \0 at input_buf[input_buf_len] at all times.  (In binary
-     * mode, input_buf is not used.)
-     *
-     * If encoding conversion is not required, input_buf is not a separate
-     * buffer but points directly to raw_buf.  In that case, input_buf_len
-     * tracks the number of bytes that have been verified as valid in the
-     * database encoding, and raw_buf_len is the total number of bytes stored
-     * in the buffer.
-     */
-#define INPUT_BUF_SIZE 65536    /* we palloc INPUT_BUF_SIZE+1 bytes */
-    char       *input_buf;
-    int            input_buf_index;    /* next byte to process */
-    int            input_buf_len;    /* total # of bytes stored */
-    bool        input_reached_eof;    /* true if we reached EOF */
-    bool        input_reached_error;    /* true if a conversion error happened */
-    /* Shorthand for number of unconsumed bytes available in input_buf */
-#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index)
-
-    /*
-     * raw_buf holds raw input data read from the data source (file or client
-     * connection), not yet converted to the database encoding.  Like with
-     * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len].
-     */
-#define RAW_BUF_SIZE 65536        /* we palloc RAW_BUF_SIZE+1 bytes */
-    char       *raw_buf;
-    int            raw_buf_index;    /* next byte to process */
-    int            raw_buf_len;    /* total # of bytes stored */
-    bool        raw_reached_eof;    /* true if we reached EOF */
-
-    /* Shorthand for number of unconsumed bytes available in raw_buf */
-#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
-
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyFromStateData;
-
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
-- 
2.45.2
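
A note on the renames in the patch above, offered as a reader's sketch and not as the author's stated rationale: the diff moves both CopySource and CopyDest into copyapi.h. C enumerators share one identifier namespace, so two enums that both define COPY_FILE cannot be declared in the same header, and the COPY_SOURCE_* / COPY_DEST_* prefixes avoid that clash. The enum definitions below mirror the patch; copy_source_name() and copy_dest_name() are hypothetical helpers added only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Mirrors the patch: source cases for COPY FROM. */
typedef enum CopySource
{
    COPY_SOURCE_FILE,           /* from file (or a piped program) */
    COPY_SOURCE_FRONTEND,       /* from frontend */
    COPY_SOURCE_CALLBACK,       /* from callback function */
} CopySource;

/*
 * Mirrors the patch: destination cases for COPY TO.  With the old names
 * (COPY_FILE, COPY_FRONTEND, COPY_CALLBACK) this second definition would
 * redeclare the enumerators above and fail to compile.
 */
typedef enum CopyDest
{
    COPY_DEST_FILE,             /* to file (or a piped program) */
    COPY_DEST_FRONTEND,         /* to frontend */
    COPY_DEST_CALLBACK,         /* to callback function */
} CopyDest;

/* Hypothetical helper, not part of the patch. */
static const char *
copy_source_name(CopySource src)
{
    switch (src)
    {
        case COPY_SOURCE_FILE:
            return "file";
        case COPY_SOURCE_FRONTEND:
            return "frontend";
        case COPY_SOURCE_CALLBACK:
            return "callback";
    }
    return "unknown";
}

/* Hypothetical helper, not part of the patch. */
static const char *
copy_dest_name(CopyDest dest)
{
    switch (dest)
    {
        case COPY_DEST_FILE:
            return "file";
        case COPY_DEST_FRONTEND:
            return "frontend";
        case COPY_DEST_CALLBACK:
            return "callback";
    }
    return "unknown";
}
```

Both enums now coexist in a single translation unit, which is what a shared header needs.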
From 0cb173c99599c0669891642e6f1adb414769ae8c Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jan 2024 15:12:43 +0900
Subject: [PATCH v19 5/5] Add support for implementing custom COPY TO/FROM
 format as extension
For custom COPY TO format implementation:
* Add CopyToStateData::opaque that can be used to keep data for custom
  COPY TO format implementation
* Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf
  as CopyToStateFlush()
For custom COPY FROM format implementation:
* Add CopyFromStateData::opaque that can be used to keep data for
  custom COPY FROM format implementation
* Export CopyReadBinaryData() to read the next data as
  CopyFromStateRead()
---
 src/backend/commands/copyfromparse.c | 14 ++++++++++++++
 src/backend/commands/copyto.c        | 14 ++++++++++++++
 src/include/commands/copyapi.h       | 10 ++++++++++
 3 files changed, 38 insertions(+)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 74844103228..a115d7f9e26 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -739,6 +739,20 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
     return copied_bytes;
 }
 
+/*
+ * CopyFromStateRead
+ *
+ * Export CopyReadBinaryData() for extensions. We want to keep
+ * CopyReadBinaryData() as a static function for
+ * optimization. CopyReadBinaryData() calls in this file may be optimized by
+ * a compiler.
+ */
+int
+CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes)
+{
+    return CopyReadBinaryData(cstate, dest, nbytes);
+}
+
 /*
  * Read raw fields in the next line for COPY FROM in text or csv mode.
  * Return false if no more lines.
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 54aa6cdecaf..b8d0e996117 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -500,6 +500,20 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * CopyToStateFlush
+ *
+ * Export CopySendEndOfRow() for extensions. We want to keep
+ * CopySendEndOfRow() as a static function for
+ * optimization. CopySendEndOfRow() calls in this file may be optimized by a
+ * compiler.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * These functions do apply some data conversion
  */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index e298b19860c..5665408eaa0 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -299,8 +299,13 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyFromStateData;
 
+extern int    CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes);
+
 
 typedef struct CopyToStateData *CopyToState;
 
@@ -402,6 +407,11 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;    /* lookup info for output functions */
     MemoryContext rowcontext;    /* per-row evaluation context */
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyToStateData;
 
+extern void CopyToStateFlush(CopyToState cstate);
+
 #endif                            /* COPYAPI_H */
-- 
2.45.2
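To make the commit message above more concrete, here is a minimal sketch of how a custom COPY TO format might use the new "opaque" field together with an exported flush wrapper. It is not PostgreSQL code: MockCopyToState, MyFormatState and all mock_*/my_format_* functions are hypothetical stand-ins, and the real CopyToStateData has many more fields. It only illustrates the two ideas in the patch: private per-format state behind a void pointer, and a thin non-static wrapper around a function that stays static.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CopyToStateData (the real struct is far larger). */
typedef struct MockCopyToState
{
    char        buf[256];       /* plays the role of fe_msgbuf */
    size_t      buf_len;        /* bytes currently buffered */
    size_t      flushed_bytes;  /* bytes "sent" so far */
    void       *opaque;         /* private space for a custom format */
} MockCopyToState;

/* Hypothetical private state that a custom format keeps behind opaque. */
typedef struct MyFormatState
{
    int         rows_written;
} MyFormatState;

/* Internal flush; static so that calls in this file can be inlined. */
static void
mock_send_end_of_row(MockCopyToState *cstate)
{
    cstate->flushed_bytes += cstate->buf_len;
    cstate->buf_len = 0;
}

/* Thin exported wrapper, mirroring the CopyToStateFlush() idea. */
void
mock_copy_to_state_flush(MockCopyToState *cstate)
{
    mock_send_end_of_row(cstate);
}

/* Start callback: allocate the private state and hang it on opaque. */
void
my_format_start(MockCopyToState *cstate)
{
    MyFormatState *state = calloc(1, sizeof(MyFormatState));

    assert(state != NULL);
    cstate->opaque = state;
}

/* Per-row callback: buffer the row, flush it, update the private counter. */
void
my_format_one_row(MockCopyToState *cstate, const char *row)
{
    MyFormatState *state = cstate->opaque;
    size_t      len = strlen(row);

    assert(len <= sizeof(cstate->buf));
    memcpy(cstate->buf, row, len);
    cstate->buf_len = len;
    mock_copy_to_state_flush(cstate);
    state->rows_written++;
}

/* End callback: release the private state. */
void
my_format_end(MockCopyToState *cstate)
{
    free(cstate->opaque);
    cstate->opaque = NULL;
}
```

A caller would run my_format_start(), then my_format_one_row() once per row, then my_format_end(). The core code never needs to know the layout of MyFormatState, which is the point of the void pointer.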
			
On 7/30/24 09:13, Sutou Kouhei wrote:
> Hi,
>
> In <26541788-8853-4d93-86cd-5f701b13ae51@enterprisedb.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 29 Jul 2024 14:17:08 +0200,
>   Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
>> I wrote a simple script to automate the benchmark - it just runs these
>> tests with different parameters (number of columns and number of
>> imported/exported rows). See the run.sh attachment, along with two CSV
>> results from current master and with all patches applied.
>
> Thanks. I also used the script with some modifications:
>
> 1. Create a test database automatically
> 2. Enable blackhole_am automatically
> 3. Create create_table_cols() automatically
>
> I attach it. I also attach results of master and patched. My
> results are from my desktop. So it's probably noisy.
>
>> - For COPY FROM there is no difference - the results are within 1% of
>> master, and there's no systemic difference.
>>
>> - For COPY TO it's a different story, though. There's a pretty clear
>> regression, by ~5%. It's a bit interesting the correlation with the
>> number of columns is not stronger ...
>
> My results showed different trend:
>
> - COPY FROM: Patched is about 15-20% slower than master
> - COPY TO: Patched is a bit faster than master
>
> Here are some my numbers:
>
> type n_cols n_rows     diff       master      patched
> ----------------------------------------------------------
> TO        5      1  100.56%   218.376000   219.609000
> FROM      5      1  113.33%   168.493000   190.954000
> ...
> TO        5      5  100.60%  1037.773000  1044.045000
> FROM      5      5  116.46%   767.966000   894.377000
> ...
> TO        5     10  100.15%  2092.245000  2095.472000
> FROM      5     10  115.91%  1508.160000  1748.130000
> TO       10      1   98.62%   353.087000   348.214000
> FROM     10      1  118.65%   260.551000   309.133000
> ...
> TO       10      5   96.89%  1724.061000  1670.427000
> FROM     10      5  119.92%  1224.098000  1467.941000
> ...
> TO       10     10   98.70%  3444.291000  3399.538000
> FROM     10     10  118.79%  2462.314000  2924.866000
> TO       15      1   97.71%   492.082000   480.802000
> FROM     15      1  115.59%   347.820000   402.033000
> ...
> TO       15      5   98.32%  2402.419000  2362.140000
> FROM     15      5  115.48%  1657.594000  1914.245000
> ...
> TO       15     10   96.91%  4830.319000  4681.145000
> FROM     15     10  115.09%  3304.798000  3803.542000
> TO       20      1   96.05%   629.828000   604.939000
> FROM     20      1  118.50%   438.673000   519.839000
> ...
> TO       20      5   97.15%  3084.210000  2996.331000
> FROM     20      5  115.35%  2110.909000  2435.032000
> ...
> TO       25      1   98.29%   764.779000   751.684000
> FROM     25      1  115.13%   519.686000   598.301000
> ...
> TO       25      5   94.08%  3843.996000  3616.614000
> FROM     25      5  115.62%  2554.008000  2952.928000
> ...
> TO       25     10   97.41%  7504.865000  7310.549000
> FROM     25     10  117.25%  4994.463000  5856.029000
> TO       30      1   94.39%   906.324000   855.503000
> FROM     30      1  119.60%   604.110000   722.491000
> ...
> TO       30      5   96.50%  4419.907000  4265.417000
> FROM     30      5  116.97%  2932.883000  3430.556000
> ...
> TO       30     10   94.39%  8974.878000  8470.991000
> FROM     30     10  117.84%  5800.793000  6835.900000
> ----
>
> See the attached diff.txt for full numbers.
> I also attach scripts to generate the diff.txt. Here is the
> command line I used:
>
> ----
> ruby diff.rb <(ruby aggregate.rb master.result) <(ruby aggregate.rb patched.result) | tee diff.txt
> ----
>
> My environment:
>
> * Debian GNU/Linux sid
> * gcc (Debian 13.3.0-2) 13.3.0
> * AMD Ryzen 9 3900X 12-Core Processor
>
> I'll look into this.
>
> If someone is interested in this proposal, could you share
> your numbers?
>
I'm on Fedora 40 with gcc 14.1, on Intel i7-9750H. But it's running on
Qubes OS, so it's really in a VM which makes it noisier. I'll try to do
more benchmarks on a regular hw, but that may take a couple days.
I decided to do the benchmark for individual parts of the patch series.
The attached PDF shows results for master (label 0000) and the 0001-0005
patches, along with relative performance difference between the patches.
The color scale is the same as before - red = bad, green = good.
There are pretty clear differences between the patches and "direction"
of the COPY. I'm sure it does depend on the hardware - I tried running
this on rpi5 (with 32-bits), and it looks very different. There might be
a similar behavior difference between Intel and Ryzen, but my point is
that when looking for regressions, looking at these "per patch" charts
can be very useful (as it reduces the scope of changes that might have
caused the regression).
>> It's interesting the main change in the flamegraphs is CopyToStateFlush
>> pops up on the left side. Because, what is that about? That is a thing
>> introduced in the 0005 patch, so maybe the regression is not strictly
>> about the existing formats moving to the new API, but due to something
>> else in a later version of the patch?
>
> Ah, making static CopySendEndOfRow() a non-static function
> (CopyToStateFlush()) may be the reason of this. Could you
> try the attached v19 patch? It changes the 0005 patch:
>
Perhaps, that's possible.
> * It reverts the static change
> * It adds a new non-static function that just exports
>   CopySendEndOfRow()
>
I'll try to benchmark this later, when the other machine is available.
regards
-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
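For readers checking the quoted table: the "diff" column appears to be the patched time divided by the master time, expressed as a percentage (for example, 219.609 / 218.376 is about 100.56%). A minimal sketch of that calculation follows; the function name is an assumption made here, and the actual numbers in the thread were produced by the Ruby scripts mentioned in the quote, not by this code.

```c
#include <assert.h>

/*
 * Elapsed time of the patched build as a percentage of master.
 * Values below 100 mean the patched build was faster.
 * (Hypothetical helper, not part of the benchmark scripts.)
 */
double
elapsed_percent(double master_time, double patched_time)
{
    assert(master_time > 0.0);
    return patched_time / master_time * 100.0;
}
```

With the first two rows of the table, elapsed_percent(218.376, 219.609) and elapsed_percent(168.493, 190.954) reproduce the reported 100.56% and 113.33% after rounding to two decimals.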
Attachments
Hi,
In <b1c8c9fa-06c5-4b60-a2b3-d2b4bedbbde9@enterprisedb.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 30 Jul 2024 11:51:37 +0200,
  Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> I decided to do the benchmark for individual parts of the patch series.
> The attached PDF shows results for master (label 0000) and the 0001-0005
> patches, along with relative performance difference between the patches.
> The color scale is the same as before - red = bad, green = good.
>
> There are pretty clear differences between the patches and "direction"
> of the COPY. I'm sure it does depend on the hardware - I tried running
> this on rpi5 (with 32-bits), and it looks very different. There might be
> a similar behavior difference between Intel and Ryzen, but my point is
> that when looking for regressions, looking at these "per patch" charts
> can be very useful (as it reduces the scope of changes that might have
> caused the regression).
Thanks. The numbers from your environment show that there are
performance problems in the following cases in the v18 patch
set:
1. 0001 + TO
2. 0005 + TO
There are +-~3% differences in FROM cases. They may be
noise. +~6% differences in TO cases may not be noise.
I also tried another benchmark with the v19 (not v18) patch
set with "Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz" not "AMD
Ryzen 9 3900X 12-Core Processor".
The attached PDF visualizes my numbers like your PDF but red
= bad, green = good. -30 (blue) means 70% (faster) and 30
(red) means 130% (slower).
0001 + TO is a bit slower like your numbers. Other TO cases
are a bit faster.
0002 + FROM is much slower. Other FROM cases are slower with
fewer records but a bit faster with many records.
I'll re-run it with "AMD Ryzen 9 3900X 12-Core Processor".
FYI: I've created a repository to push benchmark scripts:
https://gitlab.com/ktou/pg-bench
Thanks,
-- 
kou
Attachments
Hi,
I re-ran the benchmark(*) with the v19 patch set and the
following CPUs:
1. AMD Ryzen 9 3900X 12-Core Processor
2. Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
(*)
* Use tables that have {5,10,15,20,25,30} integer columns
* Use tables that have {1,2,3,4,5,6,7,8,9,10}M rows
* Use '/dev/null' for COPY TO
* Use blackhole_am for COPY FROM
See the attached graphs for details.
Notes:
* X-axis is the number of columns
* Y-axis is the number of M rows
* Z-axis is the elapsed time percent (smaller is faster,
  e.g. 99% is a bit faster than the HEAD and 101% is a bit
  slower than the HEAD)
* Z-ranges aren't the same (The Ryzen case uses about 79%-121%
  but the Intel case uses about 91%-111%)
* Red means the patch is slower than HEAD
* Blue means the patch is faster than HEAD
* The upper row shows FROM results
* The lower row shows TO results
Here are summaries based on the results:
For FROM:
* With Ryzen: It shows a negative performance impact
* With Intel: It shows a negative performance impact with
  1-5M rows and a positive performance impact with 6M-10M rows
For TO:
* With Ryzen: It shows a positive performance impact
* With Intel: It shows a positive performance impact
Here are insights based on the results:
* 0001 (that introduces Copy{From,To}Routine and adds some
  "if () {...}" for them but the existing formats still
  don't use them) has a slightly negative performance impact
* 0002 (that migrates the existing code to
  Copy{From,To}Routine based implementations) has a positive
  performance impact
  * For FROM: The negative impact of 0001 and the positive
    impact of 0002 almost balance out
    * We should use both 0001 and 0002 rather than only 0001
    * With Ryzen: It's a bit slower than HEAD. So we may not
      want to reject this proposal for FROM
    * With Intel:
      * With 1-5M rows: It's a bit slower than HEAD
      * With 6-10M rows: It's a bit faster than HEAD
  * For TO: The positive impact of 0002 is larger than the
    negative impact of 0001
    * We should use both 0001 and 0002 rather than only 0001
* 0003 (that makes Copy{From,To}Routine a Node) has a
  slightly negative performance impact
  * But I don't know why. This doesn't change any per-row
    code. The increased Copy{From,To}Routine size
    (NodeTag is added) may be related.
* 0004 (that moves Copy{From,To}StateData to copyapi.h)
  doesn't have any impact
  * It makes sense because this doesn't change any
    implementations.
* 0005 (that adds "void *opaque" to Copy{From,To}StateData)
  has a slightly negative impact for FROM and a slightly
  positive impact for TO
  * But I don't know why. This doesn't change any per-row
    code. The increased Copy{From,To}StateData size
    ("void *opaque" is added) may be related.
How to proceed with this proposal?
* Do we need more numbers to judge this proposal?
  * If so, could someone help us?
* There is no negative performance impact for TO with both
  Ryzen and Intel based on my results. Can we merge only
  the TO part?
  * Can we defer the FROM part? Or should we proceed with
    both the FROM and TO parts together?
* Could someone provide a hint as to why the FROM part is
  slower with Ryzen?
(If nobody responds to this, this proposal will get stuck
again. If you're interested in this proposal, could you help
us?)
How to run this benchmark on your machine:
$ cd your-postgres
$ git switch -c copy-format-extendable
$ git am v19-*.patch
$ git clone https://gitlab.com/ktou/pg-bench.git ../pg-bench
$ ../pg-bench/bench.sh copy-format-extendable ../pg-bench/copy-format-extendable/run.sh
(This will take about 5 hours...)
If you want to visualize your results on your machine:
$ sudo gem install ruby-gr
$ ../pg-bench/visualize.rb 5
If you share your results with me, I can visualize them and
share the graphs.
Thanks,
-- 
kou
			
Attachments
Hi,
On Sun, Aug 4, 2024 at 3:20 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> I re-ran the benchmark(*) with the v19 patch set and the
> following CPUs:
>
> 1. AMD Ryzen 9 3900X 12-Core Processor
> 2. Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
>
> (*)
> * Use tables that have {5,10,15,20,25,30} integer columns
> * Use tables that have {1,2,3,4,5,6,7,8,9,10}M rows
> * Use '/dev/null' for COPY TO
> * Use blackhole_am for COPY FROM
>
> See the attached graphs for details.
>
> Notes:
> * X-axis is the number of columns
> * Y-axis is the number of M rows
> * Z-axis is the elapsed time percent (smaller is faster,
>   e.g. 99% is a bit faster than the HEAD and 101% is a bit
>   slower than the HEAD)
> * Z-ranges aren't same (The Ryzen case uses about 79%-121%
>   but the Intel case uses about 91%-111%)
> * Red means the patch is slower than HEAD
> * Blue means the patch is faster than HEAD
> * The upper row shows FROM results
> * The lower row shows TO results
>
> Here are summaries based on the results:
>
> For FROM:
> * With Ryzen: It shows that negative performance impact
> * With Intel: It shows that negative performance impact with
>   1-5M rows and positive performance impact with 6M-10M rows
> For TO:
> * With Ryzen: It shows that positive performance impact
> * With Intel: It shows that positive performance impact
>
> Here are insights based on the results:
>
> * 0001 (that introduces Copy{From,To}Routine and adds some
>   "if () {...}" for them but the existing formats still
>   don't use them) has a slightly negative performance impact
> * 0002 (that migrates the existing code to
>   Copy{From,To}Routine based implementations) has a positive
>   performance impact
>   * For FROM: The negative impact of 0001 and the positive
>     impact of 0002 almost balance out
>     * We should use both 0001 and 0002 rather than only 0001
>     * With Ryzen: It's a bit slower than HEAD. So we may not
>       want to reject this proposal for FROM
>     * With Intel:
>       * With 1-5M rows: It's a bit slower than HEAD
>       * With 6-10M rows: It's a bit faster than HEAD
>   * For TO: The positive impact of 0002 is larger than the
>     negative impact of 0001
>     * We should use both 0001 and 0002 rather than only 0001
> * 0003 (that makes Copy{From,To}Routine Node) has a bit
>   negative performance impact
>   * But I don't know why. This doesn't change per row
>     related codes. Increasing Copy{From,To}Routine size
>     (NodeTag is added) may be related.
> * 0004 (that moves Copy{From,To}StateData to copyapi.h)
>   doesn't have impact
>   * It makes sense because this doesn't change any
>     implementations.
> * 0005 (that add "void *opaque" to Copy{From,To}StateData)
>   has a bit negative impact for FROM and a bit positive
>   impact for TO
>   * But I don't know why. This doesn't change per row
>     related codes. Increasing Copy{From,To}StateData size
>     ("void *opaque" is added) may be related.
I was surprised that the 0005 patch made COPY FROM slower (with fewer
rows) and COPY TO faster overall in spite of just adding one struct
field and some functions.
I'm interested in why the performance trends of COPY FROM are
different between fewer than 6M rows and more than 6M rows.
>
> How to proceed this proposal?
>
> * Do we need more numbers to judge this proposal?
>   * If so, could someone help us?
> * There is no negative performance impact for TO with both
>   of Ryzen and Intel based on my results. Can we merge only
>   the TO part?
>   * Can we defer the FROM part? Should we proceed this
>     proposal with both of the FROM and TO part?
> * Could someone provide a hint why the FROM part is more
>   slower with Ryzen?
>
Separating the patches into two parts (one for COPY TO and another
for COPY FROM) could be a good idea. It would help with reviews and
with investigating the performance regression in COPY FROM cases. And
I think we can commit them separately.
Also, could you please rebase the patches as they conflict with the
current HEAD? I'll run some benchmarks on my environment as well.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		Hi,
In <CAD21AoCwMmwLJ8PQLnZu0MbB4gDJiMvWrHREQD4xRp3-F2RU2Q@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 27 Sep 2024 16:33:13 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> * 0005 (that add "void *opaque" to Copy{From,To}StateData)
>>   has a bit negative impact for FROM and a bit positive
>>   impact for TO
>>   * But I don't know why. This doesn't change per row
>>     related codes. Increasing Copy{From,To}StateData size
>>     ("void *opaque" is added) may be related.
> 
> I was surprised that the 0005 patch made COPY FROM slower (with fewer
> rows) and COPY TO faster overall in spite of just adding one struct
> field and some functions.
Me too...
> I'm interested in why the performance trends of COPY FROM are
> different between fewer than 6M rows and more than 6M rows.
My hypothesis:
With this patch set:
  1. Per-row processing is faster than master.
  2. Processing that is not related to rows is slower than master.
If we have many rows, the impact of 1. is greater than the impact of 2.
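The hypothesis above can be written as a simple cost model: total time = fixed cost + per-row cost * number of rows. If the patched build has a higher fixed cost but a lower per-row cost, there is a row count where the two builds break even. The sketch below only illustrates that reasoning; the struct, the function names and any numbers passed to them are assumptions for illustration, not measurements from this thread.

```c
#include <assert.h>

/* Hypothetical cost model: time = fixed + per_row * rows. */
typedef struct CopyCost
{
    double      fixed;      /* cost not related to rows (setup, teardown) */
    double      per_row;    /* cost of processing one row */
} CopyCost;

double
total_cost(CopyCost cost, double rows)
{
    return cost.fixed + cost.per_row * rows;
}

/*
 * Number of rows at which the patched build catches up with master.
 * Only meaningful when patched has the cheaper per-row cost.
 */
double
break_even_rows(CopyCost master, CopyCost patched)
{
    assert(master.per_row > patched.per_row);
    return (patched.fixed - master.fixed) / (master.per_row - patched.per_row);
}
```

Below the break-even point the extra fixed cost dominates and the patched build looks slower; above it the cheaper per-row path wins. Whether this actually explains the Intel FROM results would need profiling.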
> Separating the patches into two parts (one is for COPY TO and another
> one is for COPY FROM) could be a good idea. It would help reviews and
> investigate performance regression in COPY FROM cases. And I think we
> can commit them separately.
> 
> Also, could you please rebase the patches as they conflict with the
> current HEAD?
OK. I've prepared 2 patch sets:
v20: It is just rebased on master. It still mixes COPY TO and
COPY FROM implementations.
v21: It's based on v20 but splits COPY TO implementations
and COPY FROM implementations.
0001-0005 include only COPY TO related changes.
0006-0010 include only COPY FROM related changes.
(v21 0001 + 0006) == (v20 0001),
(v21 0002 + 0007) == (v20 0002) and so on.
>               I'll run some benchmarks on my environment as well.
Thanks. It's very helpful.
Thanks,
-- 
kou
From 51779387b107e80ba598fe26f4ecec71c54f916b Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 4 Mar 2024 13:52:34 +0900
Subject: [PATCH v20 1/5] Add CopyToRoutine/CopyFromRoutine
They are for implementing custom COPY TO/FROM format. But this is not
enough to implement custom COPY TO/FROM format yet. We'll export some
APIs to receive/send data and add "format" option to COPY TO/FROM
later.
Existing text/csv/binary format implementations don't use
CopyToRoutine/CopyFromRoutine for now. We have a patch for it but we
defer it because there are some mysterious profile results even
though we get faster runtimes. See [1] for details.
[1] https://www.postgresql.org/message-id/ZdbtQJ-p5H1_EDwE%40paquier.xyz
Note that this doesn't change existing text/csv/binary format
implementations.
---
 src/backend/commands/copyfrom.c          |  24 +++++-
 src/backend/commands/copyfromparse.c     |   5 ++
 src/backend/commands/copyto.c            |  31 ++++++-
 src/include/commands/copyapi.h           | 101 +++++++++++++++++++++++
 src/include/commands/copyfrom_internal.h |   4 +
 src/tools/pgindent/typedefs.list         |   2 +
 6 files changed, 159 insertions(+), 8 deletions(-)
 create mode 100644 src/include/commands/copyapi.h
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2d3462913e1..c42485bd9cb 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1632,12 +1632,22 @@ BeginCopyFrom(ParseState *pstate,
 
         /* Fetch the input function and typioparam info */
         if (cstate->opts.binary)
+        {
             getTypeBinaryInputInfo(att->atttypid,
                                    &in_func_oid, &typioparams[attnum - 1]);
+            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+        }
+        else if (cstate->routine)
+            cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+                                            &in_functions[attnum - 1],
+                                            &typioparams[attnum - 1]);
+
         else
+        {
             getTypeInputInfo(att->atttypid,
                              &in_func_oid, &typioparams[attnum - 1]);
-        fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+        }
 
         /* Get default info if available */
         defexprs[attnum - 1] = NULL;
@@ -1777,10 +1787,13 @@ BeginCopyFrom(ParseState *pstate,
         /* Read and verify binary header */
         ReceiveCopyBinaryHeader(cstate);
     }
-
-    /* create workspace for CopyReadAttributes results */
-    if (!cstate->opts.binary)
+    else if (cstate->routine)
     {
+        cstate->routine->CopyFromStart(cstate, tupDesc);
+    }
+    else
+    {
+        /* create workspace for CopyReadAttributes results */
         AttrNumber    attr_count = list_length(cstate->attnumlist);
 
         cstate->max_fields = attr_count;
@@ -1798,6 +1811,9 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+    if (cstate->routine)
+        cstate->routine->CopyFromEnd(cstate);
+
     /* No COPY FROM related resources except memory. */
     if (cstate->is_program)
     {
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 97a4c387a30..2e126448019 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1012,6 +1012,11 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 
         Assert(fieldno == attr_count);
     }
+    else if (cstate->routine)
+    {
+        if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+            return false;
+    }
     else
     {
         /* binary */
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 91de442f434..3c5a97679aa 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -20,6 +20,7 @@
 
 #include "access/tableam.h"
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -64,6 +65,9 @@ typedef enum CopyDest
  */
 typedef struct CopyToStateData
 {
+    /* format routine */
+    const CopyToRoutine *routine;
+
     /* low-level state data */
     CopyDest    copy_dest;        /* type of copy source/destination */
     FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
@@ -772,14 +776,22 @@ DoCopyTo(CopyToState cstate)
         Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
         if (cstate->opts.binary)
+        {
             getTypeBinaryOutputInfo(attr->atttypid,
                                     &out_func_oid,
                                     &isvarlena);
+            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+        }
+        else if (cstate->routine)
+            cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
+                                           &cstate->out_functions[attnum - 1]);
         else
+        {
             getTypeOutputInfo(attr->atttypid,
                               &out_func_oid,
                               &isvarlena);
-        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+        }
     }
 
     /*
@@ -806,6 +818,8 @@ DoCopyTo(CopyToState cstate)
         tmp = 0;
         CopySendInt32(cstate, tmp);
     }
+    else if (cstate->routine)
+        cstate->routine->CopyToStart(cstate, tupDesc);
     else
     {
         /*
@@ -887,6 +901,8 @@ DoCopyTo(CopyToState cstate)
         /* Need to flush out the trailer */
         CopySendEndOfRow(cstate);
     }
+    else if (cstate->routine)
+        cstate->routine->CopyToEnd(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -908,15 +924,22 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
     MemoryContextReset(cstate->rowcontext);
     oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
+    /* Make sure the tuple is fully deconstructed */
+    slot_getallattrs(slot);
+
+    if (cstate->routine)
+    {
+        cstate->routine->CopyToOneRow(cstate, slot);
+        MemoryContextSwitchTo(oldcontext);
+        return;
+    }
+
     if (cstate->opts.binary)
     {
         /* Binary per-tuple header */
         CopySendInt16(cstate, list_length(cstate->attnumlist));
     }
 
-    /* Make sure the tuple is fully deconstructed */
-    slot_getallattrs(slot);
-
     if (!cstate->opts.binary)
     {
         bool        need_delim = false;
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 00000000000..d1289424c67
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,101 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *      API for COPY TO/FROM handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/* These are private in commands/copy[from|to].c */
+typedef struct CopyFromStateData *CopyFromState;
+typedef struct CopyToStateData *CopyToState;
+
+/*
+ * API structure for a COPY FROM format implementation.  Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyFromRoutine
+{
+    /*
+     * Called when COPY FROM is started to set up the input functions
+     * associated with the relation's attributes writing to.  `finfo` can be
+     * optionally filled to provide the catalog information of the input
+     * function.  `typioparam` can be optionally filled to define the OID of
+     * the type to pass to the input function.  `atttypid` is the OID of data
+     * type used by the relation's attribute.
+     */
+    void        (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
+                                   FmgrInfo *finfo, Oid *typioparam);
+
+    /*
+     * Called when COPY FROM is started.
+     *
+     * `tupDesc` is the tuple descriptor of the relation where the data needs
+     * to be copied.  This can be used for any initialization steps required
+     * by a format.
+     */
+    void        (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
+
+    /*
+     * Copy one row to a set of `values` and `nulls` of size tupDesc->natts.
+     *
+     * 'econtext' is used to evaluate default expression for each column that
+     * is either not read from the file or is using the DEFAULT option of COPY
+     * FROM.  It is NULL if no default values are used.
+     *
+     * Returns false if there are no more tuples to copy.
+     */
+    bool        (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
+                                   Datum *values, bool *nulls);
+
+    /* Called when COPY FROM has ended. */
+    void        (*CopyFromEnd) (CopyFromState cstate);
+} CopyFromRoutine;
+
+/*
+ * API structure for a COPY TO format implementation.   Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyToRoutine
+{
+    /*
+     * Called when COPY TO is started to set up the output functions
+     * associated with the relation's attributes reading from.  `finfo` can be
+     * optionally filled to provide the catalog information of the output
+     * function.  `atttypid` is the OID of data type used by the relation's
+     * attribute.
+     */
+    void        (*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
+                                  FmgrInfo *finfo);
+
+    /*
+     * Called when COPY TO is started.
+     *
+     * `tupDesc` is the tuple descriptor of the relation from where the data
+     * is read.
+     */
+    void        (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
+
+    /*
+     * Copy one row for COPY TO.
+     *
+     * `slot` is the tuple slot where the data is emitted.
+     */
+    void        (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
+
+    /* Called when COPY TO has ended */
+    void        (*CopyToEnd) (CopyToState cstate);
+} CopyToRoutine;
+
+#endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc783..509b9e92a18 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -15,6 +15,7 @@
 #define COPYFROM_INTERNAL_H
 
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
@@ -58,6 +59,9 @@ typedef enum CopyInsertMethod
  */
 typedef struct CopyFromStateData
 {
+    /* format routine */
+    const CopyFromRoutine *routine;
+
     /* low-level state data */
     CopySource    copy_src;        /* type of copy source */
     FILE       *copy_file;        /* used if copy_src == COPY_FILE */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5fabb127d7e..4c4bf60d9e5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -492,6 +492,7 @@ ConvertRowtypeExpr
 CookedConstraint
 CopyDest
 CopyFormatOptions
+CopyFromRoutine
 CopyFromState
 CopyFromStateData
 CopyHeaderChoice
@@ -503,6 +504,7 @@ CopyMultiInsertInfo
 CopyOnErrorChoice
 CopySource
 CopyStmt
+CopyToRoutine
 CopyToState
 CopyToStateData
 Cost
-- 
2.45.2
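As an illustration of the dispatch pattern in the patch above ("if (cstate->routine) ... else built-in path"), here is a self-contained sketch with simplified stand-in types. MockCopyTo, MockCopyToRoutine, the mock_* helpers and the "BEGIN/END" format are all hypothetical; the real CopyToRoutine also has CopyToOutFunc and receives a TupleDesc and a TupleTableSlot, which are omitted here to keep the example runnable outside the server.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MOCK_NATTS 2            /* every mock row has two integer columns */

typedef struct MockCopyTo MockCopyTo;

/* Simplified stand-in for CopyToRoutine: start, one row, end. */
typedef struct MockCopyToRoutine
{
    void        (*start) (MockCopyTo *cstate);
    void        (*one_row) (MockCopyTo *cstate, const int *values, int natts);
    void        (*end) (MockCopyTo *cstate);
} MockCopyToRoutine;

struct MockCopyTo
{
    const MockCopyToRoutine *routine;   /* NULL selects the built-in path */
    char        out[128];               /* collected output, NUL-terminated */
    size_t      out_len;
};

/* Append text to the output buffer, keeping it NUL-terminated. */
static void
mock_append(MockCopyTo *cstate, const char *text)
{
    size_t      len = strlen(text);

    assert(cstate->out_len + len < sizeof(cstate->out));
    memcpy(cstate->out + cstate->out_len, text, len + 1);
    cstate->out_len += len;
}

/* Write one row with the given separator, followed by a newline. */
static void
mock_send_row(MockCopyTo *cstate, const int *values, int natts, const char *sep)
{
    char        field[16];

    for (int i = 0; i < natts; i++)
    {
        if (i > 0)
            mock_append(cstate, sep);
        snprintf(field, sizeof(field), "%d", values[i]);
        mock_append(cstate, field);
    }
    mock_append(cstate, "\n");
}

/* Hypothetical custom format: BEGIN/END markers, '|' separated rows. */
static void
my_start(MockCopyTo *cstate)
{
    mock_append(cstate, "BEGIN\n");
}

static void
my_one_row(MockCopyTo *cstate, const int *values, int natts)
{
    mock_send_row(cstate, values, natts, "|");
}

static void
my_end(MockCopyTo *cstate)
{
    mock_append(cstate, "END\n");
}

/* Server-lifetime routine table, as the copyapi.h comment recommends. */
const MockCopyToRoutine my_format_routine = {
    .start = my_start,
    .one_row = my_one_row,
    .end = my_end,
};

/* Driver: use the routine if one is set, else the built-in text path. */
void
mock_do_copy_to(MockCopyTo *cstate, const int (*rows)[MOCK_NATTS], int nrows)
{
    if (cstate->routine)
        cstate->routine->start(cstate);

    for (int i = 0; i < nrows; i++)
    {
        if (cstate->routine)
            cstate->routine->one_row(cstate, rows[i], MOCK_NATTS);
        else
            mock_send_row(cstate, rows[i], MOCK_NATTS, "\t");
    }

    if (cstate->routine)
        cstate->routine->end(cstate);
}
```

The per-row "if (cstate->routine)" check in the driver corresponds to the branch that 0001 adds for every row, which is one candidate for the small slowdown reported for that patch in this thread.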
From daca92389cfea80c22d95ae64b9bc1c11751eb52 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jul 2024 16:44:44 +0900
Subject: [PATCH v20 2/5] Use CopyToRoutine/CopyFromRoutine for the existing
 formats
The existing formats are text, csv and binary. If we find any
performance regression by this, we will not merge this to master.
This will increase indirect function call costs but this will reduce
runtime "if (cstate->opts.binary)" and "if (cstate->opts.csv_mode)"
branch costs.
This uses an optimization based on a static inline function that is
called with a constant argument in place of cstate->opts.csv_mode. For
example, CopyFromTextLikeOneRow() uses this optimization. It accepts a
"bool is_csv" argument instead of reading cstate->opts.csv_mode
itself. CopyFromTextOneRow() calls CopyFromTextLikeOneRow() with
false (constant) for "bool is_csv", so the compiler can remove the
"if (is_csv)" branch from the inlined copy.
This doesn't change existing logic. It just moves existing code.
---
 src/backend/commands/copyfrom.c          | 215 ++++++---
 src/backend/commands/copyfromparse.c     | 550 +++++++++++++----------
 src/backend/commands/copyto.c            | 477 +++++++++++++-------
 src/include/commands/copy.h              |   2 -
 src/include/commands/copyfrom_internal.h |   8 +
 5 files changed, 806 insertions(+), 446 deletions(-)
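
The constant-argument optimization described in the commit message can be sketched in isolation as follows. This is only an illustration under stated assumptions: DemoState and the demo_* functions are hypothetical names invented for this note, not part of the patch or of PostgreSQL. The idea it shows is the same, though: one static inline workhorse takes "bool is_csv", and each thin wrapper passes a literal, so after inlining the compiler can drop the branch that does not apply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parsing state; a stand-in, not PostgreSQL's CopyFromState. */
typedef struct DemoState
{
    char        delim;          /* field delimiter */
    char        quote;          /* quote character, only honored in CSV mode */
} DemoState;

/*
 * Shared workhorse that counts the fields on one line.
 *
 * is_csv is a literal constant at every call site below, so once this
 * function is inlined the compiler can fold "if (is_csv ...)" away.
 */
static inline int
demo_count_fields_like(const DemoState *st, const char *line, bool is_csv)
{
    int         nfields = 1;
    bool        in_quote = false;

    assert(st != NULL && line != NULL);

    for (const char *p = line; *p != '\0'; p++)
    {
        if (is_csv && *p == st->quote)
            in_quote = !in_quote;   /* toggle quoted state in CSV mode only */
        else if (*p == st->delim && !in_quote)
            nfields++;              /* unquoted delimiter ends a field */
    }

    return nfields;
}

/* Text-format entry point: quotes are ordinary characters. */
int
demo_count_fields_text(const DemoState *st, const char *line)
{
    return demo_count_fields_like(st, line, false);
}

/* CSV-format entry point: delimiters inside quotes are ignored. */
int
demo_count_fields_csv(const DemoState *st, const char *line)
{
    return demo_count_fields_like(st, line, true);
}
```

The two wrappers play the role that CopyFromTextOneRow() and CopyFromCSVOneRow() play in the patch: they are the functions whose addresses would be stored in a routine struct, while the format check itself is resolved at compile time rather than per row. Whether a given compiler actually removes the branch depends on its inlining decisions, so this should be confirmed by benchmarking rather than assumed.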
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index c42485bd9cb..14f95f17124 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -106,6 +106,157 @@ typedef struct CopyMultiInsertInfo
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
 
+
+/*
+ * CopyFromRoutine implementations for text and CSV.
+ */
+
+/*
+ * CopyFromTextLikeInFunc
+ *
+ * Assign input function data for a relation's attribute in text/CSV format.
+ */
+static void
+CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid,
+                       FmgrInfo *finfo, Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyFromTextLikeStart
+ *
+ * Start of COPY FROM for text/CSV format.
+ */
+static void
+CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    AttrNumber    attr_count;
+
+    /*
+     * If encoding conversion is needed, we need another buffer to hold the
+     * converted input data.  Otherwise, we can just point input_buf to the
+     * same buffer as raw_buf.
+     */
+    if (cstate->need_transcoding)
+    {
+        cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
+        cstate->input_buf_index = cstate->input_buf_len = 0;
+    }
+    else
+        cstate->input_buf = cstate->raw_buf;
+    cstate->input_reached_eof = false;
+
+    initStringInfo(&cstate->line_buf);
+
+    /*
+     * Create workspace for CopyReadAttributes results; used by CSV and text
+     * format.
+     */
+    attr_count = list_length(cstate->attnumlist);
+    cstate->max_fields = attr_count;
+    cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+}
+
+/*
+ * CopyFromTextLikeEnd
+ *
+ * End of COPY FROM for text/CSV format.
+ */
+static void
+CopyFromTextLikeEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
+/*
+ * CopyFromRoutine implementation for "binary".
+ */
+
+/*
+ * CopyFromBinaryInFunc
+ *
+ * Assign input function data for a relation's attribute in binary format.
+ */
+static void
+CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+                     FmgrInfo *finfo, Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeBinaryInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyFromBinaryStart
+ *
+ * Start of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    /* Read and verify binary header */
+    ReceiveCopyBinaryHeader(cstate);
+}
+
+/*
+ * CopyFromBinaryEnd
+ *
+ * End of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
+/*
+ * Routines assigned to each format.
+ *
+ * CSV and text share the same implementation, with the exception of the
+ * per-row callback.
+ */
+static const CopyFromRoutine CopyFromRoutineText = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromTextOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+static const CopyFromRoutine CopyFromRoutineCSV = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromCSVOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+static const CopyFromRoutine CopyFromRoutineBinary = {
+    .CopyFromInFunc = CopyFromBinaryInFunc,
+    .CopyFromStart = CopyFromBinaryStart,
+    .CopyFromOneRow = CopyFromBinaryOneRow,
+    .CopyFromEnd = CopyFromBinaryEnd,
+};
+
+/*
+ * Define the COPY FROM routines to use for a format.
+ */
+static const CopyFromRoutine *
+CopyFromGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyFromRoutineCSV;
+    else if (opts.binary)
+        return &CopyFromRoutineBinary;
+
+    /* default is text */
+    return &CopyFromRoutineText;
+}
+
+
 /*
  * error context callback for COPY FROM
  *
@@ -1393,7 +1544,6 @@ BeginCopyFrom(ParseState *pstate,
                 num_defaults;
     FmgrInfo   *in_functions;
     Oid           *typioparams;
-    Oid            in_func_oid;
     int           *defmap;
     ExprState **defexprs;
     MemoryContext oldcontext;
@@ -1425,6 +1575,9 @@ BeginCopyFrom(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
+    /* Set format routine */
+    cstate->routine = CopyFromGetRoutine(cstate->opts);
+
     /* Process the target relation */
     cstate->rel = rel;
 
@@ -1580,25 +1733,6 @@ BeginCopyFrom(ParseState *pstate,
     cstate->raw_buf_index = cstate->raw_buf_len = 0;
     cstate->raw_reached_eof = false;
 
-    if (!cstate->opts.binary)
-    {
-        /*
-         * If encoding conversion is needed, we need another buffer to hold
-         * the converted input data.  Otherwise, we can just point input_buf
-         * to the same buffer as raw_buf.
-         */
-        if (cstate->need_transcoding)
-        {
-            cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
-            cstate->input_buf_index = cstate->input_buf_len = 0;
-        }
-        else
-            cstate->input_buf = cstate->raw_buf;
-        cstate->input_reached_eof = false;
-
-        initStringInfo(&cstate->line_buf);
-    }
-
     initStringInfo(&cstate->attribute_buf);
 
     /* Assign range table and rteperminfos, we'll need them in CopyFrom. */
@@ -1631,23 +1765,9 @@ BeginCopyFrom(ParseState *pstate,
             continue;
 
         /* Fetch the input function and typioparam info */
-        if (cstate->opts.binary)
-        {
-            getTypeBinaryInputInfo(att->atttypid,
-                                   &in_func_oid, &typioparams[attnum - 1]);
-            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
-        }
-        else if (cstate->routine)
-            cstate->routine->CopyFromInFunc(cstate, att->atttypid,
-                                            &in_functions[attnum - 1],
-                                            &typioparams[attnum - 1]);
-
-        else
-        {
-            getTypeInputInfo(att->atttypid,
-                             &in_func_oid, &typioparams[attnum - 1]);
-            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
-        }
+        cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+                                        &in_functions[attnum - 1],
+                                        &typioparams[attnum - 1]);
 
         /* Get default info if available */
         defexprs[attnum - 1] = NULL;
@@ -1782,23 +1902,7 @@ BeginCopyFrom(ParseState *pstate,
 
     pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-    if (cstate->opts.binary)
-    {
-        /* Read and verify binary header */
-        ReceiveCopyBinaryHeader(cstate);
-    }
-    else if (cstate->routine)
-    {
-        cstate->routine->CopyFromStart(cstate, tupDesc);
-    }
-    else
-    {
-        /* create workspace for CopyReadAttributes results */
-        AttrNumber    attr_count = list_length(cstate->attnumlist);
-
-        cstate->max_fields = attr_count;
-        cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-    }
+    cstate->routine->CopyFromStart(cstate, tupDesc);
 
     MemoryContextSwitchTo(oldcontext);
 
@@ -1811,8 +1915,7 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
-    if (cstate->routine)
-        cstate->routine->CopyFromEnd(cstate);
+    cstate->routine->CopyFromEnd(cstate);
 
     /* No COPY FROM related resources except memory. */
     if (cstate->is_program)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 2e126448019..5f63b683d17 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -149,8 +149,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLine(CopyFromState cstate, bool is_csv);
+static inline bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static int    CopyReadAttributesText(CopyFromState cstate);
 static int    CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -750,8 +750,8 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  *
  * NOTE: force_not_null option are not applied to the returned fields.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static inline bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
 {
     int            fldct;
     bool        done;
@@ -768,13 +768,17 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
         tupDesc = RelationGetDescr(cstate->rel);
 
         cstate->cur_lineno++;
-        done = CopyReadLine(cstate);
+        done = CopyReadLine(cstate, is_csv);
 
         if (cstate->opts.header_line == COPY_HEADER_MATCH)
         {
             int            fldnum;
 
-            if (cstate->opts.csv_mode)
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
                 fldct = CopyReadAttributesCSV(cstate);
             else
                 fldct = CopyReadAttributesText(cstate);
@@ -818,7 +822,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     cstate->cur_lineno++;
 
     /* Actually read the line into memory here */
-    done = CopyReadLine(cstate);
+    done = CopyReadLine(cstate, is_csv);
 
     /*
      * EOF at start of line means we're done.  If we see EOF after some
@@ -828,8 +832,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     if (done && cstate->line_buf.len == 0)
         return false;
 
-    /* Parse the line into de-escaped field values */
-    if (cstate->opts.csv_mode)
+    /*
+     * Parse the line into de-escaped field values
+     *
+     * is_csv will be optimized away by compiler, as argument is constant at
+     * caller.
+     */
+    if (is_csv)
         fldct = CopyReadAttributesCSV(cstate);
     else
         fldct = CopyReadAttributesText(cstate);
@@ -839,6 +848,267 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     return true;
 }
 
+/*
+ * CopyFromTextLikeOneRow
+ *
+ * Copy one row to a set of `values` and `nulls` for the text and CSV
+ * formats.
+ *
+ * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
+ */
+static inline bool
+CopyFromTextLikeOneRow(CopyFromState cstate,
+                       ExprContext *econtext,
+                       Datum *values,
+                       bool *nulls,
+                       bool is_csv)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    ExprState **defexprs = cstate->defexprs;
+    char      **field_strings;
+    ListCell   *cur;
+    int            fldct;
+    int            fieldno;
+    char       *string;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    /* read raw fields in the next line */
+    if (!NextCopyFromRawFields(cstate, &field_strings, &fldct, is_csv))
+        return false;
+
+    /* check for overflowing fields */
+    if (attr_count > 0 && fldct > attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("extra data after last expected column")));
+
+    fieldno = 0;
+
+    /* Loop to read the user attributes on the line. */
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        if (fieldno >= fldct)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("missing data for column \"%s\"",
+                            NameStr(att->attname))));
+        string = field_strings[fieldno++];
+
+        if (cstate->convert_select_flags &&
+            !cstate->convert_select_flags[m])
+        {
+            /* ignore input field, leaving column as NULL */
+            continue;
+        }
+
+        if (is_csv)
+        {
+            if (string == NULL &&
+                cstate->opts.force_notnull_flags[m])
+            {
+                /*
+                 * FORCE_NOT_NULL option is set and column is NULL - convert
+                 * it to the NULL string.
+                 */
+                string = cstate->opts.null_print;
+            }
+            else if (string != NULL && cstate->opts.force_null_flags[m]
+                     && strcmp(string, cstate->opts.null_print) == 0)
+            {
+                /*
+                 * FORCE_NULL option is set and column matches the NULL
+                 * string. It must have been quoted, or otherwise the string
+                 * would already have been set to NULL. Convert it to NULL as
+                 * specified.
+                 */
+                string = NULL;
+            }
+        }
+
+        cstate->cur_attname = NameStr(att->attname);
+        cstate->cur_attval = string;
+
+        if (string != NULL)
+            nulls[m] = false;
+
+        if (cstate->defaults[m])
+        {
+            /*
+             * The caller must supply econtext and have switched into the
+             * per-tuple memory context in it.
+             */
+            Assert(econtext != NULL);
+            Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
+
+            values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
+        }
+
+        /*
+         * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+         */
+        else if (!InputFunctionCallSafe(&in_functions[m],
+                                        string,
+                                        typioparams[m],
+                                        att->atttypmod,
+                                        (Node *) cstate->escontext,
+                                        &values[m]))
+        {
+            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+            cstate->num_errors++;
+
+            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+            {
+                /*
+                 * Since we emit line number and column info in the below
+                 * notice message, we suppress error context information other
+                 * than the relation name.
+                 */
+                Assert(!cstate->relname_only);
+                cstate->relname_only = true;
+
+                if (cstate->cur_attval)
+                {
+                    char       *attval;
+
+                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname,
+                                   attval));
+                    pfree(attval);
+                }
+                else
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname));
+
+                /* reset relname_only */
+                cstate->relname_only = false;
+            }
+
+            return true;
+        }
+
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+    }
+
+    Assert(fieldno == attr_count);
+
+    return true;
+}
+
+
+/*
+ * CopyFromTextOneRow
+ *
+ * Per-row callback for COPY FROM with text format.
+ */
+bool
+CopyFromTextOneRow(CopyFromState cstate,
+                   ExprContext *econtext,
+                   Datum *values,
+                   bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false);
+}
+
+/*
+ * CopyFromCSVOneRow
+ *
+ * Per-row callback for COPY FROM with CSV format.
+ */
+bool
+CopyFromCSVOneRow(CopyFromState cstate,
+                  ExprContext *econtext,
+                  Datum *values,
+                  bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
+}
+
+/*
+ * CopyFromBinaryOneRow
+ *
+ * Copy one row to a set of `values` and `nulls` for the binary format.
+ */
+bool
+CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+                     Datum *values, bool *nulls)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    int16        fld_count;
+    ListCell   *cur;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    cstate->cur_lineno++;
+
+    if (!CopyGetInt16(cstate, &fld_count))
+    {
+        /* EOF detected (end of file, or protocol-level EOF) */
+        return false;
+    }
+
+    if (fld_count == -1)
+    {
+        /*
+         * Received EOF marker.  Wait for the protocol-level EOF, and complain
+         * if it doesn't come immediately.  In COPY FROM STDIN, this ensures
+         * that we correctly handle CopyFail, if client chooses to send that
+         * now.  When copying from file, we could ignore the rest of the file
+         * like in text mode, but we choose to be consistent with the COPY
+         * FROM STDIN case.
+         */
+        char        dummy;
+
+        if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("received copy data after EOF marker")));
+        return false;
+    }
+
+    if (fld_count != attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("row field count is %d, expected %d",
+                        (int) fld_count, attr_count)));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        cstate->cur_attname = NameStr(att->attname);
+        values[m] = CopyReadBinaryAttribute(cstate,
+                                            &in_functions[m],
+                                            typioparams[m],
+                                            att->atttypmod,
+                                            &nulls[m]);
+        cstate->cur_attname = NULL;
+    }
+
+    return true;
+}
+
 /*
  * Read next tuple from file for COPY FROM. Return false if no more tuples.
  *
@@ -856,221 +1126,21 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 {
     TupleDesc    tupDesc;
     AttrNumber    num_phys_attrs,
-                attr_count,
                 num_defaults = cstate->num_defaults;
-    FmgrInfo   *in_functions = cstate->in_functions;
-    Oid           *typioparams = cstate->typioparams;
     int            i;
     int           *defmap = cstate->defmap;
     ExprState **defexprs = cstate->defexprs;
 
     tupDesc = RelationGetDescr(cstate->rel);
     num_phys_attrs = tupDesc->natts;
-    attr_count = list_length(cstate->attnumlist);
 
     /* Initialize all values for row to NULL */
     MemSet(values, 0, num_phys_attrs * sizeof(Datum));
     MemSet(nulls, true, num_phys_attrs * sizeof(bool));
     MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-    if (!cstate->opts.binary)
-    {
-        char      **field_strings;
-        ListCell   *cur;
-        int            fldct;
-        int            fieldno;
-        char       *string;
-
-        /* read raw fields in the next line */
-        if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
-            return false;
-
-        /* check for overflowing fields */
-        if (attr_count > 0 && fldct > attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("extra data after last expected column")));
-
-        fieldno = 0;
-
-        /* Loop to read the user attributes on the line. */
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            if (fieldno >= fldct)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("missing data for column \"%s\"",
-                                NameStr(att->attname))));
-            string = field_strings[fieldno++];
-
-            if (cstate->convert_select_flags &&
-                !cstate->convert_select_flags[m])
-            {
-                /* ignore input field, leaving column as NULL */
-                continue;
-            }
-
-            if (cstate->opts.csv_mode)
-            {
-                if (string == NULL &&
-                    cstate->opts.force_notnull_flags[m])
-                {
-                    /*
-                     * FORCE_NOT_NULL option is set and column is NULL -
-                     * convert it to the NULL string.
-                     */
-                    string = cstate->opts.null_print;
-                }
-                else if (string != NULL && cstate->opts.force_null_flags[m]
-                         && strcmp(string, cstate->opts.null_print) == 0)
-                {
-                    /*
-                     * FORCE_NULL option is set and column matches the NULL
-                     * string. It must have been quoted, or otherwise the
-                     * string would already have been set to NULL. Convert it
-                     * to NULL as specified.
-                     */
-                    string = NULL;
-                }
-            }
-
-            cstate->cur_attname = NameStr(att->attname);
-            cstate->cur_attval = string;
-
-            if (string != NULL)
-                nulls[m] = false;
-
-            if (cstate->defaults[m])
-            {
-                /*
-                 * The caller must supply econtext and have switched into the
-                 * per-tuple memory context in it.
-                 */
-                Assert(econtext != NULL);
-                Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
-
-                values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
-            }
-
-            /*
-             * If ON_ERROR is specified with IGNORE, skip rows with soft
-             * errors
-             */
-            else if (!InputFunctionCallSafe(&in_functions[m],
-                                            string,
-                                            typioparams[m],
-                                            att->atttypmod,
-                                            (Node *) cstate->escontext,
-                                            &values[m]))
-            {
-                Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-                cstate->num_errors++;
-
-                if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-                {
-                    /*
-                     * Since we emit line number and column info in the below
-                     * notice message, we suppress error context information
-                     * other than the relation name.
-                     */
-                    Assert(!cstate->relname_only);
-                    cstate->relname_only = true;
-
-                    if (cstate->cur_attval)
-                    {
-                        char       *attval;
-
-                        attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname,
-                                       attval));
-                        pfree(attval);
-                    }
-                    else
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname));
-
-                    /* reset relname_only */
-                    cstate->relname_only = false;
-                }
-
-                return true;
-            }
-
-            cstate->cur_attname = NULL;
-            cstate->cur_attval = NULL;
-        }
-
-        Assert(fieldno == attr_count);
-    }
-    else if (cstate->routine)
-    {
-        if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
-            return false;
-    }
-    else
-    {
-        /* binary */
-        int16        fld_count;
-        ListCell   *cur;
-
-        cstate->cur_lineno++;
-
-        if (!CopyGetInt16(cstate, &fld_count))
-        {
-            /* EOF detected (end of file, or protocol-level EOF) */
-            return false;
-        }
-
-        if (fld_count == -1)
-        {
-            /*
-             * Received EOF marker.  Wait for the protocol-level EOF, and
-             * complain if it doesn't come immediately.  In COPY FROM STDIN,
-             * this ensures that we correctly handle CopyFail, if client
-             * chooses to send that now.  When copying from file, we could
-             * ignore the rest of the file like in text mode, but we choose to
-             * be consistent with the COPY FROM STDIN case.
-             */
-            char        dummy;
-
-            if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("received copy data after EOF marker")));
-            return false;
-        }
-
-        if (fld_count != attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("row field count is %d, expected %d",
-                            (int) fld_count, attr_count)));
-
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            cstate->cur_attname = NameStr(att->attname);
-            values[m] = CopyReadBinaryAttribute(cstate,
-                                                &in_functions[m],
-                                                typioparams[m],
-                                                att->atttypmod,
-                                                &nulls[m]);
-            cstate->cur_attname = NULL;
-        }
-    }
+    if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+        return false;
 
     /*
      * Now compute and insert any defaults available for the columns not
@@ -1101,7 +1171,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * in the final value of line_buf.
  */
 static bool
-CopyReadLine(CopyFromState cstate)
+CopyReadLine(CopyFromState cstate, bool is_csv)
 {
     bool        result;
 
@@ -1109,7 +1179,7 @@ CopyReadLine(CopyFromState cstate)
     cstate->line_buf_valid = false;
 
     /* Parse data and transfer into line_buf */
-    result = CopyReadLineText(cstate);
+    result = CopyReadLineText(cstate, is_csv);
 
     if (result)
     {
@@ -1176,8 +1246,8 @@ CopyReadLine(CopyFromState cstate)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate)
+static inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
     char       *copy_input_buf;
     int            input_buf_ptr;
@@ -1193,7 +1263,11 @@ CopyReadLineText(CopyFromState cstate)
     char        quotec = '\0';
     char        escapec = '\0';
 
-    if (cstate->opts.csv_mode)
+    /*
+     * is_csv will be optimized away by compiler, as argument is constant at
+     * caller.
+     */
+    if (is_csv)
     {
         quotec = cstate->opts.quote[0];
         escapec = cstate->opts.escape[0];
@@ -1270,7 +1344,11 @@ CopyReadLineText(CopyFromState cstate)
         prev_raw_ptr = input_buf_ptr;
         c = copy_input_buf[input_buf_ptr++];
 
-        if (cstate->opts.csv_mode)
+        /*
+         * is_csv will be optimized away by compiler, as argument is constant
+         * at caller.
+         */
+        if (is_csv)
         {
             /*
              * If character is '\\' or '\r', we may need to look ahead below.
@@ -1309,7 +1387,7 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \r */
-        if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\r' && (!is_csv || !in_quote))
         {
             /* Check for \r\n on first line, _and_ handle \r\n. */
             if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1337,10 +1415,10 @@ CopyReadLineText(CopyFromState cstate)
                     if (cstate->eol_type == EOL_CRNL)
                         ereport(ERROR,
                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errmsg("literal carriage return found in data") :
                                  errmsg("unquoted carriage return found in data"),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errhint("Use \"\\r\" to represent carriage return.") :
                                  errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1354,10 +1432,10 @@ CopyReadLineText(CopyFromState cstate)
             else if (cstate->eol_type == EOL_NL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal carriage return found in data") :
                          errmsg("unquoted carriage return found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\r\" to represent carriage return.") :
                          errhint("Use quoted CSV field to represent carriage return.")));
             /* If reach here, we have found the line terminator */
@@ -1365,15 +1443,15 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \n */
-        if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\n' && (!is_csv || !in_quote))
         {
             if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal newline found in data") :
                          errmsg("unquoted newline found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\n\" to represent newline.") :
                          errhint("Use quoted CSV field to represent newline.")));
             cstate->eol_type = EOL_NL;    /* in case not set yet */
@@ -1385,7 +1463,7 @@ CopyReadLineText(CopyFromState cstate)
          * In CSV mode, we only recognize \. alone on a line.  This is because
          * \. is a valid CSV data value.
          */
-        if (c == '\\' && (!cstate->opts.csv_mode || first_char_in_line))
+        if (c == '\\' && (!is_csv || first_char_in_line))
         {
             char        c2;
 
@@ -1418,7 +1496,11 @@ CopyReadLineText(CopyFromState cstate)
 
                     if (c2 == '\n')
                     {
-                        if (!cstate->opts.csv_mode)
+                        /*
+                         * is_csv will be optimized away by compiler, as
+                         * argument is constant at caller.
+                         */
+                        if (!is_csv)
                             ereport(ERROR,
                                     (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                      errmsg("end-of-copy marker does not match previous newline style")));
@@ -1427,7 +1509,11 @@ CopyReadLineText(CopyFromState cstate)
                     }
                     else if (c2 != '\r')
                     {
-                        if (!cstate->opts.csv_mode)
+                        /*
+                         * is_csv will be optimized away by compiler, as
+                         * argument is constant at caller.
+                         */
+                        if (!is_csv)
                             ereport(ERROR,
                                     (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                      errmsg("end-of-copy marker corrupt")));
@@ -1443,7 +1529,11 @@ CopyReadLineText(CopyFromState cstate)
 
                 if (c2 != '\r' && c2 != '\n')
                 {
-                    if (!cstate->opts.csv_mode)
+                    /*
+                     * is_csv will be optimized away by compiler, as argument
+                     * is constant at caller.
+                     */
+                    if (!is_csv)
                         ereport(ERROR,
                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                  errmsg("end-of-copy marker corrupt")));
@@ -1472,7 +1562,7 @@ CopyReadLineText(CopyFromState cstate)
                 result = true;    /* report EOF */
                 break;
             }
-            else if (!cstate->opts.csv_mode)
+            else if (!is_csv)
             {
                 /*
                  * If we are here, it means we found a backslash followed by
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 3c5a97679aa..86dc1b742cc 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -128,6 +128,317 @@ static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
+/*
+ * CopyToRoutine implementations.
+ */
+
+/*
+ * CopyToTextLikeSendEndOfRow
+ *
+ * Apply line terminations for a line sent in text or CSV format depending
+ * on the destination, then send the end of a row.
+ */
+static inline void
+CopyToTextLikeSendEndOfRow(CopyToState cstate)
+{
+    switch (cstate->copy_dest)
+    {
+        case COPY_FILE:
+            /* Default line termination depends on platform */
+#ifndef WIN32
+            CopySendChar(cstate, '\n');
+#else
+            CopySendString(cstate, "\r\n");
+#endif
+            break;
+        case COPY_FRONTEND:
+            /* The FE/BE protocol uses \n as newline for all platforms */
+            CopySendChar(cstate, '\n');
+            break;
+        default:
+            break;
+    }
+
+    /* Now take the actions related to the end of a row */
+    CopySendEndOfRow(cstate);
+}
+
+/*
+ * CopyToTextLikeStart
+ *
+ * Start of COPY TO for text and CSV format.
+ */
+static void
+CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    /*
+     * For non-binary copy, we need to convert null_print to file encoding,
+     * because it will be sent directly with CopySendString.
+     */
+    if (cstate->need_transcoding)
+        cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+                                                          cstate->opts.null_print_len,
+                                                          cstate->file_encoding);
+
+    /* if a header has been requested send the line */
+    if (cstate->opts.header_line)
+    {
+        ListCell   *cur;
+        bool        hdr_delim = false;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            char       *colname;
+
+            if (hdr_delim)
+                CopySendChar(cstate, cstate->opts.delim[0]);
+            hdr_delim = true;
+
+            colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, colname, false);
+            else
+                CopyAttributeOutText(cstate, colname);
+        }
+
+        CopyToTextLikeSendEndOfRow(cstate);
+    }
+}
+
+/*
+ * CopyToTextLikeOutFunc
+ *
+ * Assign output function data for a relation's attribute in text/CSV format.
+ */
+static void
+CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+
+/*
+ * CopyToTextLikeOneRow
+ *
+ * Process one row for text/CSV format.
+ *
+ * Workhorse for CopyToTextOneRow() and CopyToCSVOneRow().
+ */
+static inline void
+CopyToTextLikeOneRow(CopyToState cstate,
+                     TupleTableSlot *slot,
+                     bool is_csv)
+{
+    bool        need_delim = false;
+    FmgrInfo   *out_functions = cstate->out_functions;
+
+    foreach_int(attnum, cstate->attnumlist)
+    {
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (need_delim)
+            CopySendChar(cstate, cstate->opts.delim[0]);
+        need_delim = true;
+
+        if (isnull)
+        {
+            CopySendString(cstate, cstate->opts.null_print_client);
+        }
+        else
+        {
+            char       *string;
+
+            string = OutputFunctionCall(&out_functions[attnum - 1],
+                                        value);
+
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
+                CopyAttributeOutCSV(cstate, string,
+                                    cstate->opts.force_quote_flags[attnum - 1]);
+            else
+                CopyAttributeOutText(cstate, string);
+        }
+    }
+
+    CopyToTextLikeSendEndOfRow(cstate);
+}
+
+/*
+ * CopyToTextOneRow
+ *
+ * Per-row callback for COPY TO with text format.
+ */
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, false);
+}
+
+/*
+ * CopyToCSVOneRow
+ *
+ * Per-row callback for COPY TO with CSV format.
+ */
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, true);
+}
+
+/*
+ * CopyToTextLikeEnd
+ *
+ * End of COPY TO for text/CSV format.
+ */
+static void
+CopyToTextLikeEnd(CopyToState cstate)
+{
+    /* Nothing to do here */
+}
+
+/*
+ * CopyToRoutine implementation for "binary".
+ */
+
+/*
+ * CopyToBinaryStart
+ *
+ * Start of COPY TO for binary format.
+ */
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    /* Generate header for a binary copy */
+    int32        tmp;
+
+    /* Signature */
+    CopySendData(cstate, BinarySignature, 11);
+    /* Flags field */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+    /* No header extension */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+}
+
+/*
+ * CopyToBinaryOutFunc
+ *
+ * Assign output function data for a relation's attribute in binary format.
+ */
+static void
+CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeBinaryOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyToBinaryOneRow
+ *
+ * Process one row for binary format.
+ */
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    FmgrInfo   *out_functions = cstate->out_functions;
+
+    /* Binary per-tuple header */
+    CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+    foreach_int(attnum, cstate->attnumlist)
+    {
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (isnull)
+        {
+            CopySendInt32(cstate, -1);
+        }
+        else
+        {
+            bytea       *outputbytes;
+
+            outputbytes = SendFunctionCall(&out_functions[attnum - 1],
+                                           value);
+            CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+            CopySendData(cstate, VARDATA(outputbytes),
+                         VARSIZE(outputbytes) - VARHDRSZ);
+        }
+    }
+
+    CopySendEndOfRow(cstate);
+}
+
+/*
+ * CopyToBinaryEnd
+ *
+ * End of COPY TO for binary format.
+ */
+static void
+CopyToBinaryEnd(CopyToState cstate)
+{
+    /* Generate trailer for a binary copy */
+    CopySendInt16(cstate, -1);
+    /* Need to flush out the trailer */
+    CopySendEndOfRow(cstate);
+}
+
+/*
+ * CSV and text share the same implementation, with the exception of the
+ * output representation and per-row callbacks.
+ */
+static const CopyToRoutine CopyToRoutineText = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToTextOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+static const CopyToRoutine CopyToRoutineCSV = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToCSVOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+static const CopyToRoutine CopyToRoutineBinary = {
+    .CopyToStart = CopyToBinaryStart,
+    .CopyToOutFunc = CopyToBinaryOutFunc,
+    .CopyToOneRow = CopyToBinaryOneRow,
+    .CopyToEnd = CopyToBinaryEnd,
+};
+
+/*
+ * Define the COPY TO routines to use for a format.  This should be called
+ * after options are parsed.
+ */
+static const CopyToRoutine *
+CopyToGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyToRoutineCSV;
+    else if (opts.binary)
+        return &CopyToRoutineBinary;
+
+    /* default is text */
+    return &CopyToRoutineText;
+}
 
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
@@ -195,16 +506,6 @@ CopySendEndOfRow(CopyToState cstate)
     switch (cstate->copy_dest)
     {
         case COPY_FILE:
-            if (!cstate->opts.binary)
-            {
-                /* Default line termination depends on platform */
-#ifndef WIN32
-                CopySendChar(cstate, '\n');
-#else
-                CopySendString(cstate, "\r\n");
-#endif
-            }
-
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -239,10 +540,6 @@ CopySendEndOfRow(CopyToState cstate)
             }
             break;
         case COPY_FRONTEND:
-            /* The FE/BE protocol uses \n as newline for all platforms */
-            if (!cstate->opts.binary)
-                CopySendChar(cstate, '\n');
-
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
@@ -430,6 +727,9 @@ BeginCopyTo(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
 
+    /* Set format routine */
+    cstate->routine = CopyToGetRoutine(cstate->opts);
+
     /* Process the source/target relation or query */
     if (rel)
     {
@@ -771,27 +1071,10 @@ DoCopyTo(CopyToState cstate)
     foreach(cur, cstate->attnumlist)
     {
         int            attnum = lfirst_int(cur);
-        Oid            out_func_oid;
-        bool        isvarlena;
         Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-        if (cstate->opts.binary)
-        {
-            getTypeBinaryOutputInfo(attr->atttypid,
-                                    &out_func_oid,
-                                    &isvarlena);
-            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
-        }
-        else if (cstate->routine)
-            cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
-                                           &cstate->out_functions[attnum - 1]);
-        else
-        {
-            getTypeOutputInfo(attr->atttypid,
-                              &out_func_oid,
-                              &isvarlena);
-            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
-        }
+        cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
+                                       &cstate->out_functions[attnum - 1]);
     }
 
     /*
@@ -804,58 +1087,7 @@ DoCopyTo(CopyToState cstate)
                                                "COPY TO",
                                                ALLOCSET_DEFAULT_SIZES);
 
-    if (cstate->opts.binary)
-    {
-        /* Generate header for a binary copy */
-        int32        tmp;
-
-        /* Signature */
-        CopySendData(cstate, BinarySignature, 11);
-        /* Flags field */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-        /* No header extension */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-    }
-    else if (cstate->routine)
-        cstate->routine->CopyToStart(cstate, tupDesc);
-    else
-    {
-        /*
-         * For non-binary copy, we need to convert null_print to file
-         * encoding, because it will be sent directly with CopySendString.
-         */
-        if (cstate->need_transcoding)
-            cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
-                                                              cstate->opts.null_print_len,
-                                                              cstate->file_encoding);
-
-        /* if a header has been requested send the line */
-        if (cstate->opts.header_line)
-        {
-            bool        hdr_delim = false;
-
-            foreach(cur, cstate->attnumlist)
-            {
-                int            attnum = lfirst_int(cur);
-                char       *colname;
-
-                if (hdr_delim)
-                    CopySendChar(cstate, cstate->opts.delim[0]);
-                hdr_delim = true;
-
-                colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, colname, false);
-                else
-                    CopyAttributeOutText(cstate, colname);
-            }
-
-            CopySendEndOfRow(cstate);
-        }
-    }
+    cstate->routine->CopyToStart(cstate, tupDesc);
 
     if (cstate->rel)
     {
@@ -894,15 +1126,7 @@ DoCopyTo(CopyToState cstate)
         processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
     }
 
-    if (cstate->opts.binary)
-    {
-        /* Generate trailer for a binary copy */
-        CopySendInt16(cstate, -1);
-        /* Need to flush out the trailer */
-        CopySendEndOfRow(cstate);
-    }
-    else if (cstate->routine)
-        cstate->routine->CopyToEnd(cstate);
+    cstate->routine->CopyToEnd(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -918,7 +1142,6 @@ DoCopyTo(CopyToState cstate)
 static void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-    FmgrInfo   *out_functions = cstate->out_functions;
     MemoryContext oldcontext;
 
     MemoryContextReset(cstate->rowcontext);
@@ -927,69 +1150,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
     /* Make sure the tuple is fully deconstructed */
     slot_getallattrs(slot);
 
-    if (cstate->routine)
-    {
-        cstate->routine->CopyToOneRow(cstate, slot);
-        MemoryContextSwitchTo(oldcontext);
-        return;
-    }
-
-    if (cstate->opts.binary)
-    {
-        /* Binary per-tuple header */
-        CopySendInt16(cstate, list_length(cstate->attnumlist));
-    }
-
-    if (!cstate->opts.binary)
-    {
-        bool        need_delim = false;
-
-        foreach_int(attnum, cstate->attnumlist)
-        {
-            Datum        value = slot->tts_values[attnum - 1];
-            bool        isnull = slot->tts_isnull[attnum - 1];
-            char       *string;
-
-            if (need_delim)
-                CopySendChar(cstate, cstate->opts.delim[0]);
-            need_delim = true;
-
-            if (isnull)
-                CopySendString(cstate, cstate->opts.null_print_client);
-            else
-            {
-                string = OutputFunctionCall(&out_functions[attnum - 1],
-                                            value);
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, string,
-                                        cstate->opts.force_quote_flags[attnum - 1]);
-                else
-                    CopyAttributeOutText(cstate, string);
-            }
-        }
-    }
-    else
-    {
-        foreach_int(attnum, cstate->attnumlist)
-        {
-            Datum        value = slot->tts_values[attnum - 1];
-            bool        isnull = slot->tts_isnull[attnum - 1];
-            bytea       *outputbytes;
-
-            if (isnull)
-                CopySendInt32(cstate, -1);
-            else
-            {
-                outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-                                               value);
-                CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-                CopySendData(cstate, VARDATA(outputbytes),
-                             VARSIZE(outputbytes) - VARHDRSZ);
-            }
-        }
-    }
-
-    CopySendEndOfRow(cstate);
+    cstate->routine->CopyToOneRow(cstate, slot);
 
     MemoryContextSwitchTo(oldcontext);
 }
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc10..ccfbdf0ee01 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -104,8 +104,6 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
 extern void EndCopyFrom(CopyFromState cstate);
 extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
                          Datum *values, bool *nulls);
-extern bool NextCopyFromRawFields(CopyFromState cstate,
-                                  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 extern char *CopyLimitPrintoutLength(const char *str);
 
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 509b9e92a18..c11b5ff3cc0 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -187,4 +187,12 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
+/* Callbacks for CopyFromRoutine->CopyFromOneRow */
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+                               Datum *values, bool *nulls);
+extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
+                              Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+                                 Datum *values, bool *nulls);
+
 #endif                            /* COPYFROM_INTERNAL_H */
-- 
2.45.2
From 5ba7136a527aca0861a74d0a7e1287ceef74f222 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jul 2024 17:39:41 +0900
Subject: [PATCH v20 3/5] Add support for adding custom COPY TO/FROM format
This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a COPY TO handler returns a CopyToRoutine and a COPY FROM handler
returns a CopyFromRoutine.
This uses the same handler for COPY TO and COPY FROM. PostgreSQL calls a
COPY TO/FROM handler with "is_from" argument. It's true for COPY FROM
and false for COPY TO:
    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine
This also adds a test module for a custom COPY TO/FROM handler.
---
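The handler contract described in the commit message can be sketched in
standalone C. This is only an illustration, not code from the patch: every
Demo* and demo_* name below is hypothetical, and the tag enum merely stands
in for PostgreSQL's NodeTag / IsA() machinery. It shows one handler returning
a different tagged struct depending on is_from, and the caller rejecting a
NULL or wrongly tagged result, as ProcessCopyOptionFormat() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's NodeTag values. */
typedef enum DemoNodeTag
{
    T_DemoCopyToRoutine = 1,
    T_DemoCopyFromRoutine
} DemoNodeTag;

/* Hypothetical stand-in for Node: only the tag is visible to the caller. */
typedef struct DemoNode
{
    DemoNodeTag type;
} DemoNode;

/* The real routines carry callbacks; a name is enough for this sketch. */
typedef struct DemoCopyToRoutine
{
    DemoNode    node;
    const char *name;
} DemoCopyToRoutine;

typedef struct DemoCopyFromRoutine
{
    DemoNode    node;
    const char *name;
} DemoCopyFromRoutine;

static const DemoCopyToRoutine demo_to = {{T_DemoCopyToRoutine}, "demo-to"};
static const DemoCopyFromRoutine demo_from = {{T_DemoCopyFromRoutine}, "demo-from"};

/*
 * One handler serves both directions: is_from == true selects the
 * COPY FROM routine, is_from == false selects the COPY TO routine.
 */
static const DemoNode *
demo_copy_handler(bool is_from)
{
    if (is_from)
        return &demo_from.node;
    return &demo_to.node;
}

/*
 * Validate the handler result against the requested direction: a NULL
 * pointer or a struct with the wrong tag is rejected.
 */
static bool
demo_routine_matches(const DemoNode *routine, bool is_from)
{
    DemoNodeTag expected = is_from ? T_DemoCopyFromRoutine : T_DemoCopyToRoutine;

    return routine != NULL && routine->type == expected;
}
```

In the patch itself the tag check is done with IsA() and a mismatch is
reported through ereport(ERROR, ...) rather than returned as a boolean.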
 src/backend/commands/copy.c                   |  96 ++++++++++++++---
 src/backend/commands/copyfrom.c               |   4 +-
 src/backend/commands/copyto.c                 |   4 +-
 src/backend/nodes/Makefile                    |   1 +
 src/backend/nodes/gen_node_support.pl         |   2 +
 src/backend/utils/adt/pseudotypes.c           |   1 +
 src/include/catalog/pg_proc.dat               |   6 ++
 src/include/catalog/pg_type.dat               |   6 ++
 src/include/commands/copy.h                   |   2 +
 src/include/commands/copyapi.h                |   4 +
 src/include/nodes/meson.build                 |   1 +
 src/test/modules/Makefile                     |   1 +
 src/test/modules/meson.build                  |   1 +
 src/test/modules/test_copy_format/.gitignore  |   4 +
 src/test/modules/test_copy_format/Makefile    |  23 ++++
 .../expected/test_copy_format.out             |  21 ++++
 src/test/modules/test_copy_format/meson.build |  33 ++++++
 .../test_copy_format/sql/test_copy_format.sql |   6 ++
 .../test_copy_format--1.0.sql                 |   8 ++
 .../test_copy_format/test_copy_format.c       | 100 ++++++++++++++++++
 .../test_copy_format/test_copy_format.control |   4 +
 21 files changed, 313 insertions(+), 15 deletions(-)
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3bb579a3a44..d7d409379d1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,6 +32,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -443,6 +444,87 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
     return COPY_LOG_VERBOSITY_DEFAULT;    /* keep compiler quiet */
 }
 
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY
+ * format handler is found, this function reports an error.
+ */
+static void
+ProcessCopyOptionFormat(ParseState *pstate,
+                        CopyFormatOptions *opts_out,
+                        bool is_from,
+                        DefElem *defel)
+{
+    char       *format;
+    Oid            funcargtypes[1];
+    Oid            handlerOid = InvalidOid;
+    Datum        datum;
+    Node       *routine;
+
+    format = defGetString(defel);
+
+    /* built-in formats */
+    if (strcmp(format, "text") == 0)
+         /* default format */ return;
+    else if (strcmp(format, "csv") == 0)
+    {
+        opts_out->csv_mode = true;
+        return;
+    }
+    else if (strcmp(format, "binary") == 0)
+    {
+        opts_out->binary = true;
+        return;
+    }
+
+    /* custom format */
+    funcargtypes[0] = INTERNALOID;
+    handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                funcargtypes, true);
+    if (!OidIsValid(handlerOid))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY format \"%s\" not recognized", format),
+                 parser_errposition(pstate, defel->location)));
+
+    datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
+    routine = (Node *) DatumGetPointer(datum);
+    if (is_from)
+    {
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%s(%u) did not return a "
+                            "CopyFromRoutine struct",
+                            format, handlerOid),
+                     parser_errposition(
+                                        pstate, defel->location)));
+    }
+    else
+    {
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%s(%u) did not return a "
+                            "CopyToRoutine struct",
+                            format, handlerOid),
+                     parser_errposition(
+                                        pstate, defel->location)));
+    }
+
+    opts_out->routine = routine;
+}
+
 /*
  * Process the statement option list for COPY.
  *
@@ -485,22 +567,10 @@ ProcessCopyOptions(ParseState *pstate,
 
         if (strcmp(defel->defname, "format") == 0)
         {
-            char       *fmt = defGetString(defel);
-
             if (format_specified)
                 errorConflictingDefElem(defel, pstate);
             format_specified = true;
-            if (strcmp(fmt, "text") == 0)
-                 /* default format */ ;
-            else if (strcmp(fmt, "csv") == 0)
-                opts_out->csv_mode = true;
-            else if (strcmp(fmt, "binary") == 0)
-                opts_out->binary = true;
-            else
-                ereport(ERROR,
-                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                         errmsg("COPY format \"%s\" not recognized", fmt),
-                         parser_errposition(pstate, defel->location)));
+            ProcessCopyOptionFormat(pstate, opts_out, is_from, defel);
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 14f95f17124..7ecd9a1ad2c 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -247,7 +247,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(CopyFormatOptions opts)
 {
-    if (opts.csv_mode)
+    if (opts.routine)
+        return (const CopyFromRoutine *) opts.routine;
+    else if (opts.csv_mode)
         return &CopyFromRoutineCSV;
     else if (opts.binary)
         return &CopyFromRoutineBinary;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 86dc1b742cc..8ddbddb119d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -431,7 +431,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(CopyFormatOptions opts)
 {
-    if (opts.csv_mode)
+    if (opts.routine)
+        return (const CopyToRoutine *) opts.routine;
+    else if (opts.csv_mode)
         return &CopyToRoutineCSV;
     else if (opts.binary)
         return &CopyToRoutineBinary;
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 66bbad8e6e0..173ee11811c 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -49,6 +49,7 @@ node_headers = \
     access/sdir.h \
     access/tableam.h \
     access/tsmapi.h \
+    commands/copyapi.h \
     commands/event_trigger.h \
     commands/trigger.h \
     executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 81df3bdf95f..428ab4f0d93 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -61,6 +61,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -85,6 +86,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index e189e9b79d2..25f24ab95d2 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 322114d72a7..f108780e8b6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7741,6 +7741,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index ceff66ccde1..37ebfa0908f 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -633,6 +633,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a copy to/from method function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   descr => 'pseudo-type for the result of a table AM handler function',
   typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index ccfbdf0ee01..79bd4fb9151 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -84,6 +84,8 @@ typedef struct CopyFormatOptions
     CopyOnErrorChoice on_error; /* what to do when error happened */
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     List       *convert_select; /* list of column names (can be NIL) */
+    Node       *routine;        /* CopyToRoutine or CopyFromRoutine (can be
+                                 * NULL) */
 } CopyFormatOptions;
 
 /* These are private in commands/copy[from|to].c */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index d1289424c67..e049a45a4b1 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -27,6 +27,8 @@ typedef struct CopyToStateData *CopyToState;
  */
 typedef struct CopyFromRoutine
 {
+    NodeTag        type;
+
     /*
      * Called when COPY FROM is started to set up the input functions
      * associated with the relation's attributes writing to.  `finfo` can be
@@ -69,6 +71,8 @@ typedef struct CopyFromRoutine
  */
 typedef struct CopyToRoutine
 {
+    NodeTag        type;
+
     /*
      * Called when COPY TO is started to set up the output functions
      * associated with the relation's attributes reading from.  `finfo` can be
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index b665e55b657..103df1a7873 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -11,6 +11,7 @@ node_support_input_i = [
   'access/sdir.h',
   'access/tableam.h',
   'access/tsmapi.h',
+  'commands/copyapi.h',
   'commands/event_trigger.h',
   'commands/trigger.h',
   'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 256799f520a..b7b46928a19 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
           spgist_name_ops \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index d8fe059d236..c42b4b2b31f 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -14,6 +14,7 @@ subdir('spgist_name_ops')
 subdir('ssl_passphrase_callback')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 00000000000..8497f91624d
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 00000000000..4ed7c0b12db
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,21 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (format 'test_copy_format');
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
+COPY public.test TO stdout WITH (format 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: atttypid=21
+NOTICE:  CopyToOutFunc: atttypid=23
+NOTICE:  CopyToOutFunc: atttypid=20
+NOTICE:  CopyToStart: natts=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 00000000000..4cefe7b709a
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+  'test_copy_format.control',
+  'test_copy_format--1.0.sql',
+)
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'test_copy_format',
+    ],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 00000000000..e805f7cb011
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,6 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (format 'test_copy_format');
+\.
+COPY public.test TO stdout WITH (format 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 00000000000..d24ea03ce99
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,8 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+CREATE FUNCTION test_copy_format(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..f6b105659ab
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,100 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+
+PG_MODULE_MAGIC;
+
+static void
+CopyFromInFunc(CopyFromState cstate, Oid atttypid,
+               FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts)));
+}
+
+static bool
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+CopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .type = T_CopyFromRoutine,
+    .CopyFromInFunc = CopyFromInFunc,
+    .CopyFromStart = CopyFromStart,
+    .CopyFromOneRow = CopyFromOneRow,
+    .CopyFromEnd = CopyFromEnd,
+};
+
+static void
+CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts)));
+}
+
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", slot->tts_nvalid)));
+}
+
+static void
+CopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .type = T_CopyToRoutine,
+    .CopyToOutFunc = CopyToOutFunc,
+    .CopyToStart = CopyToStart,
+    .CopyToOneRow = CopyToOneRow,
+    .CopyToEnd = CopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    ereport(NOTICE,
+            (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 00000000000..f05a6362358
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.45.2
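The patch above puts a NodeTag at the start of both CopyFromRoutine and CopyToRoutine, and the test handler returns one or the other depending on is_from. Presumably the tag lets the caller check which routine it received before casting. Below is a minimal standalone sketch of that idea. It does not use PostgreSQL headers, and every Sketch*/sketch_* name is hypothetical, made up for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for NodeTag values; not the real PostgreSQL ones. */
typedef enum SketchTag
{
    SKETCH_T_CopyFromRoutine = 1,
    SKETCH_T_CopyToRoutine,
} SketchTag;

/* Both routine structs start with the tag, like "NodeTag type" in the patch. */
typedef struct SketchCopyFromRoutine
{
    SketchTag   type;
    bool        (*one_row) (void);
} SketchCopyFromRoutine;

typedef struct SketchCopyToRoutine
{
    SketchTag   type;
    void        (*one_row) (void);
} SketchCopyToRoutine;

static bool
sketch_from_one_row(void)
{
    return false;               /* no more rows, as in the test module */
}

static void
sketch_to_one_row(void)
{
}

static const SketchCopyFromRoutine sketch_from_routine = {
    .type = SKETCH_T_CopyFromRoutine,
    .one_row = sketch_from_one_row,
};

static const SketchCopyToRoutine sketch_to_routine = {
    .type = SKETCH_T_CopyToRoutine,
    .one_row = sketch_to_one_row,
};

/* One handler returns either routine, chosen by is_from. */
static const void *
sketch_handler(bool is_from)
{
    if (is_from)
        return &sketch_from_routine;
    return &sketch_to_routine;
}

/* The caller reads the leading tag before casting to a concrete type. */
static bool
sketch_is_valid_routine(const void *routine, bool is_from)
{
    SketchTag   tag;

    if (routine == NULL)
        return false;
    tag = *(const SketchTag *) routine;
    return tag == (is_from ? SKETCH_T_CopyFromRoutine : SKETCH_T_CopyToRoutine);
}
```

With this layout, a handler that returns the wrong routine for the requested direction can be rejected with an error instead of being cast blindly.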
From 95213545ae4754c3a435ba2d737fd0b2677e9241 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jan 2024 14:54:10 +0900
Subject: [PATCH v20 4/5] Export CopyToStateData and CopyFromStateData
This is for custom COPY TO/FROM format handlers implemented as extensions.
This only moves code. Nothing changes except the CopyDest/CopySource enum
values. CopyDest and CopySource enum values such as COPY_FILE conflict
with each other once both enums are declared in the same header. So the
COPY_DEST_ prefix is used instead of the COPY_ prefix for CopyDest enum
values, and the COPY_SOURCE_ prefix is used instead of the COPY_ prefix
for CopySource enum values. For example, COPY_FILE in CopyDest is renamed
to COPY_DEST_FILE and COPY_FILE in CopySource is renamed to
COPY_SOURCE_FILE.
Note that this isn't enough to implement custom COPY TO/FROM format
handlers as extensions. We'll do the following in a subsequent commit:
For custom COPY TO format handler:
1. Add an opaque space for custom COPY TO format handler
2. Export CopySendEndOfRow() to flush buffer
For custom COPY FROM format handler:
1. Add an opaque space for custom COPY FROM format handler
2. Export CopyReadBinaryData() to read the next data
---
 src/backend/commands/copyfrom.c          |   4 +-
 src/backend/commands/copyfromparse.c     |  10 +-
 src/backend/commands/copyto.c            |  77 +-----
 src/include/commands/copy.h              |  78 +-----
 src/include/commands/copyapi.h           | 306 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h | 165 ------------
 6 files changed, 320 insertions(+), 320 deletions(-)
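A note on the rename described in the commit message: C enumerators share one namespace per scope, so two enums that both declare COPY_FILE cannot be declared together in one header. The sketch below shows the renamed form compiling side by side. The enum bodies follow the patch; the two *_name() helpers are hypothetical and exist only to exercise the values.

```c
#include <stddef.h>

/*
 * With the old names, both enums would declare COPY_FILE, COPY_FRONTEND
 * and COPY_CALLBACK, and including both in one header would fail with a
 * redeclaration error.  Distinct prefixes avoid that.
 */
typedef enum CopyDest
{
    COPY_DEST_FILE,             /* to file (or a piped program) */
    COPY_DEST_FRONTEND,         /* to frontend */
    COPY_DEST_CALLBACK,         /* to callback function */
} CopyDest;

typedef enum CopySource
{
    COPY_SOURCE_FILE,           /* from file (or a piped program) */
    COPY_SOURCE_FRONTEND,       /* from frontend */
    COPY_SOURCE_CALLBACK,       /* from callback function */
} CopySource;

/* Hypothetical helper: map a destination to a short label. */
static const char *
copy_dest_name(CopyDest dest)
{
    switch (dest)
    {
        case COPY_DEST_FILE:
            return "file";
        case COPY_DEST_FRONTEND:
            return "frontend";
        case COPY_DEST_CALLBACK:
            return "callback";
    }
    return "unknown";
}

/* Hypothetical helper: map a source to a short label. */
static const char *
copy_source_name(CopySource src)
{
    switch (src)
    {
        case COPY_SOURCE_FILE:
            return "file";
        case COPY_SOURCE_FRONTEND:
            return "frontend";
        case COPY_SOURCE_CALLBACK:
            return "callback";
    }
    return "unknown";
}
```

The numeric values are unchanged by the rename, so only the spelling at each use site needs to be updated, which matches what the hunks below do.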
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 7ecd9a1ad2c..dd4342f8b9c 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1713,7 +1713,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;    /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;    /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1841,7 +1841,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 5f63b683d17..ec86a17b3b3 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -180,7 +180,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows it can send. */
     pq_flush();
@@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
     switch (cstate->copy_src)
     {
-        case COPY_FILE:
+        case COPY_SOURCE_FILE:
             bytesread = fread(databuf, 1, maxread, cstate->copy_file);
             if (ferror(cstate->copy_file))
                 ereport(ERROR,
@@ -257,7 +257,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
             if (bytesread == 0)
                 cstate->raw_reached_eof = true;
             break;
-        case COPY_FRONTEND:
+        case COPY_SOURCE_FRONTEND:
             while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
             {
                 int            avail;
@@ -340,7 +340,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
                 bytesread += avail;
             }
             break;
-        case COPY_CALLBACK:
+        case COPY_SOURCE_CALLBACK:
             bytesread = cstate->data_source_cb(databuf, minread, maxread);
             break;
     }
@@ -1188,7 +1188,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
          * after \. up to the protocol end of copy data.  (XXX maybe better
          * not to treat \. as special?)
          */
-        if (cstate->copy_src == COPY_FRONTEND)
+        if (cstate->copy_src == COPY_SOURCE_FRONTEND)
         {
             int            inbytes;
 
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 8ddbddb119d..37b150b44ba 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -37,67 +37,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                    /* to file (or a piped program) */
-    COPY_FRONTEND,                /* to frontend */
-    COPY_CALLBACK,                /* to callback function */
-} CopyDest;
-
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-    /* format routine */
-    const CopyToRoutine *routine;
-
-    /* low-level state data */
-    CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
-
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
-    MemoryContext rowcontext;    /* per-row evaluation context */
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -143,7 +82,7 @@ CopyToTextLikeSendEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -151,7 +90,7 @@ CopyToTextLikeSendEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -460,7 +399,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -507,7 +446,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -541,11 +480,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -925,12 +864,12 @@ BeginCopyTo(ParseState *pstate,
     /* See Multibyte encoding comment above */
     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 79bd4fb9151..e2411848e9f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,87 +14,11 @@
 #ifndef COPY_H
 #define COPY_H
 
-#include "nodes/execnodes.h"
+#include "commands/copyapi.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
 #include "tcop/dest.h"
 
-/*
- * Represents whether a header line should be present, and whether it must
- * match the actual names (which implies "true").
- */
-typedef enum CopyHeaderChoice
-{
-    COPY_HEADER_FALSE = 0,
-    COPY_HEADER_TRUE,
-    COPY_HEADER_MATCH,
-} CopyHeaderChoice;
-
-/*
- * Represents where to save input processing errors.  More values to be added
- * in the future.
- */
-typedef enum CopyOnErrorChoice
-{
-    COPY_ON_ERROR_STOP = 0,        /* immediately throw errors, default */
-    COPY_ON_ERROR_IGNORE,        /* ignore errors */
-} CopyOnErrorChoice;
-
-/*
- * Represents verbosity of logged messages by COPY command.
- */
-typedef enum CopyLogVerbosityChoice
-{
-    COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
-    COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
-} CopyLogVerbosityChoice;
-
-/*
- * A struct to hold COPY options, in a parsed form. All of these are related
- * to formatting, except for 'freeze', which doesn't really belong here, but
- * it's expedient to parse it along with all the other options.
- */
-typedef struct CopyFormatOptions
-{
-    /* parameters from the COPY command */
-    int            file_encoding;    /* file or remote side's character encoding,
-                                 * -1 if not specified */
-    bool        binary;            /* binary format? */
-    bool        freeze;            /* freeze rows on loading? */
-    bool        csv_mode;        /* Comma Separated Value format? */
-    CopyHeaderChoice header_line;    /* header line? */
-    char       *null_print;        /* NULL marker string (server encoding!) */
-    int            null_print_len; /* length of same */
-    char       *null_print_client;    /* same converted to file encoding */
-    char       *default_print;    /* DEFAULT marker string */
-    int            default_print_len;    /* length of same */
-    char       *delim;            /* column delimiter (must be 1 byte) */
-    char       *quote;            /* CSV quote char (must be 1 byte) */
-    char       *escape;            /* CSV escape char (must be 1 byte) */
-    List       *force_quote;    /* list of column names */
-    bool        force_quote_all;    /* FORCE_QUOTE *? */
-    bool       *force_quote_flags;    /* per-column CSV FQ flags */
-    List       *force_notnull;    /* list of column names */
-    bool        force_notnull_all;    /* FORCE_NOT_NULL *? */
-    bool       *force_notnull_flags;    /* per-column CSV FNN flags */
-    List       *force_null;        /* list of column names */
-    bool        force_null_all; /* FORCE_NULL *? */
-    bool       *force_null_flags;    /* per-column CSV FN flags */
-    bool        convert_selectively;    /* do selective binary conversion? */
-    CopyOnErrorChoice on_error; /* what to do when error happened */
-    CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
-    List       *convert_select; /* list of column names (can be NIL) */
-    Node       *routine;        /* CopyToRoutine or CopyFromRoutine (can be
-                                 * NULL) */
-} CopyFormatOptions;
-
-/* These are private in commands/copy[from|to].c */
-typedef struct CopyFromStateData *CopyFromState;
-typedef struct CopyToStateData *CopyToState;
-
-typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
-typedef void (*copy_data_dest_cb) (void *data, int len);
-
 extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
                    int stmt_location, int stmt_len,
                    uint64 *processed);
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index e049a45a4b1..8a560903ede 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -14,12 +14,81 @@
 #ifndef COPYAPI_H
 #define COPYAPI_H
 
+#include "commands/trigger.h"
+#include "executor/execdesc.h"
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
 
-/* These are private in commands/copy[from|to].c */
+/*
+ * Represents whether a header line should be present, and whether it must
+ * match the actual names (which implies "true").
+ */
+typedef enum CopyHeaderChoice
+{
+    COPY_HEADER_FALSE = 0,
+    COPY_HEADER_TRUE,
+    COPY_HEADER_MATCH,
+} CopyHeaderChoice;
+
+/*
+ * Represents where to save input processing errors.  More values to be added
+ * in the future.
+ */
+typedef enum CopyOnErrorChoice
+{
+    COPY_ON_ERROR_STOP = 0,        /* immediately throw errors, default */
+    COPY_ON_ERROR_IGNORE,        /* ignore errors */
+} CopyOnErrorChoice;
+
+/*
+ * Represents verbosity of logged messages by COPY command.
+ */
+typedef enum CopyLogVerbosityChoice
+{
+    COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
+    COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
+} CopyLogVerbosityChoice;
+
+/*
+ * A struct to hold COPY options, in a parsed form. All of these are related
+ * to formatting, except for 'freeze', which doesn't really belong here, but
+ * it's expedient to parse it along with all the other options.
+ */
+typedef struct CopyFormatOptions
+{
+    /* parameters from the COPY command */
+    int            file_encoding;    /* file or remote side's character encoding,
+                                 * -1 if not specified */
+    bool        binary;            /* binary format? */
+    bool        freeze;            /* freeze rows on loading? */
+    bool        csv_mode;        /* Comma Separated Value format? */
+    CopyHeaderChoice header_line;    /* header line? */
+    char       *null_print;        /* NULL marker string (server encoding!) */
+    int            null_print_len; /* length of same */
+    char       *null_print_client;    /* same converted to file encoding */
+    char       *default_print;    /* DEFAULT marker string */
+    int            default_print_len;    /* length of same */
+    char       *delim;            /* column delimiter (must be 1 byte) */
+    char       *quote;            /* CSV quote char (must be 1 byte) */
+    char       *escape;            /* CSV escape char (must be 1 byte) */
+    List       *force_quote;    /* list of column names */
+    bool        force_quote_all;    /* FORCE_QUOTE *? */
+    bool       *force_quote_flags;    /* per-column CSV FQ flags */
+    List       *force_notnull;    /* list of column names */
+    bool        force_notnull_all;    /* FORCE_NOT_NULL *? */
+    bool       *force_notnull_flags;    /* per-column CSV FNN flags */
+    List       *force_null;        /* list of column names */
+    bool        force_null_all; /* FORCE_NULL *? */
+    bool       *force_null_flags;    /* per-column CSV FN flags */
+    bool        convert_selectively;    /* do selective binary conversion? */
+    CopyOnErrorChoice on_error; /* what to do when error happened */
+    CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
+    List       *convert_select; /* list of column names (can be NIL) */
+    Node       *routine;        /* CopyToRoutine or CopyFromRoutine (can be
+                                 * NULL) */
+} CopyFormatOptions;
+
 typedef struct CopyFromStateData *CopyFromState;
-typedef struct CopyToStateData *CopyToState;
 
 /*
  * API structure for a COPY FROM format implementation.  Note this must be
@@ -65,6 +134,176 @@ typedef struct CopyFromRoutine
     void        (*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
+/*
+ * Represents the different source cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopySource
+{
+    COPY_SOURCE_FILE,            /* from file (or a piped program) */
+    COPY_SOURCE_FRONTEND,        /* from frontend */
+    COPY_SOURCE_CALLBACK,        /* from callback function */
+} CopySource;
+
+/*
+ * Represents the end-of-line terminator type of the input
+ */
+typedef enum EolType
+{
+    EOL_UNKNOWN,
+    EOL_NL,
+    EOL_CR,
+    EOL_CRNL,
+} EolType;
+
+/*
+ * Represents the insert method to be used during COPY FROM.
+ */
+typedef enum CopyInsertMethod
+{
+    CIM_SINGLE,                    /* use table_tuple_insert or ExecForeignInsert */
+    CIM_MULTI,                    /* always use table_multi_insert or
+                                 * ExecForeignBatchInsert */
+    CIM_MULTI_CONDITIONAL,        /* use table_multi_insert or
+                                 * ExecForeignBatchInsert only if valid */
+} CopyInsertMethod;
+
+typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+
+/*
+ * This struct contains all the state variables used throughout a COPY FROM
+ * operation.
+ */
+typedef struct CopyFromStateData
+{
+    /* format routine */
+    const CopyFromRoutine *routine;
+
+    /* low-level state data */
+    CopySource    copy_src;        /* type of copy source */
+    FILE       *copy_file;        /* used if copy_src == COPY_SOURCE_FILE */
+    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_SOURCE_FRONTEND */
+
+    EolType        eol_type;        /* EOL type of input */
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    Oid            conversion_proc;    /* encoding conversion function */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDIN */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_source_cb data_source_cb; /* function for reading data */
+
+    CopyFormatOptions opts;
+    bool       *convert_select_flags;    /* per-column CSV/TEXT CS flags */
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /* these are just for error messages, see CopyFromErrorCallback */
+    const char *cur_relname;    /* table name for error messages */
+    uint64        cur_lineno;        /* line number for error messages */
+    const char *cur_attname;    /* current att for error messages */
+    const char *cur_attval;        /* current att value for error messages */
+    bool        relname_only;    /* don't output line number, att, etc. */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    AttrNumber    num_defaults;    /* count of att that are missing and have
+                                 * default value */
+    FmgrInfo   *in_functions;    /* array of input functions for each attrs */
+    Oid           *typioparams;    /* array of element types for in_functions */
+    ErrorSaveContext *escontext;    /* soft error trapper during in_functions
+                                     * execution */
+    uint64        num_errors;        /* total number of rows which contained soft
+                                 * errors */
+    int           *defmap;            /* array of default att numbers related to
+                                 * missing att */
+    ExprState **defexprs;        /* array of default att expressions for all
+                                 * att */
+    bool       *defaults;        /* if DEFAULT marker was found for
+                                 * corresponding att */
+    bool        volatile_defexprs;    /* is any of defexprs volatile? */
+    List       *range_table;    /* single element list of RangeTblEntry */
+    List       *rteperminfos;    /* single element list of RTEPermissionInfo */
+    ExprState  *qualexpr;
+
+    TransitionCaptureState *transition_capture;
+
+    /*
+     * These variables are used to reduce overhead in COPY FROM.
+     *
+     * attribute_buf holds the separated, de-escaped text for each field of
+     * the current line.  The CopyReadAttributes functions return arrays of
+     * pointers into this buffer.  We avoid palloc/pfree overhead by re-using
+     * the buffer on each cycle.
+     *
+     * In binary COPY FROM, attribute_buf holds the binary data for the
+     * current field, but the usage is otherwise similar.
+     */
+    StringInfoData attribute_buf;
+
+    /* field raw data pointers found by COPY FROM */
+
+    int            max_fields;
+    char      **raw_fields;
+
+    /*
+     * Similarly, line_buf holds the whole input line being processed. The
+     * input cycle is first to read the whole line into line_buf, and then
+     * extract the individual attribute fields into attribute_buf.  line_buf
+     * is preserved unmodified so that we can display it in error messages if
+     * appropriate.  (In binary mode, line_buf is not used.)
+     */
+    StringInfoData line_buf;
+    bool        line_buf_valid; /* contains the row being processed? */
+
+    /*
+     * input_buf holds input data, already converted to database encoding.
+     *
+     * In text mode, CopyReadLine parses this data sufficiently to locate line
+     * boundaries, then transfers the data to line_buf. We guarantee that
+     * there is a \0 at input_buf[input_buf_len] at all times.  (In binary
+     * mode, input_buf is not used.)
+     *
+     * If encoding conversion is not required, input_buf is not a separate
+     * buffer but points directly to raw_buf.  In that case, input_buf_len
+     * tracks the number of bytes that have been verified as valid in the
+     * database encoding, and raw_buf_len is the total number of bytes stored
+     * in the buffer.
+     */
+#define INPUT_BUF_SIZE 65536    /* we palloc INPUT_BUF_SIZE+1 bytes */
+    char       *input_buf;
+    int            input_buf_index;    /* next byte to process */
+    int            input_buf_len;    /* total # of bytes stored */
+    bool        input_reached_eof;    /* true if we reached EOF */
+    bool        input_reached_error;    /* true if a conversion error happened */
+    /* Shorthand for number of unconsumed bytes available in input_buf */
+#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index)
+
+    /*
+     * raw_buf holds raw input data read from the data source (file or client
+     * connection), not yet converted to the database encoding.  Like with
+     * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len].
+     */
+#define RAW_BUF_SIZE 65536        /* we palloc RAW_BUF_SIZE+1 bytes */
+    char       *raw_buf;
+    int            raw_buf_index;    /* next byte to process */
+    int            raw_buf_len;    /* total # of bytes stored */
+    bool        raw_reached_eof;    /* true if we reached EOF */
+
+    /* Shorthand for number of unconsumed bytes available in raw_buf */
+#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
+
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyFromStateData;
+
+
+typedef struct CopyToStateData *CopyToState;
+
 /*
  * API structure for a COPY TO format implementation.   Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
@@ -102,4 +341,67 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+    COPY_DEST_FILE,                /* to file (or a piped program) */
+    COPY_DEST_FRONTEND,            /* to frontend */
+    COPY_DEST_CALLBACK,            /* to callback function */
+} CopyDest;
+
+typedef void (*copy_data_dest_cb) (void *data, int len);
+
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+    /* format routine */
+    const CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;        /* type of copy source/destination */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
+    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
+
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+    MemoryContext rowcontext;    /* per-row evaluation context */
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c11b5ff3cc0..3863d26d5b7 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -19,171 +19,6 @@
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
-/*
- * Represents the different source cases we need to worry about at
- * the bottom level
- */
-typedef enum CopySource
-{
-    COPY_FILE,                    /* from file (or a piped program) */
-    COPY_FRONTEND,                /* from frontend */
-    COPY_CALLBACK,                /* from callback function */
-} CopySource;
-
-/*
- *    Represents the end-of-line terminator type of the input
- */
-typedef enum EolType
-{
-    EOL_UNKNOWN,
-    EOL_NL,
-    EOL_CR,
-    EOL_CRNL,
-} EolType;
-
-/*
- * Represents the insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-    CIM_SINGLE,                    /* use table_tuple_insert or ExecForeignInsert */
-    CIM_MULTI,                    /* always use table_multi_insert or
-                                 * ExecForeignBatchInsert */
-    CIM_MULTI_CONDITIONAL,        /* use table_multi_insert or
-                                 * ExecForeignBatchInsert only if valid */
-} CopyInsertMethod;
-
-/*
- * This struct contains all the state variables used throughout a COPY FROM
- * operation.
- */
-typedef struct CopyFromStateData
-{
-    /* format routine */
-    const CopyFromRoutine *routine;
-
-    /* low-level state data */
-    CopySource    copy_src;        /* type of copy source */
-    FILE       *copy_file;        /* used if copy_src == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_FRONTEND */
-
-    EolType        eol_type;        /* EOL type of input */
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    Oid            conversion_proc;    /* encoding conversion function */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDIN */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_source_cb data_source_cb; /* function for reading data */
-
-    CopyFormatOptions opts;
-    bool       *convert_select_flags;    /* per-column CSV/TEXT CS flags */
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /* these are just for error messages, see CopyFromErrorCallback */
-    const char *cur_relname;    /* table name for error messages */
-    uint64        cur_lineno;        /* line number for error messages */
-    const char *cur_attname;    /* current att for error messages */
-    const char *cur_attval;        /* current att value for error messages */
-    bool        relname_only;    /* don't output line number, att, etc. */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    AttrNumber    num_defaults;    /* count of att that are missing and have
-                                 * default value */
-    FmgrInfo   *in_functions;    /* array of input functions for each attrs */
-    Oid           *typioparams;    /* array of element types for in_functions */
-    ErrorSaveContext *escontext;    /* soft error trapper during in_functions
-                                     * execution */
-    uint64        num_errors;        /* total number of rows which contained soft
-                                 * errors */
-    int           *defmap;            /* array of default att numbers related to
-                                 * missing att */
-    ExprState **defexprs;        /* array of default att expressions for all
-                                 * att */
-    bool       *defaults;        /* if DEFAULT marker was found for
-                                 * corresponding att */
-    bool        volatile_defexprs;    /* is any of defexprs volatile? */
-    List       *range_table;    /* single element list of RangeTblEntry */
-    List       *rteperminfos;    /* single element list of RTEPermissionInfo */
-    ExprState  *qualexpr;
-
-    TransitionCaptureState *transition_capture;
-
-    /*
-     * These variables are used to reduce overhead in COPY FROM.
-     *
-     * attribute_buf holds the separated, de-escaped text for each field of
-     * the current line.  The CopyReadAttributes functions return arrays of
-     * pointers into this buffer.  We avoid palloc/pfree overhead by re-using
-     * the buffer on each cycle.
-     *
-     * In binary COPY FROM, attribute_buf holds the binary data for the
-     * current field, but the usage is otherwise similar.
-     */
-    StringInfoData attribute_buf;
-
-    /* field raw data pointers found by COPY FROM */
-
-    int            max_fields;
-    char      **raw_fields;
-
-    /*
-     * Similarly, line_buf holds the whole input line being processed. The
-     * input cycle is first to read the whole line into line_buf, and then
-     * extract the individual attribute fields into attribute_buf.  line_buf
-     * is preserved unmodified so that we can display it in error messages if
-     * appropriate.  (In binary mode, line_buf is not used.)
-     */
-    StringInfoData line_buf;
-    bool        line_buf_valid; /* contains the row being processed? */
-
-    /*
-     * input_buf holds input data, already converted to database encoding.
-     *
-     * In text mode, CopyReadLine parses this data sufficiently to locate line
-     * boundaries, then transfers the data to line_buf. We guarantee that
-     * there is a \0 at input_buf[input_buf_len] at all times.  (In binary
-     * mode, input_buf is not used.)
-     *
-     * If encoding conversion is not required, input_buf is not a separate
-     * buffer but points directly to raw_buf.  In that case, input_buf_len
-     * tracks the number of bytes that have been verified as valid in the
-     * database encoding, and raw_buf_len is the total number of bytes stored
-     * in the buffer.
-     */
-#define INPUT_BUF_SIZE 65536    /* we palloc INPUT_BUF_SIZE+1 bytes */
-    char       *input_buf;
-    int            input_buf_index;    /* next byte to process */
-    int            input_buf_len;    /* total # of bytes stored */
-    bool        input_reached_eof;    /* true if we reached EOF */
-    bool        input_reached_error;    /* true if a conversion error happened */
-    /* Shorthand for number of unconsumed bytes available in input_buf */
-#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index)
-
-    /*
-     * raw_buf holds raw input data read from the data source (file or client
-     * connection), not yet converted to the database encoding.  Like with
-     * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len].
-     */
-#define RAW_BUF_SIZE 65536        /* we palloc RAW_BUF_SIZE+1 bytes */
-    char       *raw_buf;
-    int            raw_buf_index;    /* next byte to process */
-    int            raw_buf_len;    /* total # of bytes stored */
-    bool        raw_reached_eof;    /* true if we reached EOF */
-
-    /* Shorthand for number of unconsumed bytes available in raw_buf */
-#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
-
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyFromStateData;
-
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
-- 
2.45.2
From 3f19b3ea56f3a234846b4078edcf85eb229df8ee Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jan 2024 15:12:43 +0900
Subject: [PATCH v20 5/5] Add support for implementing custom COPY TO/FROM
 format as extension
For custom COPY TO format implementation:
* Add CopyToStateData::opaque that can be used to keep data for custom
  COPY TO format implementation
* Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf
  as CopyToStateFlush()
For custom COPY FROM format implementation:
* Add CopyFromStateData::opaque that can be used to keep data for
  custom COPY FROM format implementation
* Export CopyReadBinaryData() to read the next data as
  CopyFromStateRead()
---
 src/backend/commands/copyfromparse.c | 14 ++++++++++++++
 src/backend/commands/copyto.c        | 14 ++++++++++++++
 src/include/commands/copyapi.h       | 10 ++++++++++
 3 files changed, 38 insertions(+)
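A minimal sketch of how a custom COPY TO format might combine the new opaque field with the exported flush function. It does not include any PostgreSQL header: MockCopyToState, mock_copy_to_state_flush(), MyFormatState and the my_format_*() callbacks are hypothetical stand-ins invented for illustration, and only the idea (keep per-COPY private data in opaque, flush the row buffer at the end of each row) comes from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CopyToStateData; only the parts used here. */
typedef struct MockCopyToState
{
    char        buf[256];       /* plays the role of fe_msgbuf */
    size_t      buf_len;
    char        out[1024];      /* plays the role of the COPY destination */
    size_t      out_len;
    void       *opaque;         /* private space, as added by the patch */
} MockCopyToState;

/*
 * Stand-in for CopyToStateFlush(): move the buffered bytes to the
 * destination and reset the buffer.
 */
static void
mock_copy_to_state_flush(MockCopyToState *cstate)
{
    assert(cstate->out_len + cstate->buf_len <= sizeof(cstate->out));
    memcpy(cstate->out + cstate->out_len, cstate->buf, cstate->buf_len);
    cstate->out_len += cstate->buf_len;
    cstate->buf_len = 0;
}

/* Private per-COPY data of the hypothetical format. */
typedef struct MyFormatState
{
    int         rows_written;
} MyFormatState;

/* Start callback: allocate the private state and hang it on opaque. */
static void
my_format_start(MockCopyToState *cstate)
{
    /* Inside the server this would presumably be a palloc0() instead. */
    MyFormatState *st = calloc(1, sizeof(*st));

    assert(st != NULL);
    cstate->opaque = st;
}

/* Per-row callback: format into the buffer, then flush it. */
static void
my_format_one_row(MockCopyToState *cstate, const char *row)
{
    MyFormatState *st = cstate->opaque;
    int         n;

    n = snprintf(cstate->buf, sizeof(cstate->buf), "%d:%s\n",
                 st->rows_written, row);
    assert(n >= 0 && (size_t) n < sizeof(cstate->buf));
    cstate->buf_len = (size_t) n;
    st->rows_written++;
    mock_copy_to_state_flush(cstate);
}

/* End callback: release the private state and report the row count. */
static int
my_format_end(MockCopyToState *cstate)
{
    MyFormatState *st = cstate->opaque;
    int         rows = st->rows_written;

    free(st);
    cstate->opaque = NULL;
    return rows;
}
```

A caller would zero-initialize a MockCopyToState, call my_format_start() once, my_format_one_row() per row, and my_format_end() at the end. Keeping the counter behind opaque means the core struct does not need to know anything about the format's bookkeeping.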
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index ec86a17b3b3..64772877b0f 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -739,6 +739,20 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
     return copied_bytes;
 }
 
+/*
+ * CopyFromStateRead
+ *
+ * Export CopyReadBinaryData() for extensions. We want to keep
+ * CopyReadBinaryData() as a static function for
+ * optimization. CopyReadBinaryData() calls in this file may be optimized by
+ * a compiler.
+ */
+int
+CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes)
+{
+    return CopyReadBinaryData(cstate, dest, nbytes);
+}
+
 /*
  * Read raw fields in the next line for COPY FROM in text or csv mode.
  * Return false if no more lines.
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 37b150b44ba..c99edae575b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -496,6 +496,20 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * CopyToStateFlush
+ *
+ * Export CopySendEndOfRow() for extensions. We want to keep
+ * CopySendEndOfRow() as a static function for
+ * optimization. CopySendEndOfRow() calls in this file may be optimized by a
+ * compiler.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * These functions do apply some data conversion
  */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 8a560903ede..c1e9fe366f3 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -299,8 +299,13 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyFromStateData;
 
+extern int    CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes);
+
 
 typedef struct CopyToStateData *CopyToState;
 
@@ -402,6 +407,11 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;    /* lookup info for output functions */
     MemoryContext rowcontext;    /* per-row evaluation context */
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyToStateData;
 
+extern void CopyToStateFlush(CopyToState cstate);
+
 #endif                            /* COPYAPI_H */
-- 
2.45.2
From e33abbfdf1bdf0c68d77a8ab367ae5a9800b613e Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sat, 28 Sep 2024 23:24:49 +0900
Subject: [PATCH v21 01/10] Add CopyToRoutine
This is for implementing a custom COPY TO format, but it is not yet
enough on its own. We'll export some APIs to send data and add a
"format" option to COPY TO later.
Existing text/csv/binary format implementations don't use
CopyToRoutine for now. We have a patch for that but defer it, because
some profile results remain unexplained even though runtimes are
faster. See [1] for details.
[1] https://www.postgresql.org/message-id/ZdbtQJ-p5H1_EDwE%40paquier.xyz
Note that this doesn't change existing text/csv/binary format
implementations.
---
 src/backend/commands/copyto.c    | 31 ++++++++++++++---
 src/include/commands/copyapi.h   | 58 ++++++++++++++++++++++++++++++++
 src/tools/pgindent/typedefs.list |  1 +
 3 files changed, 86 insertions(+), 4 deletions(-)
 create mode 100644 src/include/commands/copyapi.h
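A minimal sketch of the order in which the four callbacks are invoked, as far as it can be read from the DoCopyTo() hunks in this patch: CopyToOutFunc once per attribute, then CopyToStart, then CopyToOneRow per row, then CopyToEnd. The types below (MockCopyState, MockCopyToRoutine) and mock_do_copy_to() are hypothetical simplifications that drop the Oid, FmgrInfo, TupleDesc and TupleTableSlot arguments; only the callback names come from the patch.

```c
#include <assert.h>
#include <string.h>

typedef struct MockCopyState MockCopyState;

/* Simplified mirror of CopyToRoutine; argument lists are reduced. */
typedef struct MockCopyToRoutine
{
    void        (*CopyToOutFunc) (MockCopyState *cstate, int attno);
    void        (*CopyToStart) (MockCopyState *cstate);
    void        (*CopyToOneRow) (MockCopyState *cstate, int rowno);
    void        (*CopyToEnd) (MockCopyState *cstate);
} MockCopyToRoutine;

struct MockCopyState
{
    const MockCopyToRoutine *routine;   /* format routine */
    char        trace[64];              /* records the call order */
    size_t      trace_len;
};

/* Append one character to the NUL-terminated call trace. */
static void
trace_add(MockCopyState *cstate, char c)
{
    assert(cstate->trace_len + 1 < sizeof(cstate->trace));
    cstate->trace[cstate->trace_len++] = c;
    cstate->trace[cstate->trace_len] = '\0';
}

static void
demo_out_func(MockCopyState *cstate, int attno)
{
    (void) attno;
    trace_add(cstate, 'F');
}

static void
demo_start(MockCopyState *cstate)
{
    trace_add(cstate, 'S');
}

static void
demo_one_row(MockCopyState *cstate, int rowno)
{
    (void) rowno;
    trace_add(cstate, 'R');
}

static void
demo_end(MockCopyState *cstate)
{
    trace_add(cstate, 'E');
}

/* Server-lifetime routine table, allocated as a static const struct. */
static const MockCopyToRoutine demo_routine = {
    .CopyToOutFunc = demo_out_func,
    .CopyToStart = demo_start,
    .CopyToOneRow = demo_one_row,
    .CopyToEnd = demo_end,
};

/* Reduced stand-in for the dispatch added to DoCopyTo(). */
static void
mock_do_copy_to(MockCopyState *cstate, int natts, int nrows)
{
    for (int i = 0; i < natts; i++)
        cstate->routine->CopyToOutFunc(cstate, i);
    cstate->routine->CopyToStart(cstate);
    for (int r = 0; r < nrows; r++)
        cstate->routine->CopyToOneRow(cstate, r);
    cstate->routine->CopyToEnd(cstate);
}
```

With two attributes and three rows, the trace recorded by the demo callbacks is one 'F' per attribute, one 'S', one 'R' per row and a final 'E'.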
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 91de442f434..3c5a97679aa 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -20,6 +20,7 @@
 
 #include "access/tableam.h"
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -64,6 +65,9 @@ typedef enum CopyDest
  */
 typedef struct CopyToStateData
 {
+    /* format routine */
+    const CopyToRoutine *routine;
+
     /* low-level state data */
     CopyDest    copy_dest;        /* type of copy source/destination */
     FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
@@ -772,14 +776,22 @@ DoCopyTo(CopyToState cstate)
         Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
         if (cstate->opts.binary)
+        {
             getTypeBinaryOutputInfo(attr->atttypid,
                                     &out_func_oid,
                                     &isvarlena);
+            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+        }
+        else if (cstate->routine)
+            cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
+                                           &cstate->out_functions[attnum - 1]);
         else
+        {
             getTypeOutputInfo(attr->atttypid,
                               &out_func_oid,
                               &isvarlena);
-        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+        }
     }
 
     /*
@@ -806,6 +818,8 @@ DoCopyTo(CopyToState cstate)
         tmp = 0;
         CopySendInt32(cstate, tmp);
     }
+    else if (cstate->routine)
+        cstate->routine->CopyToStart(cstate, tupDesc);
     else
     {
         /*
@@ -887,6 +901,8 @@ DoCopyTo(CopyToState cstate)
         /* Need to flush out the trailer */
         CopySendEndOfRow(cstate);
     }
+    else if (cstate->routine)
+        cstate->routine->CopyToEnd(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -908,15 +924,22 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
     MemoryContextReset(cstate->rowcontext);
     oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
+    /* Make sure the tuple is fully deconstructed */
+    slot_getallattrs(slot);
+
+    if (cstate->routine)
+    {
+        cstate->routine->CopyToOneRow(cstate, slot);
+        MemoryContextSwitchTo(oldcontext);
+        return;
+    }
+
     if (cstate->opts.binary)
     {
         /* Binary per-tuple header */
         CopySendInt16(cstate, list_length(cstate->attnumlist));
     }
 
-    /* Make sure the tuple is fully deconstructed */
-    slot_getallattrs(slot);
-
     if (!cstate->opts.binary)
     {
         bool        need_delim = false;
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 00000000000..5ce24f195dc
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,58 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *      API for COPY TO handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/* This is private in commands/copyto.c */
+typedef struct CopyToStateData *CopyToState;
+
+/*
+ * API structure for a COPY TO format implementation.   Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyToRoutine
+{
+    /*
+     * Called when COPY TO is started to set up the output functions
+     * associated with the relation's attributes reading from.  `finfo` can be
+     * optionally filled to provide the catalog information of the output
+     * function.  `atttypid` is the OID of data type used by the relation's
+     * attribute.
+     */
+    void        (*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
+                                  FmgrInfo *finfo);
+
+    /*
+     * Called when COPY TO is started.
+     *
+     * `tupDesc` is the tuple descriptor of the relation from where the data
+     * is read.
+     */
+    void        (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
+
+    /*
+     * Copy one row for COPY TO.
+     *
+     * `slot` is the tuple slot where the data is emitted.
+     */
+    void        (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
+
+    /* Called when COPY TO has ended */
+    void        (*CopyToEnd) (CopyToState cstate);
+} CopyToRoutine;
+
+#endif                            /* COPYAPI_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5fabb127d7e..8eb537bfa77 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -503,6 +503,7 @@ CopyMultiInsertInfo
 CopyOnErrorChoice
 CopySource
 CopyStmt
+CopyToRoutine
 CopyToState
 CopyToStateData
 Cost
-- 
2.45.2
From be1264ed79cdc5a7e529b05866f8b6fe589a705f Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sat, 28 Sep 2024 23:26:29 +0900
Subject: [PATCH v21 02/10] Use CopyToRoutine for the existing formats
The existing formats are text, csv and binary. If this causes any
performance regression, we will not merge it to master.
This adds indirect function call costs but removes the runtime
"if (cstate->opts.binary)" and "if (cstate->opts.csv_mode)"
branch costs.
This uses an optimization based on a static inline function called
with a constant argument for cstate->opts.csv_mode. For example,
CopyToTextLikeOneRow() takes a "bool is_csv" argument instead of
reading cstate->opts.csv_mode itself. CopyToTextOneRow() calls
CopyToTextLikeOneRow() with false (a constant) for "bool is_csv", so
the compiler can remove the "if (is_csv)" branch from the inlined copy.
This doesn't change existing logic. It just moves existing code.
---
 src/backend/commands/copyto.c | 477 +++++++++++++++++++++++-----------
 1 file changed, 319 insertions(+), 158 deletions(-)
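A minimal sketch of the "static inline function plus constant argument" pattern described in the commit message above, outside of PostgreSQL. emit_field_like(), emit_field_text() and emit_field_csv() are hypothetical names, and the quoting rule (always quote, double embedded quotes) is a toy one, not what CopyAttributeOutCSV() does. Whether the branch is actually removed depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Shared workhorse.  Each wrapper below passes a compile-time constant
 * for is_csv, so once this is inlined the compiler can drop the
 * "if (is_csv)" tests from that copy.
 */
static inline size_t
emit_field_like(char *dst, size_t dstsize, const char *field, bool is_csv)
{
    size_t      n = 0;

    /* Worst case: every byte doubled, two quotes, terminating NUL. */
    assert(dstsize >= 2 * strlen(field) + 3);

    if (is_csv)
        dst[n++] = '"';
    for (const char *p = field; *p != '\0'; p++)
    {
        if (is_csv && *p == '"')
            dst[n++] = '"';     /* double an embedded quote */
        dst[n++] = *p;
    }
    if (is_csv)
        dst[n++] = '"';
    dst[n] = '\0';
    return n;
}

/* Text variant: is_csv is the constant false. */
static size_t
emit_field_text(char *dst, size_t dstsize, const char *field)
{
    return emit_field_like(dst, dstsize, field, false);
}

/* CSV variant: is_csv is the constant true. */
static size_t
emit_field_csv(char *dst, size_t dstsize, const char *field)
{
    return emit_field_like(dst, dstsize, field, true);
}
```

The two wrappers have identical signatures, so they can be stored in a callback table the same way CopyToTextOneRow() and CopyToCSVOneRow() are, while the per-field loop itself stays free of a runtime format check.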
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 3c5a97679aa..86dc1b742cc 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -128,6 +128,317 @@ static void CopySendEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
+/*
+ * CopyToRoutine implementations.
+ */
+
+/*
+ * CopyToTextLikeSendEndOfRow
+ *
+ * Apply line terminations for a line sent in text or CSV format depending
+ * on the destination, then send the end of a row.
+ */
+static inline void
+CopyToTextLikeSendEndOfRow(CopyToState cstate)
+{
+    switch (cstate->copy_dest)
+    {
+        case COPY_FILE:
+            /* Default line termination depends on platform */
+#ifndef WIN32
+            CopySendChar(cstate, '\n');
+#else
+            CopySendString(cstate, "\r\n");
+#endif
+            break;
+        case COPY_FRONTEND:
+            /* The FE/BE protocol uses \n as newline for all platforms */
+            CopySendChar(cstate, '\n');
+            break;
+        default:
+            break;
+    }
+
+    /* Now take the actions related to the end of a row */
+    CopySendEndOfRow(cstate);
+}
+
+/*
+ * CopyToTextLikeStart
+ *
+ * Start of COPY TO for text and CSV format.
+ */
+static void
+CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    /*
+     * For non-binary copy, we need to convert null_print to file encoding,
+     * because it will be sent directly with CopySendString.
+     */
+    if (cstate->need_transcoding)
+        cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+                                                          cstate->opts.null_print_len,
+                                                          cstate->file_encoding);
+
+    /* if a header has been requested send the line */
+    if (cstate->opts.header_line)
+    {
+        ListCell   *cur;
+        bool        hdr_delim = false;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            char       *colname;
+
+            if (hdr_delim)
+                CopySendChar(cstate, cstate->opts.delim[0]);
+            hdr_delim = true;
+
+            colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, colname, false);
+            else
+                CopyAttributeOutText(cstate, colname);
+        }
+
+        CopyToTextLikeSendEndOfRow(cstate);
+    }
+}
+
+/*
+ * CopyToTextLikeOutFunc
+ *
+ * Assign output function data for a relation's attribute in text/CSV format.
+ */
+static void
+CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+
+/*
+ * CopyToTextLikeOneRow
+ *
+ * Process one row for text/CSV format.
+ *
+ * Workhorse for CopyToTextOneRow() and CopyToCSVOneRow().
+ */
+static inline void
+CopyToTextLikeOneRow(CopyToState cstate,
+                     TupleTableSlot *slot,
+                     bool is_csv)
+{
+    bool        need_delim = false;
+    FmgrInfo   *out_functions = cstate->out_functions;
+
+    foreach_int(attnum, cstate->attnumlist)
+    {
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (need_delim)
+            CopySendChar(cstate, cstate->opts.delim[0]);
+        need_delim = true;
+
+        if (isnull)
+        {
+            CopySendString(cstate, cstate->opts.null_print_client);
+        }
+        else
+        {
+            char       *string;
+
+            string = OutputFunctionCall(&out_functions[attnum - 1],
+                                        value);
+
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
+                CopyAttributeOutCSV(cstate, string,
+                                    cstate->opts.force_quote_flags[attnum - 1]);
+            else
+                CopyAttributeOutText(cstate, string);
+        }
+    }
+
+    CopyToTextLikeSendEndOfRow(cstate);
+}
+
+/*
+ * CopyToTextOneRow
+ *
+ * Per-row callback for COPY TO with text format.
+ */
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, false);
+}
+
+/*
+ * CopyToCSVOneRow
+ *
+ * Per-row callback for COPY TO with CSV format.
+ */
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, true);
+}
+
+/*
+ * CopyToTextLikeEnd
+ *
+ * End of COPY TO for text/CSV format.
+ */
+static void
+CopyToTextLikeEnd(CopyToState cstate)
+{
+    /* Nothing to do here */
+}
+
+/*
+ * CopyToRoutine implementation for "binary".
+ */
+
+/*
+ * CopyToBinaryStart
+ *
+ * Start of COPY TO for binary format.
+ */
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    /* Generate header for a binary copy */
+    int32        tmp;
+
+    /* Signature */
+    CopySendData(cstate, BinarySignature, 11);
+    /* Flags field */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+    /* No header extension */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+}
+
+/*
+ * CopyToBinaryOutFunc
+ *
+ * Assign output function data for a relation's attribute in binary format.
+ */
+static void
+CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeBinaryOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyToBinaryOneRow
+ *
+ * Process one row for binary format.
+ */
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    FmgrInfo   *out_functions = cstate->out_functions;
+
+    /* Binary per-tuple header */
+    CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+    foreach_int(attnum, cstate->attnumlist)
+    {
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (isnull)
+        {
+            CopySendInt32(cstate, -1);
+        }
+        else
+        {
+            bytea       *outputbytes;
+
+            outputbytes = SendFunctionCall(&out_functions[attnum - 1],
+                                           value);
+            CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+            CopySendData(cstate, VARDATA(outputbytes),
+                         VARSIZE(outputbytes) - VARHDRSZ);
+        }
+    }
+
+    CopySendEndOfRow(cstate);
+}
+
+/*
+ * CopyToBinaryEnd
+ *
+ * End of COPY TO for binary format.
+ */
+static void
+CopyToBinaryEnd(CopyToState cstate)
+{
+    /* Generate trailer for a binary copy */
+    CopySendInt16(cstate, -1);
+    /* Need to flush out the trailer */
+    CopySendEndOfRow(cstate);
+}
+
+/*
+ * CSV and text share the same implementation, with the exception of the
+ * output representation and per-row callbacks.
+ */
+static const CopyToRoutine CopyToRoutineText = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToTextOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+static const CopyToRoutine CopyToRoutineCSV = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToCSVOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+static const CopyToRoutine CopyToRoutineBinary = {
+    .CopyToStart = CopyToBinaryStart,
+    .CopyToOutFunc = CopyToBinaryOutFunc,
+    .CopyToOneRow = CopyToBinaryOneRow,
+    .CopyToEnd = CopyToBinaryEnd,
+};
+
+/*
+ * Define the COPY TO routines to use for a format.  This should be called
+ * after options are parsed.
+ */
+static const CopyToRoutine *
+CopyToGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyToRoutineCSV;
+    else if (opts.binary)
+        return &CopyToRoutineBinary;
+
+    /* default is text */
+    return &CopyToRoutineText;
+}
 
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
@@ -195,16 +506,6 @@ CopySendEndOfRow(CopyToState cstate)
     switch (cstate->copy_dest)
     {
         case COPY_FILE:
-            if (!cstate->opts.binary)
-            {
-                /* Default line termination depends on platform */
-#ifndef WIN32
-                CopySendChar(cstate, '\n');
-#else
-                CopySendString(cstate, "\r\n");
-#endif
-            }
-
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -239,10 +540,6 @@ CopySendEndOfRow(CopyToState cstate)
             }
             break;
         case COPY_FRONTEND:
-            /* The FE/BE protocol uses \n as newline for all platforms */
-            if (!cstate->opts.binary)
-                CopySendChar(cstate, '\n');
-
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
@@ -430,6 +727,9 @@ BeginCopyTo(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
 
+    /* Set format routine */
+    cstate->routine = CopyToGetRoutine(cstate->opts);
+
     /* Process the source/target relation or query */
     if (rel)
     {
@@ -771,27 +1071,10 @@ DoCopyTo(CopyToState cstate)
     foreach(cur, cstate->attnumlist)
     {
         int            attnum = lfirst_int(cur);
-        Oid            out_func_oid;
-        bool        isvarlena;
         Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-        if (cstate->opts.binary)
-        {
-            getTypeBinaryOutputInfo(attr->atttypid,
-                                    &out_func_oid,
-                                    &isvarlena);
-            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
-        }
-        else if (cstate->routine)
-            cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
-                                           &cstate->out_functions[attnum - 1]);
-        else
-        {
-            getTypeOutputInfo(attr->atttypid,
-                              &out_func_oid,
-                              &isvarlena);
-            fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
-        }
+        cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
+                                       &cstate->out_functions[attnum - 1]);
     }
 
     /*
@@ -804,58 +1087,7 @@ DoCopyTo(CopyToState cstate)
                                                "COPY TO",
                                                ALLOCSET_DEFAULT_SIZES);
 
-    if (cstate->opts.binary)
-    {
-        /* Generate header for a binary copy */
-        int32        tmp;
-
-        /* Signature */
-        CopySendData(cstate, BinarySignature, 11);
-        /* Flags field */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-        /* No header extension */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-    }
-    else if (cstate->routine)
-        cstate->routine->CopyToStart(cstate, tupDesc);
-    else
-    {
-        /*
-         * For non-binary copy, we need to convert null_print to file
-         * encoding, because it will be sent directly with CopySendString.
-         */
-        if (cstate->need_transcoding)
-            cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
-                                                              cstate->opts.null_print_len,
-                                                              cstate->file_encoding);
-
-        /* if a header has been requested send the line */
-        if (cstate->opts.header_line)
-        {
-            bool        hdr_delim = false;
-
-            foreach(cur, cstate->attnumlist)
-            {
-                int            attnum = lfirst_int(cur);
-                char       *colname;
-
-                if (hdr_delim)
-                    CopySendChar(cstate, cstate->opts.delim[0]);
-                hdr_delim = true;
-
-                colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, colname, false);
-                else
-                    CopyAttributeOutText(cstate, colname);
-            }
-
-            CopySendEndOfRow(cstate);
-        }
-    }
+    cstate->routine->CopyToStart(cstate, tupDesc);
 
     if (cstate->rel)
     {
@@ -894,15 +1126,7 @@ DoCopyTo(CopyToState cstate)
         processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
     }
 
-    if (cstate->opts.binary)
-    {
-        /* Generate trailer for a binary copy */
-        CopySendInt16(cstate, -1);
-        /* Need to flush out the trailer */
-        CopySendEndOfRow(cstate);
-    }
-    else if (cstate->routine)
-        cstate->routine->CopyToEnd(cstate);
+    cstate->routine->CopyToEnd(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -918,7 +1142,6 @@ DoCopyTo(CopyToState cstate)
 static void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-    FmgrInfo   *out_functions = cstate->out_functions;
     MemoryContext oldcontext;
 
     MemoryContextReset(cstate->rowcontext);
@@ -927,69 +1150,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
     /* Make sure the tuple is fully deconstructed */
     slot_getallattrs(slot);
 
-    if (cstate->routine)
-    {
-        cstate->routine->CopyToOneRow(cstate, slot);
-        MemoryContextSwitchTo(oldcontext);
-        return;
-    }
-
-    if (cstate->opts.binary)
-    {
-        /* Binary per-tuple header */
-        CopySendInt16(cstate, list_length(cstate->attnumlist));
-    }
-
-    if (!cstate->opts.binary)
-    {
-        bool        need_delim = false;
-
-        foreach_int(attnum, cstate->attnumlist)
-        {
-            Datum        value = slot->tts_values[attnum - 1];
-            bool        isnull = slot->tts_isnull[attnum - 1];
-            char       *string;
-
-            if (need_delim)
-                CopySendChar(cstate, cstate->opts.delim[0]);
-            need_delim = true;
-
-            if (isnull)
-                CopySendString(cstate, cstate->opts.null_print_client);
-            else
-            {
-                string = OutputFunctionCall(&out_functions[attnum - 1],
-                                            value);
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, string,
-                                        cstate->opts.force_quote_flags[attnum - 1]);
-                else
-                    CopyAttributeOutText(cstate, string);
-            }
-        }
-    }
-    else
-    {
-        foreach_int(attnum, cstate->attnumlist)
-        {
-            Datum        value = slot->tts_values[attnum - 1];
-            bool        isnull = slot->tts_isnull[attnum - 1];
-            bytea       *outputbytes;
-
-            if (isnull)
-                CopySendInt32(cstate, -1);
-            else
-            {
-                outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-                                               value);
-                CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-                CopySendData(cstate, VARDATA(outputbytes),
-                             VARSIZE(outputbytes) - VARHDRSZ);
-            }
-        }
-    }
-
-    CopySendEndOfRow(cstate);
+    cstate->routine->CopyToOneRow(cstate, slot);
 
     MemoryContextSwitchTo(oldcontext);
 }
-- 
2.45.2
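Note on the binary per-row layout: CopyToBinaryOneRow() above sends an int16 field count, then for each field either an int32 length followed by the data, or an int32 -1 for NULL. The following is a minimal standalone sketch of that framing, not PostgreSQL code. OutBuf, Field, put_int16(), put_int32(), put_bytes() and encode_binary_row() are hypothetical names for illustration only; integers are written in network byte order, as the CopySendInt16()/CopySendInt32() helpers do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size buffer standing in for cstate->fe_msgbuf. */
typedef struct OutBuf
{
    unsigned char data[256];
    size_t      len;
} OutBuf;

/* Hypothetical field: bytes == NULL models an SQL NULL. */
typedef struct Field
{
    const void *bytes;
    int32_t     size;
} Field;

/* Append raw bytes; the assert is the only overflow protection here. */
static void
put_bytes(OutBuf *buf, const void *src, size_t n)
{
    assert(buf->len + n <= sizeof(buf->data));
    memcpy(buf->data + buf->len, src, n);
    buf->len += n;
}

/* Append a 16-bit integer, most significant byte first. */
static void
put_int16(OutBuf *buf, int16_t v)
{
    uint16_t    u = (uint16_t) v;
    unsigned char b[2] = {(unsigned char) (u >> 8), (unsigned char) (u & 0xff)};

    put_bytes(buf, b, sizeof(b));
}

/* Append a 32-bit integer, most significant byte first. */
static void
put_int32(OutBuf *buf, int32_t v)
{
    uint32_t    u = (uint32_t) v;
    unsigned char b[4];

    for (int i = 0; i < 4; i++)
        b[i] = (unsigned char) ((u >> (24 - 8 * i)) & 0xff);
    put_bytes(buf, b, sizeof(b));
}

/* Frame one row: field count, then length + data (or -1) per field. */
static void
encode_binary_row(OutBuf *buf, const Field *fields, int16_t nfields)
{
    put_int16(buf, nfields);
    for (int i = 0; i < nfields; i++)
    {
        if (fields[i].bytes == NULL)
        {
            put_int32(buf, -1);     /* NULL marker, no data follows */
            continue;
        }
        put_int32(buf, fields[i].size);
        put_bytes(buf, fields[i].bytes, (size_t) fields[i].size);
    }
}
```

A row with one two-byte field and one NULL field takes 2 + (4 + 2) + 4 = 12 bytes in this model. The file header and the -1 trailer written by CopyToBinaryStart()/CopyToBinaryEnd() are not modelled.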
From 9c6123756023e7515133c649a14be6d294a31f29 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sat, 28 Sep 2024 23:40:54 +0900
Subject: [PATCH v21 03/10] Add support for adding custom COPY TO format
This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a COPY TO handler returns a CopyToRoutine.
This also adds a test module for custom COPY TO handlers.
---
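Note on format resolution: ProcessCopyOptionFormat() in this patch checks the built-in names first, looks up a handler function only for COPY TO, and otherwise reports the format as not recognized. Below is a minimal standalone model of that decision order, not the patch itself. FormatKind, custom_handlers, custom_handler_exists() and resolve_format() are hypothetical names; the small array stands in for the catalog lookup done with LookupFuncName().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of resolving a "format" option value. */
typedef enum FormatKind
{
    FORMAT_TEXT,
    FORMAT_CSV,
    FORMAT_BINARY,
    FORMAT_CUSTOM,
    FORMAT_UNRECOGNIZED
} FormatKind;

/* Hypothetical registry standing in for handler functions in the catalog. */
static const char *const custom_handlers[] = {"test_copy_format"};

static bool
custom_handler_exists(const char *name)
{
    size_t      n = sizeof(custom_handlers) / sizeof(custom_handlers[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(custom_handlers[i], name) == 0)
            return true;
    }
    return false;
}

/* Mirror the decision order: built-ins win, custom is COPY TO only. */
static FormatKind
resolve_format(const char *format, bool is_from)
{
    assert(format != NULL);

    if (strcmp(format, "text") == 0)
        return FORMAT_TEXT;
    if (strcmp(format, "csv") == 0)
        return FORMAT_CSV;
    if (strcmp(format, "binary") == 0)
        return FORMAT_BINARY;

    /* No handler lookup is attempted for COPY FROM at this stage. */
    if (!is_from && custom_handler_exists(format))
        return FORMAT_CUSTOM;

    return FORMAT_UNRECOGNIZED;
}
```

In this model a handler named "text" could never shadow the built-in format, and a known custom format is still unrecognized for COPY FROM, which matches the first COPY in the regression test's expected output.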
 src/backend/commands/copy.c                   | 82 ++++++++++++++++---
 src/backend/commands/copyto.c                 |  4 +-
 src/backend/nodes/Makefile                    |  1 +
 src/backend/nodes/gen_node_support.pl         |  2 +
 src/backend/utils/adt/pseudotypes.c           |  1 +
 src/include/catalog/pg_proc.dat               |  6 ++
 src/include/catalog/pg_type.dat               |  6 ++
 src/include/commands/copy.h                   |  1 +
 src/include/commands/copyapi.h                |  2 +
 src/include/nodes/meson.build                 |  1 +
 src/test/modules/Makefile                     |  1 +
 src/test/modules/meson.build                  |  1 +
 src/test/modules/test_copy_format/.gitignore  |  4 +
 src/test/modules/test_copy_format/Makefile    | 23 ++++++
 .../expected/test_copy_format.out             | 17 ++++
 src/test/modules/test_copy_format/meson.build | 33 ++++++++
 .../test_copy_format/sql/test_copy_format.sql |  5 ++
 .../test_copy_format--1.0.sql                 |  8 ++
 .../test_copy_format/test_copy_format.c       | 63 ++++++++++++++
 .../test_copy_format/test_copy_format.control |  4 +
 20 files changed, 251 insertions(+), 14 deletions(-)
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3bb579a3a44..3aea654ab8a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,6 +32,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -443,6 +444,73 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
     return COPY_LOG_VERBOSITY_DEFAULT;    /* keep compiler quiet */
 }
 
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false). If no COPY format handler is found, this function
+ * reports an error.
+ */
+static void
+ProcessCopyOptionFormat(ParseState *pstate,
+                        CopyFormatOptions *opts_out,
+                        bool is_from,
+                        DefElem *defel)
+{
+    char       *format;
+    Oid            funcargtypes[1];
+    Oid            handlerOid = InvalidOid;
+    Datum        datum;
+    Node       *routine;
+
+    format = defGetString(defel);
+
+    /* built-in formats */
+    if (strcmp(format, "text") == 0)
+        return;                    /* default format */
+    else if (strcmp(format, "csv") == 0)
+    {
+        opts_out->csv_mode = true;
+        return;
+    }
+    else if (strcmp(format, "binary") == 0)
+    {
+        opts_out->binary = true;
+        return;
+    }
+
+    /* custom format */
+    if (!is_from)
+    {
+        funcargtypes[0] = INTERNALOID;
+        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                    funcargtypes, true);
+    }
+    if (!OidIsValid(handlerOid))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY format \"%s\" not recognized", format),
+                 parser_errposition(pstate, defel->location)));
+
+    datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
+    routine = (Node *) DatumGetPointer(datum);
+    if (routine == NULL || !IsA(routine, CopyToRoutine))
+        ereport(
+                ERROR,
+                (errcode(
+                         ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY handler function "
+                        "%s(%u) did not return a "
+                        "CopyToRoutine struct",
+                        format, handlerOid),
+                 parser_errposition(
+                                    pstate, defel->location)));
+
+    opts_out->routine = routine;
+}
+
 /*
  * Process the statement option list for COPY.
  *
@@ -485,22 +553,10 @@ ProcessCopyOptions(ParseState *pstate,
 
         if (strcmp(defel->defname, "format") == 0)
         {
-            char       *fmt = defGetString(defel);
-
             if (format_specified)
                 errorConflictingDefElem(defel, pstate);
             format_specified = true;
-            if (strcmp(fmt, "text") == 0)
-                 /* default format */ ;
-            else if (strcmp(fmt, "csv") == 0)
-                opts_out->csv_mode = true;
-            else if (strcmp(fmt, "binary") == 0)
-                opts_out->binary = true;
-            else
-                ereport(ERROR,
-                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                         errmsg("COPY format \"%s\" not recognized", fmt),
-                         parser_errposition(pstate, defel->location)));
+            ProcessCopyOptionFormat(pstate, opts_out, is_from, defel);
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 86dc1b742cc..8ddbddb119d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -431,7 +431,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(CopyFormatOptions opts)
 {
-    if (opts.csv_mode)
+    if (opts.routine)
+        return (const CopyToRoutine *) opts.routine;
+    else if (opts.csv_mode)
         return &CopyToRoutineCSV;
     else if (opts.binary)
         return &CopyToRoutineBinary;
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 66bbad8e6e0..173ee11811c 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -49,6 +49,7 @@ node_headers = \
     access/sdir.h \
     access/tableam.h \
     access/tsmapi.h \
+    commands/copyapi.h \
     commands/event_trigger.h \
     commands/trigger.h \
     executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 81df3bdf95f..428ab4f0d93 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -61,6 +61,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -85,6 +86,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index e189e9b79d2..25f24ab95d2 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 322114d72a7..f108780e8b6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7741,6 +7741,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index ceff66ccde1..793dd671935 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -633,6 +633,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a COPY TO handler function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   descr => 'pseudo-type for the result of a table AM handler function',
   typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc10..e9ed8443210 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -84,6 +84,7 @@ typedef struct CopyFormatOptions
     CopyOnErrorChoice on_error; /* what to do when error happened */
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     List       *convert_select; /* list of column names (can be NIL) */
+    Node       *routine;        /* CopyToRoutine (can be NULL) */
 } CopyFormatOptions;
 
 /* These are private in commands/copy[from|to].c */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 5ce24f195dc..05b7d92ddba 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -26,6 +26,8 @@ typedef struct CopyToStateData *CopyToState;
  */
 typedef struct CopyToRoutine
 {
+    NodeTag        type;
+
     /*
      * Called when COPY TO is started to set up the output functions
      * associated with the relation's attributes reading from.  `finfo` can be
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index b665e55b657..103df1a7873 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -11,6 +11,7 @@ node_support_input_i = [
   'access/sdir.h',
   'access/tableam.h',
   'access/tsmapi.h',
+  'commands/copyapi.h',
   'commands/event_trigger.h',
   'commands/trigger.h',
   'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 256799f520a..b7b46928a19 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
           spgist_name_ops \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index d8fe059d236..c42b4b2b31f 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -14,6 +14,7 @@ subdir('spgist_name_ops')
 subdir('ssl_passphrase_callback')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 00000000000..8497f91624d
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 00000000000..606c78f6878
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,17 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (format 'test_copy_format');
+ERROR:  COPY format "test_copy_format" not recognized
+LINE 1: COPY public.test FROM stdin WITH (format 'test_copy_format')...
+                                          ^
+COPY public.test TO stdout WITH (format 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: atttypid=21
+NOTICE:  CopyToOutFunc: atttypid=23
+NOTICE:  CopyToOutFunc: atttypid=20
+NOTICE:  CopyToStart: natts=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 00000000000..4cefe7b709a
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+  'test_copy_format.control',
+  'test_copy_format--1.0.sql',
+)
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'test_copy_format',
+    ],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 00000000000..9406b3be3d4
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,5 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (format 'test_copy_format');
+COPY public.test TO stdout WITH (format 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 00000000000..d24ea03ce99
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,8 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+CREATE FUNCTION test_copy_format(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..e064f40473b
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,63 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+
+PG_MODULE_MAGIC;
+
+static void
+CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts)));
+}
+
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", slot->tts_nvalid)));
+}
+
+static void
+CopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .type = T_CopyToRoutine,
+    .CopyToOutFunc = CopyToOutFunc,
+    .CopyToStart = CopyToStart,
+    .CopyToOneRow = CopyToOneRow,
+    .CopyToEnd = CopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    ereport(NOTICE,
+            (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 00000000000..f05a6362358
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.45.2
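Note on callback order: the expected output of the test module shows CopyToOutFunc once per column, then CopyToStart, then CopyToOneRow once per row, then CopyToEnd. Here is a minimal standalone sketch of a driver that produces the same order, assuming simplified mock arguments. MockState, MockRoutine, record() and mock_do_copy_to() are hypothetical names and are not part of the patch; only the ordering is meant to match DoCopyTo().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CopyToState: records which callbacks ran. */
typedef struct MockState
{
    char        trace[64];
    size_t      ntrace;
} MockState;

/* The same four callbacks as CopyToRoutine, with mock arguments. */
typedef struct MockRoutine
{
    void        (*out_func) (MockState *st, int attnum);
    void        (*start) (MockState *st, int natts);
    void        (*one_row) (MockState *st, int rowno);
    void        (*end) (MockState *st);
} MockRoutine;

/* Append one character to the NUL-terminated trace. */
static void
record(MockState *st, char c)
{
    assert(st->ntrace + 1 < sizeof(st->trace));
    st->trace[st->ntrace++] = c;
    st->trace[st->ntrace] = '\0';
}

static void mock_out_func(MockState *st, int attnum) { (void) attnum; record(st, 'O'); }
static void mock_start(MockState *st, int natts) { (void) natts; record(st, 'S'); }
static void mock_one_row(MockState *st, int rowno) { (void) rowno; record(st, 'R'); }
static void mock_end(MockState *st) { record(st, 'E'); }

static const MockRoutine mock_routine = {
    .out_func = mock_out_func,
    .start = mock_start,
    .one_row = mock_one_row,
    .end = mock_end,
};

/* Drive the callbacks in the order seen in the expected test output. */
static void
mock_do_copy_to(const MockRoutine *routine, MockState *st, int natts, int nrows)
{
    for (int attnum = 1; attnum <= natts; attnum++)
        routine->out_func(st, attnum);
    routine->start(st, natts);
    for (int rowno = 0; rowno < nrows; rowno++)
        routine->one_row(st, rowno);
    routine->end(st);
}
```

With three columns and three rows, as in the regression test, the recorded trace in this model is "OOOSRRRE".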
From c0c504a899861da2d9d66fd9bd3c31c56735ffe0 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sat, 28 Sep 2024 23:56:36 +0900
Subject: [PATCH v21 04/10] Export CopyToStateData
This is for custom COPY TO format handlers implemented as extensions.
This only moves code; nothing changes except the CopyDest enum values.
The CopyDest and CopyFrom enum values, such as COPY_FILE, conflict with
each other, so the CopyDest enum values use a COPY_DEST_ prefix instead
of the COPY_ prefix. For example, COPY_FILE in CopyDest is renamed to
COPY_DEST_FILE.
Note that this isn't enough to implement custom COPY TO format
handlers as extensions. We'll do the following in a subsequent commit:
1. Add an opaque space for custom COPY TO format handlers
2. Export CopySendEndOfRow() to flush the buffer
---
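Note on the rename: C enumerators live in the ordinary identifier namespace, so two enums visible in the same translation unit cannot both declare COPY_FILE. The following standalone sketch shows the prefixed CopyDest coexisting with a second enum that keeps the unprefixed names. MockCopySource and copy_dest_name() are hypothetical and only for illustration, and COPY_DEST_CALLBACK is inferred from the renaming rule described above rather than shown in the visible part of the diff.

```c
#include <assert.h>

/* CopyDest with the COPY_DEST_ prefix, following the rename rule above. */
typedef enum CopyDest
{
    COPY_DEST_FILE,             /* to file (or a piped program) */
    COPY_DEST_FRONTEND,         /* to frontend */
    COPY_DEST_CALLBACK          /* to callback function */
} CopyDest;

/*
 * Hypothetical second enum that keeps the unprefixed names.  Had CopyDest
 * kept COPY_FILE as well, this declaration would not compile.
 */
typedef enum MockCopySource
{
    COPY_FILE,
    COPY_FRONTEND,
    COPY_CALLBACK
} MockCopySource;

/* Hypothetical helper: map a CopyDest value to a short label. */
static const char *
copy_dest_name(CopyDest dest)
{
    assert(dest <= COPY_DEST_CALLBACK);

    switch (dest)
    {
        case COPY_DEST_FILE:
            return "file";
        case COPY_DEST_FRONTEND:
            return "frontend";
        case COPY_DEST_CALLBACK:
            return "callback";
    }
    return "unknown";
}
```

Both headers can then be included by the same extension source file without the enumerators clashing.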
 src/backend/commands/copyto.c  |  77 ++-----------------
 src/include/commands/copy.h    |  74 +-----------------
 src/include/commands/copyapi.h | 134 ++++++++++++++++++++++++++++++++-
 3 files changed, 143 insertions(+), 142 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 8ddbddb119d..37b150b44ba 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -37,67 +37,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                    /* to file (or a piped program) */
-    COPY_FRONTEND,                /* to frontend */
-    COPY_CALLBACK,                /* to callback function */
-} CopyDest;
-
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-    /* format routine */
-    const CopyToRoutine *routine;
-
-    /* low-level state data */
-    CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
-
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
-    MemoryContext rowcontext;    /* per-row evaluation context */
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -143,7 +82,7 @@ CopyToTextLikeSendEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -151,7 +90,7 @@ CopyToTextLikeSendEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -460,7 +399,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -507,7 +446,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -541,11 +480,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -925,12 +864,12 @@ BeginCopyTo(ParseState *pstate,
     /* See Multibyte encoding comment above */
     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index e9ed8443210..dd645eaa030 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,85 +14,15 @@
 #ifndef COPY_H
 #define COPY_H
 
-#include "nodes/execnodes.h"
+#include "commands/copyapi.h"
 #include "nodes/parsenodes.h"
 #include "parser/parse_node.h"
 #include "tcop/dest.h"
 
-/*
- * Represents whether a header line should be present, and whether it must
- * match the actual names (which implies "true").
- */
-typedef enum CopyHeaderChoice
-{
-    COPY_HEADER_FALSE = 0,
-    COPY_HEADER_TRUE,
-    COPY_HEADER_MATCH,
-} CopyHeaderChoice;
-
-/*
- * Represents where to save input processing errors.  More values to be added
- * in the future.
- */
-typedef enum CopyOnErrorChoice
-{
-    COPY_ON_ERROR_STOP = 0,        /* immediately throw errors, default */
-    COPY_ON_ERROR_IGNORE,        /* ignore errors */
-} CopyOnErrorChoice;
-
-/*
- * Represents verbosity of logged messages by COPY command.
- */
-typedef enum CopyLogVerbosityChoice
-{
-    COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
-    COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
-} CopyLogVerbosityChoice;
-
-/*
- * A struct to hold COPY options, in a parsed form. All of these are related
- * to formatting, except for 'freeze', which doesn't really belong here, but
- * it's expedient to parse it along with all the other options.
- */
-typedef struct CopyFormatOptions
-{
-    /* parameters from the COPY command */
-    int            file_encoding;    /* file or remote side's character encoding,
-                                 * -1 if not specified */
-    bool        binary;            /* binary format? */
-    bool        freeze;            /* freeze rows on loading? */
-    bool        csv_mode;        /* Comma Separated Value format? */
-    CopyHeaderChoice header_line;    /* header line? */
-    char       *null_print;        /* NULL marker string (server encoding!) */
-    int            null_print_len; /* length of same */
-    char       *null_print_client;    /* same converted to file encoding */
-    char       *default_print;    /* DEFAULT marker string */
-    int            default_print_len;    /* length of same */
-    char       *delim;            /* column delimiter (must be 1 byte) */
-    char       *quote;            /* CSV quote char (must be 1 byte) */
-    char       *escape;            /* CSV escape char (must be 1 byte) */
-    List       *force_quote;    /* list of column names */
-    bool        force_quote_all;    /* FORCE_QUOTE *? */
-    bool       *force_quote_flags;    /* per-column CSV FQ flags */
-    List       *force_notnull;    /* list of column names */
-    bool        force_notnull_all;    /* FORCE_NOT_NULL *? */
-    bool       *force_notnull_flags;    /* per-column CSV FNN flags */
-    List       *force_null;        /* list of column names */
-    bool        force_null_all; /* FORCE_NULL *? */
-    bool       *force_null_flags;    /* per-column CSV FN flags */
-    bool        convert_selectively;    /* do selective binary conversion? */
-    CopyOnErrorChoice on_error; /* what to do when error happened */
-    CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
-    List       *convert_select; /* list of column names (can be NIL) */
-    Node       *routine;        /* CopyToRoutine (can be NULL) */
-} CopyFormatOptions;
-
-/* These are private in commands/copy[from|to].c */
+/* This is private in commands/copyfrom.c */
 typedef struct CopyFromStateData *CopyFromState;
-typedef struct CopyToStateData *CopyToState;
 
 typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
-typedef void (*copy_data_dest_cb) (void *data, int len);
 
 extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
                    int stmt_location, int stmt_len,
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 05b7d92ddba..03779c15f43 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -14,10 +14,79 @@
 #ifndef COPYAPI_H
 #define COPYAPI_H
 
+#include "commands/trigger.h"
+#include "executor/execdesc.h"
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
 
-/* This is private in commands/copyto.c */
+/*
+ * Represents whether a header line should be present, and whether it must
+ * match the actual names (which implies "true").
+ */
+typedef enum CopyHeaderChoice
+{
+    COPY_HEADER_FALSE = 0,
+    COPY_HEADER_TRUE,
+    COPY_HEADER_MATCH,
+} CopyHeaderChoice;
+
+/*
+ * Represents where to save input processing errors.  More values to be added
+ * in the future.
+ */
+typedef enum CopyOnErrorChoice
+{
+    COPY_ON_ERROR_STOP = 0,        /* immediately throw errors, default */
+    COPY_ON_ERROR_IGNORE,        /* ignore errors */
+} CopyOnErrorChoice;
+
+/*
+ * Represents verbosity of logged messages by COPY command.
+ */
+typedef enum CopyLogVerbosityChoice
+{
+    COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
+    COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
+} CopyLogVerbosityChoice;
+
+/*
+ * A struct to hold COPY options, in a parsed form. All of these are related
+ * to formatting, except for 'freeze', which doesn't really belong here, but
+ * it's expedient to parse it along with all the other options.
+ */
+typedef struct CopyFormatOptions
+{
+    /* parameters from the COPY command */
+    int            file_encoding;    /* file or remote side's character encoding,
+                                 * -1 if not specified */
+    bool        binary;            /* binary format? */
+    bool        freeze;            /* freeze rows on loading? */
+    bool        csv_mode;        /* Comma Separated Value format? */
+    CopyHeaderChoice header_line;    /* header line? */
+    char       *null_print;        /* NULL marker string (server encoding!) */
+    int            null_print_len; /* length of same */
+    char       *null_print_client;    /* same converted to file encoding */
+    char       *default_print;    /* DEFAULT marker string */
+    int            default_print_len;    /* length of same */
+    char       *delim;            /* column delimiter (must be 1 byte) */
+    char       *quote;            /* CSV quote char (must be 1 byte) */
+    char       *escape;            /* CSV escape char (must be 1 byte) */
+    List       *force_quote;    /* list of column names */
+    bool        force_quote_all;    /* FORCE_QUOTE *? */
+    bool       *force_quote_flags;    /* per-column CSV FQ flags */
+    List       *force_notnull;    /* list of column names */
+    bool        force_notnull_all;    /* FORCE_NOT_NULL *? */
+    bool       *force_notnull_flags;    /* per-column CSV FNN flags */
+    List       *force_null;        /* list of column names */
+    bool        force_null_all; /* FORCE_NULL *? */
+    bool       *force_null_flags;    /* per-column CSV FN flags */
+    bool        convert_selectively;    /* do selective binary conversion? */
+    CopyOnErrorChoice on_error; /* what to do when error happened */
+    CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
+    List       *convert_select; /* list of column names (can be NIL) */
+    Node       *routine;        /* CopyToRoutine (can be NULL) */
+} CopyFormatOptions;
+
 typedef struct CopyToStateData *CopyToState;
 
 /*
@@ -57,4 +126,67 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+    COPY_DEST_FILE,                /* to file (or a piped program) */
+    COPY_DEST_FRONTEND,            /* to frontend */
+    COPY_DEST_CALLBACK,            /* to callback function */
+} CopyDest;
+
+typedef void (*copy_data_dest_cb) (void *data, int len);
+
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+    /* format routine */
+    const CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;        /* type of copy source/destination */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
+    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
+
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+    MemoryContext rowcontext;    /* per-row evaluation context */
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
 #endif                            /* COPYAPI_H */
-- 
2.45.2
From 9a2c88f860f29d0eae0d2a486516451d4a288e2f Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sat, 28 Sep 2024 23:59:34 +0900
Subject: [PATCH v21 05/10] Add support for implementing custom COPY TO format
 as extension
* Add CopyToStateData::opaque that can be used to keep data for a
  custom COPY TO format implementation
* Export CopySendEndOfRow() as CopyToStateFlush() to flush data in
  CopyToStateData::fe_msgbuf
---
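A note for reviewers on how the two additions are meant to be used
together. The following is only a standalone sketch, not code from this
patch: DemoCopyToState, DemoFormatData and the demo_*() functions are
hypothetical stand-ins for CopyToStateData, a handler's private struct,
the CopyToRoutine callbacks and CopyToStateFlush(). It assumes a handler
allocates its private data at start, reaches it through "opaque" for
each row, flushes the buffer after each row, and frees it at end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CopyToStateData (only what the sketch needs). */
typedef struct DemoCopyToState
{
    char        buf[256];       /* plays the role of fe_msgbuf */
    size_t      len;
    char        out[1024];      /* plays the role of the destination */
    size_t      out_len;
    void       *opaque;         /* private space for the format handler */
} DemoCopyToState;

/* Hypothetical stand-in for CopyToStateFlush(): send buffered bytes. */
static void
demo_flush(DemoCopyToState *cstate)
{
    assert(cstate->out_len + cstate->len <= sizeof(cstate->out));
    memcpy(cstate->out + cstate->out_len, cstate->buf, cstate->len);
    cstate->out_len += cstate->len;
    cstate->len = 0;
}

/* Handler-private data kept behind cstate->opaque. */
typedef struct DemoFormatData
{
    int         rows_written;
} DemoFormatData;

/* Start callback: allocate the private data. */
static void
demo_start(DemoCopyToState *cstate)
{
    DemoFormatData *data = calloc(1, sizeof(DemoFormatData));

    assert(data != NULL);
    cstate->opaque = data;
}

/* Per-row callback: buffer one value plus newline, then flush. */
static void
demo_one_row(DemoCopyToState *cstate, const char *value)
{
    DemoFormatData *data = cstate->opaque;
    size_t      n = strlen(value);

    assert(n + 1 <= sizeof(cstate->buf));
    memcpy(cstate->buf, value, n);
    cstate->buf[n] = '\n';
    cstate->len = n + 1;
    data->rows_written++;
    demo_flush(cstate);
}

/* End callback: release the private data and report the row count. */
static int
demo_end(DemoCopyToState *cstate)
{
    DemoFormatData *data = cstate->opaque;
    int         rows = data->rows_written;

    free(data);
    cstate->opaque = NULL;
    return rows;
}
```

In a real extension the private data would presumably be palloc'ed in
the per-copy memory context rather than calloc'ed; the sketch uses the C
library only so that it compiles outside the server.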
 src/backend/commands/copyto.c  | 14 ++++++++++++++
 src/include/commands/copyapi.h |  5 +++++
 2 files changed, 19 insertions(+)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 37b150b44ba..c99edae575b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -496,6 +496,20 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * CopyToStateFlush
+ *
+ * Export CopySendEndOfRow() for extensions. We want to keep
+ * CopySendEndOfRow() as a static function for
+ * optimization. CopySendEndOfRow() calls in this file may be optimized by a
+ * compiler.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * These functions do apply some data conversion
  */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 03779c15f43..30765951e2e 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -187,6 +187,11 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;    /* lookup info for output functions */
     MemoryContext rowcontext;    /* per-row evaluation context */
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyToStateData;
 
+extern void CopyToStateFlush(CopyToState cstate);
+
 #endif                            /* COPYAPI_H */
-- 
2.45.2
From a75c8076668416cc6112c4166c728cf2552c7e4a Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sun, 29 Sep 2024 00:06:20 +0900
Subject: [PATCH v21 06/10] Add CopyFromRoutine
This is for implementing custom COPY FROM formats. But this is not
enough to implement a custom COPY FROM format yet. We'll export some
APIs to receive data and add a "format" option to COPY FROM later.
Existing text/csv/binary format implementations don't use
CopyFromRoutine for now. We have a patch for that, but we are
deferring it because the profile results are puzzling even though the
runtimes are faster. See [1] for details.
[1] https://www.postgresql.org/message-id/ZdbtQJ-p5H1_EDwE%40paquier.xyz
Note that this doesn't change existing text/csv/binary format
implementations.
---
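A note for reviewers on the intended call order of the four callbacks.
This is a standalone sketch under simplifying assumptions, not code from
this patch: DemoFromRoutine, DemoFromState and demo_copy_from() are
hypothetical stand-ins for CopyFromRoutine, CopyFromStateData and the
BeginCopyFrom()/NextCopyFrom()/EndCopyFrom() sequence, and a "row" is
reduced to a single int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoFromState DemoFromState;

/* Hypothetical stand-in for CopyFromRoutine. */
typedef struct DemoFromRoutine
{
    void        (*in_func) (DemoFromState *cstate, int attnum);
    void        (*start) (DemoFromState *cstate);
    bool        (*one_row) (DemoFromState *cstate, int *value);
    void        (*end) (DemoFromState *cstate);
} DemoFromRoutine;

/* Hypothetical stand-in for CopyFromStateData. */
struct DemoFromState
{
    const DemoFromRoutine *routine; /* format routine */
    const int  *input;              /* pretend input source */
    size_t      ninput;
    size_t      pos;
    int         in_func_calls;
    bool        started;
    bool        ended;
};

static void
demo_in_func(DemoFromState *cstate, int attnum)
{
    (void) attnum;
    cstate->in_func_calls++;    /* called once per copied attribute */
}

static void
demo_start(DemoFromState *cstate)
{
    cstate->pos = 0;
    cstate->started = true;
}

/* Returns false when there are no more rows, like CopyFromOneRow. */
static bool
demo_one_row(DemoFromState *cstate, int *value)
{
    if (cstate->pos >= cstate->ninput)
        return false;
    *value = cstate->input[cstate->pos++];
    return true;
}

static void
demo_end(DemoFromState *cstate)
{
    cstate->ended = true;
}

/* Must outlive the copy, hence a static const struct. */
static const DemoFromRoutine demo_routine = {
    .in_func = demo_in_func,
    .start = demo_start,
    .one_row = demo_one_row,
    .end = demo_end,
};

/* Driver: in_func per attribute, start, one_row until false, end. */
static size_t
demo_copy_from(DemoFromState *cstate, int natts, long *sum)
{
    size_t      nrows = 0;
    int         value;

    for (int attnum = 1; attnum <= natts; attnum++)
        cstate->routine->in_func(cstate, attnum);
    cstate->routine->start(cstate);
    while (cstate->routine->one_row(cstate, &value))
    {
        *sum += value;
        nrows++;
    }
    cstate->routine->end(cstate);
    return nrows;
}
```

The driver never checks which format it is running; it only goes through
the routine pointer, which is the property the later patches in this
series rely on.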
 src/backend/commands/copyfrom.c          | 24 ++++++++++--
 src/backend/commands/copyfromparse.c     |  5 +++
 src/include/commands/copyapi.h           | 47 +++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h |  4 ++
 src/tools/pgindent/typedefs.list         |  1 +
 5 files changed, 76 insertions(+), 5 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2d3462913e1..c42485bd9cb 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1632,12 +1632,22 @@ BeginCopyFrom(ParseState *pstate,
 
         /* Fetch the input function and typioparam info */
         if (cstate->opts.binary)
+        {
             getTypeBinaryInputInfo(att->atttypid,
                                    &in_func_oid, &typioparams[attnum - 1]);
+            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+        }
+        else if (cstate->routine)
+            cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+                                            &in_functions[attnum - 1],
+                                            &typioparams[attnum - 1]);
+
         else
+        {
             getTypeInputInfo(att->atttypid,
                              &in_func_oid, &typioparams[attnum - 1]);
-        fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+        }
 
         /* Get default info if available */
         defexprs[attnum - 1] = NULL;
@@ -1777,10 +1787,13 @@ BeginCopyFrom(ParseState *pstate,
         /* Read and verify binary header */
         ReceiveCopyBinaryHeader(cstate);
     }
-
-    /* create workspace for CopyReadAttributes results */
-    if (!cstate->opts.binary)
+    else if (cstate->routine)
     {
+        cstate->routine->CopyFromStart(cstate, tupDesc);
+    }
+    else
+    {
+        /* create workspace for CopyReadAttributes results */
         AttrNumber    attr_count = list_length(cstate->attnumlist);
 
         cstate->max_fields = attr_count;
@@ -1798,6 +1811,9 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+    if (cstate->routine)
+        cstate->routine->CopyFromEnd(cstate);
+
     /* No COPY FROM related resources except memory. */
     if (cstate->is_program)
     {
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 97a4c387a30..2e126448019 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1012,6 +1012,11 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 
         Assert(fieldno == attr_count);
     }
+    else if (cstate->routine)
+    {
+        if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+            return false;
+    }
     else
     {
         /* binary */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 30765951e2e..7421241de83 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * copyapi.h
- *      API for COPY TO handlers
+ *      API for COPY TO/FROM handlers
  *
  *
  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
@@ -87,6 +87,51 @@ typedef struct CopyFormatOptions
     Node       *routine;        /* CopyToRoutine (can be NULL) */
 } CopyFormatOptions;
 
+/* This is private in commands/copyfrom.c */
+typedef struct CopyFromStateData *CopyFromState;
+
+/*
+ * API structure for a COPY FROM format implementation.  Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyFromRoutine
+{
+    /*
+     * Called when COPY FROM is started to set up the input functions
+     * associated with the relation's attributes being written to.  `finfo`
+     * can be optionally filled to provide the catalog information of the
+     * input function.  `typioparam` can be optionally filled to define the
+     * OID of the type to pass to the input function.  `atttypid` is the OID
+     * of the data type used by the relation's attribute.
+     */
+    void        (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
+                                   FmgrInfo *finfo, Oid *typioparam);
+
+    /*
+     * Called when COPY FROM is started.
+     *
+     * `tupDesc` is the tuple descriptor of the relation where the data needs
+     * to be copied.  This can be used for any initialization steps required
+     * by a format.
+     */
+    void        (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
+
+    /*
+     * Copy one row to a set of `values` and `nulls` of size tupDesc->natts.
+     *
+     * 'econtext' is used to evaluate default expression for each column that
+     * is either not read from the file or is using the DEFAULT option of COPY
+     * FROM.  It is NULL if no default values are used.
+     *
+     * Returns false if there are no more tuples to copy.
+     */
+    bool        (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
+                                   Datum *values, bool *nulls);
+
+    /* Called when COPY FROM has ended. */
+    void        (*CopyFromEnd) (CopyFromState cstate);
+} CopyFromRoutine;
+
 typedef struct CopyToStateData *CopyToState;
 
 /*
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc783..509b9e92a18 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -15,6 +15,7 @@
 #define COPYFROM_INTERNAL_H
 
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
@@ -58,6 +59,9 @@ typedef enum CopyInsertMethod
  */
 typedef struct CopyFromStateData
 {
+    /* format routine */
+    const CopyFromRoutine *routine;
+
     /* low-level state data */
     CopySource    copy_src;        /* type of copy source */
     FILE       *copy_file;        /* used if copy_src == COPY_FILE */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8eb537bfa77..4c4bf60d9e5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -492,6 +492,7 @@ ConvertRowtypeExpr
 CookedConstraint
 CopyDest
 CopyFormatOptions
+CopyFromRoutine
 CopyFromState
 CopyFromStateData
 CopyHeaderChoice
-- 
2.45.2
From 5790fb08795adcc43c72186e662b125e32b5986f Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sun, 29 Sep 2024 00:09:29 +0900
Subject: [PATCH v21 07/10] Use CopyFromRoutine for the existing formats
The existing formats are text, csv and binary. If we find any
performance regression caused by this, we will not merge this to master.
This will increase indirect function call costs but will reduce the
runtime cost of the "if (cstate->opts.binary)" and
"if (cstate->opts.csv_mode)" branches.
This uses an optimization based on a static inline function called
with a constant argument in place of cstate->opts.csv_mode. For
example, CopyFromTextLikeOneRow() uses this optimization. It accepts a
"bool is_csv" argument instead of reading cstate->opts.csv_mode
internally. CopyFromTextOneRow() calls CopyFromTextLikeOneRow() with
false (a constant) for "bool is_csv", so the compiler can remove the
"if (is_csv)" branch in it.
This doesn't change existing logic. This just moves existing code.
---
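A note for reviewers on the static inline plus constant argument
pattern described above. This is a standalone sketch, not code from this
patch: demo_count_fields() and its two wrappers are hypothetical and
only count fields in a line. The point is the shape: one shared inline
body takes "bool is_csv", and each per-format wrapper passes a literal,
so a compiler that inlines the body can usually drop the dead branch.
Whether it does so depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Shared body for both formats.  is_csv is expected to be a compile-time
 * constant at every call site, so "if (is_csv)" can be folded away once
 * the function is inlined.
 */
static inline size_t
demo_count_fields(const char *line, bool is_csv)
{
    size_t      nfields = 1;
    bool        in_quote = false;

    for (const char *p = line; *p != '\0'; p++)
    {
        if (is_csv)
        {
            /* CSV: commas separate fields unless inside double quotes */
            if (*p == '"')
                in_quote = !in_quote;
            else if (*p == ',' && !in_quote)
                nfields++;
        }
        else if (*p == '\t')
        {
            /* text: tabs separate fields */
            nfields++;
        }
    }
    return nfields;
}

/* Per-format entry point for text: passes the constant false. */
static size_t
demo_count_fields_text(const char *line)
{
    return demo_count_fields(line, false);
}

/* Per-format entry point for CSV: passes the constant true. */
static size_t
demo_count_fields_csv(const char *line)
{
    return demo_count_fields(line, true);
}
```

The two wrappers are what would be stored in a per-format callback
table, so the indirect call picks the format once per row and the body
itself no longer tests the format flag at run time.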
 src/backend/commands/copyfrom.c          | 215 ++++++---
 src/backend/commands/copyfromparse.c     | 550 +++++++++++++----------
 src/include/commands/copy.h              |   2 -
 src/include/commands/copyfrom_internal.h |   8 +
 4 files changed, 487 insertions(+), 288 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index c42485bd9cb..14f95f17124 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -106,6 +106,157 @@ typedef struct CopyMultiInsertInfo
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
 
+
+/*
+ * CopyFromRoutine implementations for text and CSV.
+ */
+
+/*
+ * CopyFromTextLikeInFunc
+ *
+ * Assign input function data for a relation's attribute in text/CSV format.
+ */
+static void
+CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid,
+                       FmgrInfo *finfo, Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyFromTextLikeStart
+ *
+ * Start of COPY FROM for text/CSV format.
+ */
+static void
+CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    AttrNumber    attr_count;
+
+    /*
+     * If encoding conversion is needed, we need another buffer to hold the
+     * converted input data.  Otherwise, we can just point input_buf to the
+     * same buffer as raw_buf.
+     */
+    if (cstate->need_transcoding)
+    {
+        cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
+        cstate->input_buf_index = cstate->input_buf_len = 0;
+    }
+    else
+        cstate->input_buf = cstate->raw_buf;
+    cstate->input_reached_eof = false;
+
+    initStringInfo(&cstate->line_buf);
+
+    /*
+     * Create workspace for CopyReadAttributes results; used by CSV and text
+     * format.
+     */
+    attr_count = list_length(cstate->attnumlist);
+    cstate->max_fields = attr_count;
+    cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+}
+
+/*
+ * CopyFromTextLikeEnd
+ *
+ * End of COPY FROM for text/CSV format.
+ */
+static void
+CopyFromTextLikeEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
+/*
+ * CopyFromRoutine implementation for "binary".
+ */
+
+/*
+ * CopyFromBinaryInFunc
+ *
+ * Assign input function data for a relation's attribute in binary format.
+ */
+static void
+CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+                     FmgrInfo *finfo, Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeBinaryInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyFromBinaryStart
+ *
+ * Start of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    /* Read and verify binary header */
+    ReceiveCopyBinaryHeader(cstate);
+}
+
+/*
+ * CopyFromBinaryEnd
+ *
+ * End of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
+/*
+ * Routines assigned to each format.
+ *
+ * CSV and text share the same implementation, with the exception of the
+ * per-row callback.
+ */
+static const CopyFromRoutine CopyFromRoutineText = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromTextOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+static const CopyFromRoutine CopyFromRoutineCSV = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromCSVOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+static const CopyFromRoutine CopyFromRoutineBinary = {
+    .CopyFromInFunc = CopyFromBinaryInFunc,
+    .CopyFromStart = CopyFromBinaryStart,
+    .CopyFromOneRow = CopyFromBinaryOneRow,
+    .CopyFromEnd = CopyFromBinaryEnd,
+};
+
+/*
+ * Define the COPY FROM routines to use for a format.
+ */
+static const CopyFromRoutine *
+CopyFromGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyFromRoutineCSV;
+    else if (opts.binary)
+        return &CopyFromRoutineBinary;
+
+    /* default is text */
+    return &CopyFromRoutineText;
+}
+
+
 /*
  * error context callback for COPY FROM
  *
@@ -1393,7 +1544,6 @@ BeginCopyFrom(ParseState *pstate,
                 num_defaults;
     FmgrInfo   *in_functions;
     Oid           *typioparams;
-    Oid            in_func_oid;
     int           *defmap;
     ExprState **defexprs;
     MemoryContext oldcontext;
@@ -1425,6 +1575,9 @@ BeginCopyFrom(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
+    /* Set format routine */
+    cstate->routine = CopyFromGetRoutine(cstate->opts);
+
     /* Process the target relation */
     cstate->rel = rel;
 
@@ -1580,25 +1733,6 @@ BeginCopyFrom(ParseState *pstate,
     cstate->raw_buf_index = cstate->raw_buf_len = 0;
     cstate->raw_reached_eof = false;
 
-    if (!cstate->opts.binary)
-    {
-        /*
-         * If encoding conversion is needed, we need another buffer to hold
-         * the converted input data.  Otherwise, we can just point input_buf
-         * to the same buffer as raw_buf.
-         */
-        if (cstate->need_transcoding)
-        {
-            cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
-            cstate->input_buf_index = cstate->input_buf_len = 0;
-        }
-        else
-            cstate->input_buf = cstate->raw_buf;
-        cstate->input_reached_eof = false;
-
-        initStringInfo(&cstate->line_buf);
-    }
-
     initStringInfo(&cstate->attribute_buf);
 
     /* Assign range table and rteperminfos, we'll need them in CopyFrom. */
@@ -1631,23 +1765,9 @@ BeginCopyFrom(ParseState *pstate,
             continue;
 
         /* Fetch the input function and typioparam info */
-        if (cstate->opts.binary)
-        {
-            getTypeBinaryInputInfo(att->atttypid,
-                                   &in_func_oid, &typioparams[attnum - 1]);
-            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
-        }
-        else if (cstate->routine)
-            cstate->routine->CopyFromInFunc(cstate, att->atttypid,
-                                            &in_functions[attnum - 1],
-                                            &typioparams[attnum - 1]);
-
-        else
-        {
-            getTypeInputInfo(att->atttypid,
-                             &in_func_oid, &typioparams[attnum - 1]);
-            fmgr_info(in_func_oid, &in_functions[attnum - 1]);
-        }
+        cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+                                        &in_functions[attnum - 1],
+                                        &typioparams[attnum - 1]);
 
         /* Get default info if available */
         defexprs[attnum - 1] = NULL;
@@ -1782,23 +1902,7 @@ BeginCopyFrom(ParseState *pstate,
 
     pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-    if (cstate->opts.binary)
-    {
-        /* Read and verify binary header */
-        ReceiveCopyBinaryHeader(cstate);
-    }
-    else if (cstate->routine)
-    {
-        cstate->routine->CopyFromStart(cstate, tupDesc);
-    }
-    else
-    {
-        /* create workspace for CopyReadAttributes results */
-        AttrNumber    attr_count = list_length(cstate->attnumlist);
-
-        cstate->max_fields = attr_count;
-        cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-    }
+    cstate->routine->CopyFromStart(cstate, tupDesc);
 
     MemoryContextSwitchTo(oldcontext);
 
@@ -1811,8 +1915,7 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
-    if (cstate->routine)
-        cstate->routine->CopyFromEnd(cstate);
+    cstate->routine->CopyFromEnd(cstate);
 
     /* No COPY FROM related resources except memory. */
     if (cstate->is_program)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 2e126448019..5f63b683d17 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -149,8 +149,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLine(CopyFromState cstate, bool is_csv);
+static inline bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static int    CopyReadAttributesText(CopyFromState cstate);
 static int    CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -750,8 +750,8 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  *
  * NOTE: force_not_null option are not applied to the returned fields.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static inline bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
 {
     int            fldct;
     bool        done;
@@ -768,13 +768,17 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
         tupDesc = RelationGetDescr(cstate->rel);
 
         cstate->cur_lineno++;
-        done = CopyReadLine(cstate);
+        done = CopyReadLine(cstate, is_csv);
 
         if (cstate->opts.header_line == COPY_HEADER_MATCH)
         {
             int            fldnum;
 
-            if (cstate->opts.csv_mode)
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
                 fldct = CopyReadAttributesCSV(cstate);
             else
                 fldct = CopyReadAttributesText(cstate);
@@ -818,7 +822,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     cstate->cur_lineno++;
 
     /* Actually read the line into memory here */
-    done = CopyReadLine(cstate);
+    done = CopyReadLine(cstate, is_csv);
 
     /*
      * EOF at start of line means we're done.  If we see EOF after some
@@ -828,8 +832,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     if (done && cstate->line_buf.len == 0)
         return false;
 
-    /* Parse the line into de-escaped field values */
-    if (cstate->opts.csv_mode)
+    /*
+     * Parse the line into de-escaped field values
+     *
+     * is_csv will be optimized away by compiler, as argument is constant at
+     * caller.
+     */
+    if (is_csv)
         fldct = CopyReadAttributesCSV(cstate);
     else
         fldct = CopyReadAttributesText(cstate);
@@ -839,6 +848,267 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     return true;
 }
 
+/*
+ * CopyFromTextLikeOneRow
+ *
+ * Copy one row to a set of `values` and `nulls` for the text and CSV
+ * formats.
+ *
+ * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
+ */
+static inline bool
+CopyFromTextLikeOneRow(CopyFromState cstate,
+                       ExprContext *econtext,
+                       Datum *values,
+                       bool *nulls,
+                       bool is_csv)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    ExprState **defexprs = cstate->defexprs;
+    char      **field_strings;
+    ListCell   *cur;
+    int            fldct;
+    int            fieldno;
+    char       *string;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    /* read raw fields in the next line */
+    if (!NextCopyFromRawFields(cstate, &field_strings, &fldct, is_csv))
+        return false;
+
+    /* check for overflowing fields */
+    if (attr_count > 0 && fldct > attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("extra data after last expected column")));
+
+    fieldno = 0;
+
+    /* Loop to read the user attributes on the line. */
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        if (fieldno >= fldct)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("missing data for column \"%s\"",
+                            NameStr(att->attname))));
+        string = field_strings[fieldno++];
+
+        if (cstate->convert_select_flags &&
+            !cstate->convert_select_flags[m])
+        {
+            /* ignore input field, leaving column as NULL */
+            continue;
+        }
+
+        if (is_csv)
+        {
+            if (string == NULL &&
+                cstate->opts.force_notnull_flags[m])
+            {
+                /*
+                 * FORCE_NOT_NULL option is set and column is NULL - convert
+                 * it to the NULL string.
+                 */
+                string = cstate->opts.null_print;
+            }
+            else if (string != NULL && cstate->opts.force_null_flags[m]
+                     && strcmp(string, cstate->opts.null_print) == 0)
+            {
+                /*
+                 * FORCE_NULL option is set and column matches the NULL
+                 * string. It must have been quoted, or otherwise the string
+                 * would already have been set to NULL. Convert it to NULL as
+                 * specified.
+                 */
+                string = NULL;
+            }
+        }
+
+        cstate->cur_attname = NameStr(att->attname);
+        cstate->cur_attval = string;
+
+        if (string != NULL)
+            nulls[m] = false;
+
+        if (cstate->defaults[m])
+        {
+            /*
+             * The caller must supply econtext and have switched into the
+             * per-tuple memory context in it.
+             */
+            Assert(econtext != NULL);
+            Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
+
+            values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
+        }
+
+        /*
+         * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+         */
+        else if (!InputFunctionCallSafe(&in_functions[m],
+                                        string,
+                                        typioparams[m],
+                                        att->atttypmod,
+                                        (Node *) cstate->escontext,
+                                        &values[m]))
+        {
+            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+            cstate->num_errors++;
+
+            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+            {
+                /*
+                 * Since we emit line number and column info in the below
+                 * notice message, we suppress error context information other
+                 * than the relation name.
+                 */
+                Assert(!cstate->relname_only);
+                cstate->relname_only = true;
+
+                if (cstate->cur_attval)
+                {
+                    char       *attval;
+
+                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname,
+                                   attval));
+                    pfree(attval);
+                }
+                else
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname));
+
+                /* reset relname_only */
+                cstate->relname_only = false;
+            }
+
+            return true;
+        }
+
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+    }
+
+    Assert(fieldno == attr_count);
+
+    return true;
+}
+
+
+/*
+ * CopyFromTextOneRow
+ *
+ * Per-row callback for COPY FROM with text format.
+ */
+bool
+CopyFromTextOneRow(CopyFromState cstate,
+                   ExprContext *econtext,
+                   Datum *values,
+                   bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false);
+}
+
+/*
+ * CopyFromCSVOneRow
+ *
+ * Per-row callback for COPY FROM with CSV format.
+ */
+bool
+CopyFromCSVOneRow(CopyFromState cstate,
+                  ExprContext *econtext,
+                  Datum *values,
+                  bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
+}
+
+/*
+ * CopyFromBinaryOneRow
+ *
+ * Copy one row to a set of `values` and `nulls` for the binary format.
+ */
+bool
+CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+                     Datum *values, bool *nulls)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    int16        fld_count;
+    ListCell   *cur;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    cstate->cur_lineno++;
+
+    if (!CopyGetInt16(cstate, &fld_count))
+    {
+        /* EOF detected (end of file, or protocol-level EOF) */
+        return false;
+    }
+
+    if (fld_count == -1)
+    {
+        /*
+         * Received EOF marker.  Wait for the protocol-level EOF, and complain
+         * if it doesn't come immediately.  In COPY FROM STDIN, this ensures
+         * that we correctly handle CopyFail, if client chooses to send that
+         * now.  When copying from file, we could ignore the rest of the file
+         * like in text mode, but we choose to be consistent with the COPY
+         * FROM STDIN case.
+         */
+        char        dummy;
+
+        if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("received copy data after EOF marker")));
+        return false;
+    }
+
+    if (fld_count != attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("row field count is %d, expected %d",
+                        (int) fld_count, attr_count)));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        cstate->cur_attname = NameStr(att->attname);
+        values[m] = CopyReadBinaryAttribute(cstate,
+                                            &in_functions[m],
+                                            typioparams[m],
+                                            att->atttypmod,
+                                            &nulls[m]);
+        cstate->cur_attname = NULL;
+    }
+
+    return true;
+}
+
 /*
  * Read next tuple from file for COPY FROM. Return false if no more tuples.
  *
@@ -856,221 +1126,21 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 {
     TupleDesc    tupDesc;
     AttrNumber    num_phys_attrs,
-                attr_count,
                 num_defaults = cstate->num_defaults;
-    FmgrInfo   *in_functions = cstate->in_functions;
-    Oid           *typioparams = cstate->typioparams;
     int            i;
     int           *defmap = cstate->defmap;
     ExprState **defexprs = cstate->defexprs;
 
     tupDesc = RelationGetDescr(cstate->rel);
     num_phys_attrs = tupDesc->natts;
-    attr_count = list_length(cstate->attnumlist);
 
     /* Initialize all values for row to NULL */
     MemSet(values, 0, num_phys_attrs * sizeof(Datum));
     MemSet(nulls, true, num_phys_attrs * sizeof(bool));
     MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-    if (!cstate->opts.binary)
-    {
-        char      **field_strings;
-        ListCell   *cur;
-        int            fldct;
-        int            fieldno;
-        char       *string;
-
-        /* read raw fields in the next line */
-        if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
-            return false;
-
-        /* check for overflowing fields */
-        if (attr_count > 0 && fldct > attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("extra data after last expected column")));
-
-        fieldno = 0;
-
-        /* Loop to read the user attributes on the line. */
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            if (fieldno >= fldct)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("missing data for column \"%s\"",
-                                NameStr(att->attname))));
-            string = field_strings[fieldno++];
-
-            if (cstate->convert_select_flags &&
-                !cstate->convert_select_flags[m])
-            {
-                /* ignore input field, leaving column as NULL */
-                continue;
-            }
-
-            if (cstate->opts.csv_mode)
-            {
-                if (string == NULL &&
-                    cstate->opts.force_notnull_flags[m])
-                {
-                    /*
-                     * FORCE_NOT_NULL option is set and column is NULL -
-                     * convert it to the NULL string.
-                     */
-                    string = cstate->opts.null_print;
-                }
-                else if (string != NULL && cstate->opts.force_null_flags[m]
-                         && strcmp(string, cstate->opts.null_print) == 0)
-                {
-                    /*
-                     * FORCE_NULL option is set and column matches the NULL
-                     * string. It must have been quoted, or otherwise the
-                     * string would already have been set to NULL. Convert it
-                     * to NULL as specified.
-                     */
-                    string = NULL;
-                }
-            }
-
-            cstate->cur_attname = NameStr(att->attname);
-            cstate->cur_attval = string;
-
-            if (string != NULL)
-                nulls[m] = false;
-
-            if (cstate->defaults[m])
-            {
-                /*
-                 * The caller must supply econtext and have switched into the
-                 * per-tuple memory context in it.
-                 */
-                Assert(econtext != NULL);
-                Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
-
-                values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
-            }
-
-            /*
-             * If ON_ERROR is specified with IGNORE, skip rows with soft
-             * errors
-             */
-            else if (!InputFunctionCallSafe(&in_functions[m],
-                                            string,
-                                            typioparams[m],
-                                            att->atttypmod,
-                                            (Node *) cstate->escontext,
-                                            &values[m]))
-            {
-                Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-                cstate->num_errors++;
-
-                if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-                {
-                    /*
-                     * Since we emit line number and column info in the below
-                     * notice message, we suppress error context information
-                     * other than the relation name.
-                     */
-                    Assert(!cstate->relname_only);
-                    cstate->relname_only = true;
-
-                    if (cstate->cur_attval)
-                    {
-                        char       *attval;
-
-                        attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname,
-                                       attval));
-                        pfree(attval);
-                    }
-                    else
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname));
-
-                    /* reset relname_only */
-                    cstate->relname_only = false;
-                }
-
-                return true;
-            }
-
-            cstate->cur_attname = NULL;
-            cstate->cur_attval = NULL;
-        }
-
-        Assert(fieldno == attr_count);
-    }
-    else if (cstate->routine)
-    {
-        if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
-            return false;
-    }
-    else
-    {
-        /* binary */
-        int16        fld_count;
-        ListCell   *cur;
-
-        cstate->cur_lineno++;
-
-        if (!CopyGetInt16(cstate, &fld_count))
-        {
-            /* EOF detected (end of file, or protocol-level EOF) */
-            return false;
-        }
-
-        if (fld_count == -1)
-        {
-            /*
-             * Received EOF marker.  Wait for the protocol-level EOF, and
-             * complain if it doesn't come immediately.  In COPY FROM STDIN,
-             * this ensures that we correctly handle CopyFail, if client
-             * chooses to send that now.  When copying from file, we could
-             * ignore the rest of the file like in text mode, but we choose to
-             * be consistent with the COPY FROM STDIN case.
-             */
-            char        dummy;
-
-            if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("received copy data after EOF marker")));
-            return false;
-        }
-
-        if (fld_count != attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("row field count is %d, expected %d",
-                            (int) fld_count, attr_count)));
-
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            cstate->cur_attname = NameStr(att->attname);
-            values[m] = CopyReadBinaryAttribute(cstate,
-                                                &in_functions[m],
-                                                typioparams[m],
-                                                att->atttypmod,
-                                                &nulls[m]);
-            cstate->cur_attname = NULL;
-        }
-    }
+    if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+        return false;
 
     /*
      * Now compute and insert any defaults available for the columns not
@@ -1101,7 +1171,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * in the final value of line_buf.
  */
 static bool
-CopyReadLine(CopyFromState cstate)
+CopyReadLine(CopyFromState cstate, bool is_csv)
 {
     bool        result;
 
@@ -1109,7 +1179,7 @@ CopyReadLine(CopyFromState cstate)
     cstate->line_buf_valid = false;
 
     /* Parse data and transfer into line_buf */
-    result = CopyReadLineText(cstate);
+    result = CopyReadLineText(cstate, is_csv);
 
     if (result)
     {
@@ -1176,8 +1246,8 @@ CopyReadLine(CopyFromState cstate)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static bool
-CopyReadLineText(CopyFromState cstate)
+static inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
     char       *copy_input_buf;
     int            input_buf_ptr;
@@ -1193,7 +1263,11 @@ CopyReadLineText(CopyFromState cstate)
     char        quotec = '\0';
     char        escapec = '\0';
 
-    if (cstate->opts.csv_mode)
+    /*
+     * is_csv will be optimized away by compiler, as argument is constant at
+     * caller.
+     */
+    if (is_csv)
     {
         quotec = cstate->opts.quote[0];
         escapec = cstate->opts.escape[0];
@@ -1270,7 +1344,11 @@ CopyReadLineText(CopyFromState cstate)
         prev_raw_ptr = input_buf_ptr;
         c = copy_input_buf[input_buf_ptr++];
 
-        if (cstate->opts.csv_mode)
+        /*
+         * is_csv will be optimized away by compiler, as argument is constant
+         * at caller.
+         */
+        if (is_csv)
         {
             /*
              * If character is '\\' or '\r', we may need to look ahead below.
@@ -1309,7 +1387,7 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \r */
-        if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\r' && (!is_csv || !in_quote))
         {
             /* Check for \r\n on first line, _and_ handle \r\n. */
             if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1337,10 +1415,10 @@ CopyReadLineText(CopyFromState cstate)
                     if (cstate->eol_type == EOL_CRNL)
                         ereport(ERROR,
                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errmsg("literal carriage return found in data") :
                                  errmsg("unquoted carriage return found in data"),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errhint("Use \"\\r\" to represent carriage return.") :
                                  errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1354,10 +1432,10 @@ CopyReadLineText(CopyFromState cstate)
             else if (cstate->eol_type == EOL_NL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal carriage return found in data") :
                          errmsg("unquoted carriage return found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\r\" to represent carriage return.") :
                          errhint("Use quoted CSV field to represent carriage return.")));
             /* If reach here, we have found the line terminator */
@@ -1365,15 +1443,15 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \n */
-        if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\n' && (!is_csv || !in_quote))
         {
             if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal newline found in data") :
                          errmsg("unquoted newline found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\n\" to represent newline.") :
                          errhint("Use quoted CSV field to represent newline.")));
             cstate->eol_type = EOL_NL;    /* in case not set yet */
@@ -1385,7 +1463,7 @@ CopyReadLineText(CopyFromState cstate)
          * In CSV mode, we only recognize \. alone on a line.  This is because
          * \. is a valid CSV data value.
          */
-        if (c == '\\' && (!cstate->opts.csv_mode || first_char_in_line))
+        if (c == '\\' && (!is_csv || first_char_in_line))
         {
             char        c2;
 
@@ -1418,7 +1496,11 @@ CopyReadLineText(CopyFromState cstate)
 
                     if (c2 == '\n')
                     {
-                        if (!cstate->opts.csv_mode)
+                        /*
+                         * is_csv will be optimized away by compiler, as
+                         * argument is constant at caller.
+                         */
+                        if (!is_csv)
                             ereport(ERROR,
                                     (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                      errmsg("end-of-copy marker does not match previous newline style")));
@@ -1427,7 +1509,11 @@ CopyReadLineText(CopyFromState cstate)
                     }
                     else if (c2 != '\r')
                     {
-                        if (!cstate->opts.csv_mode)
+                        /*
+                         * is_csv will be optimized away by compiler, as
+                         * argument is constant at caller.
+                         */
+                        if (!is_csv)
                             ereport(ERROR,
                                     (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                      errmsg("end-of-copy marker corrupt")));
@@ -1443,7 +1529,11 @@ CopyReadLineText(CopyFromState cstate)
 
                 if (c2 != '\r' && c2 != '\n')
                 {
-                    if (!cstate->opts.csv_mode)
+                    /*
+                     * is_csv will be optimized away by compiler, as argument
+                     * is constant at caller.
+                     */
+                    if (!is_csv)
                         ereport(ERROR,
                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
                                  errmsg("end-of-copy marker corrupt")));
@@ -1472,7 +1562,7 @@ CopyReadLineText(CopyFromState cstate)
                 result = true;    /* report EOF */
                 break;
             }
-            else if (!cstate->opts.csv_mode)
+            else if (!is_csv)
             {
                 /*
                  * If we are here, it means we found a backslash followed by
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index dd645eaa030..e5696839637 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -35,8 +35,6 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
 extern void EndCopyFrom(CopyFromState cstate);
 extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
                          Datum *values, bool *nulls);
-extern bool NextCopyFromRawFields(CopyFromState cstate,
-                                  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 extern char *CopyLimitPrintoutLength(const char *str);
 
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 509b9e92a18..c11b5ff3cc0 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -187,4 +187,12 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
+/* Callbacks for CopyFromRoutine->CopyFromOneRow */
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+                               Datum *values, bool *nulls);
+extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
+                              Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+                                 Datum *values, bool *nulls);
+
 #endif                            /* COPYFROM_INTERNAL_H */
-- 
2.45.2
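
The "is_csv will be optimized away by compiler" comments in the patch above rely on a common C idiom: a static inline workhorse takes a boolean flag, and each thin wrapper passes a compile-time constant, so the compiler can drop the branches that do not apply once it inlines the call. The following is only a minimal sketch of that idiom, not code from the patch; count_fields_impl(), count_fields_text() and count_fields_csv() are hypothetical names, and the field-counting logic is a toy stand-in for the real parsing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical workhorse (not from the patch): count the fields in one line.
 * Because every caller passes a constant for is_csv, a compiler that inlines
 * this function can remove the branches that do not apply to that caller.
 */
static inline size_t
count_fields_impl(const char *line, bool is_csv)
{
    size_t      nfields = 1;
    bool        in_quote = false;
    char        delim = is_csv ? ',' : '\t';

    assert(line != NULL);

    for (const char *p = line; *p != '\0'; p++)
    {
        if (is_csv && *p == '"')
            in_quote = !in_quote;   /* toggle on quote characters, CSV only */
        else if (*p == delim && !in_quote)
            nfields++;              /* delimiter outside quotes ends a field */
    }

    return nfields;
}

/* Thin wrapper for the text-like case: is_csv is the constant false. */
size_t
count_fields_text(const char *line)
{
    return count_fields_impl(line, false);
}

/* Thin wrapper for the CSV-like case: is_csv is the constant true. */
size_t
count_fields_csv(const char *line)
{
    return count_fields_impl(line, true);
}
```

Whether the branches are actually removed depends on the compiler and optimization level, so a change like this should still be checked with a benchmark.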
From be33c9cf092ae2593391260d1ce99848cee1eef4 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sun, 29 Sep 2024 00:17:53 +0900
Subject: [PATCH v21 08/10] Add support for adding custom COPY FROM format
This uses the same handler for COPY TO and COPY FROM but a different
routine for each: CopyToRoutine for COPY TO and CopyFromRoutine for
COPY FROM. PostgreSQL calls a COPY TO/FROM handler with an "is_from"
argument. It's true for COPY FROM and false for COPY TO:
    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine
This also adds a test module for a custom COPY FROM handler.
---
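
The handler contract described in this commit message can be sketched as below. This is a standalone illustration under stated assumptions, not PostgreSQL code: the Mock* types, mock_copy_handler() and mock_routine_matches() are hypothetical stand-ins for the Node-tagged CopyToRoutine/CopyFromRoutine structs and for the IsA() checks in ProcessCopyOptionFormat().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for PostgreSQL's node tags and routine structs. */
typedef enum
{
    T_MockCopyToRoutine,
    T_MockCopyFromRoutine
} MockNodeTag;

typedef struct
{
    MockNodeTag type;
} MockNode;

typedef struct
{
    MockNode    node;           /* tag comes first, as in a Node header */
    const char *name;
} MockCopyToRoutine;

typedef struct
{
    MockNode    node;
    const char *name;
} MockCopyFromRoutine;

static const MockCopyToRoutine mock_to = {{T_MockCopyToRoutine}, "to"};
static const MockCopyFromRoutine mock_from = {{T_MockCopyFromRoutine}, "from"};

/* One handler serves both directions and picks the routine by is_from. */
const MockNode *
mock_copy_handler(bool is_from)
{
    return is_from ? &mock_from.node : &mock_to.node;
}

/* Mirrors the direction-specific check: reject NULL or the wrong routine. */
bool
mock_routine_matches(const MockNode *routine, bool is_from)
{
    MockNodeTag expected = is_from ? T_MockCopyFromRoutine : T_MockCopyToRoutine;

    /* Sanity check: only the two known tags exist in this sketch. */
    assert(routine == NULL ||
           routine->type == T_MockCopyToRoutine ||
           routine->type == T_MockCopyFromRoutine);

    return routine != NULL && routine->type == expected;
}
```

In this sketch a handler that returns the wrong routine for the requested direction fails the check, which corresponds to the "did not return a CopyFromRoutine struct" and "did not return a CopyToRoutine struct" errors in the patch.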
 src/backend/commands/copy.c                   | 52 ++++++++++++-------
 src/backend/commands/copyfrom.c               |  4 +-
 src/include/catalog/pg_type.dat               |  2 +-
 src/include/commands/copyapi.h                |  5 +-
 .../expected/test_copy_format.out             | 10 ++--
 .../test_copy_format/sql/test_copy_format.sql |  1 +
 .../test_copy_format/test_copy_format.c       | 39 +++++++++++++-
 7 files changed, 87 insertions(+), 26 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3aea654ab8a..d7d409379d1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -450,8 +450,8 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
  * This function checks whether the option value is a built-in format such as
  * "text" and "csv" or not. If the option value isn't a built-in format, this
  * function finds a COPY format handler that returns a CopyToRoutine (for
- * is_from == false). If no COPY format handler is found, this function
- * reports an error.
+ * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY
+ * format handler is found, this function reports an error.
  */
 static void
 ProcessCopyOptionFormat(ParseState *pstate,
@@ -482,12 +482,9 @@ ProcessCopyOptionFormat(ParseState *pstate,
     }
 
     /* custom format */
-    if (!is_from)
-    {
-        funcargtypes[0] = INTERNALOID;
-        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
-                                    funcargtypes, true);
-    }
+    funcargtypes[0] = INTERNALOID;
+    handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                funcargtypes, true);
     if (!OidIsValid(handlerOid))
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -496,17 +493,34 @@ ProcessCopyOptionFormat(ParseState *pstate,
 
     datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
     routine = (Node *) DatumGetPointer(datum);
-    if (routine == NULL || !IsA(routine, CopyToRoutine))
-        ereport(
-                ERROR,
-                (errcode(
-                         ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("COPY handler function "
-                        "%s(%u) did not return a "
-                        "CopyToRoutine struct",
-                        format, handlerOid),
-                 parser_errposition(
-                                    pstate, defel->location)));
+    if (is_from)
+    {
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%s(%u) did not return a "
+                            "CopyFromRoutine struct",
+                            format, handlerOid),
+                     parser_errposition(
+                                        pstate, defel->location)));
+    }
+    else
+    {
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%s(%u) did not return a "
+                            "CopyToRoutine struct",
+                            format, handlerOid),
+                     parser_errposition(
+                                        pstate, defel->location)));
+    }
 
     opts_out->routine = routine;
 }
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 14f95f17124..7ecd9a1ad2c 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -247,7 +247,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(CopyFormatOptions opts)
 {
-    if (opts.csv_mode)
+    if (opts.routine)
+        return (const CopyFromRoutine *) opts.routine;
+    else if (opts.csv_mode)
         return &CopyFromRoutineCSV;
     else if (opts.binary)
         return &CopyFromRoutineBinary;
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 793dd671935..37ebfa0908f 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -634,7 +634,7 @@
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
 { oid => '8752',
-  descr => 'pseudo-type for the result of a copy to method function',
+  descr => 'pseudo-type for the result of a copy to/from method function',
   typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
   typcategory => 'P', typinput => 'copy_handler_in',
   typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 7421241de83..744d432f387 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -84,7 +84,8 @@ typedef struct CopyFormatOptions
     CopyOnErrorChoice on_error; /* what to do when error happened */
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     List       *convert_select; /* list of column names (can be NIL) */
-    Node       *routine;        /* CopyToRoutine (can be NULL) */
+    Node       *routine;        /* CopyToRoutine or CopyFromRoutine (can be
+                                 * NULL) */
 } CopyFormatOptions;
 
 /* This is private in commands/copyfrom.c */
@@ -96,6 +97,8 @@ typedef struct CopyFromStateData *CopyFromState;
  */
 typedef struct CopyFromRoutine
 {
+    NodeTag        type;
+
     /*
      * Called when COPY FROM is started to set up the input functions
      * associated with the relation's attributes writing to.  `finfo` can be
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index 606c78f6878..4ed7c0b12db 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -2,9 +2,13 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (format 'test_copy_format');
-ERROR:  COPY format "test_copy_format" not recognized
-LINE 1: COPY public.test FROM stdin WITH (format 'test_copy_format')...
-                                          ^
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
 COPY public.test TO stdout WITH (format 'test_copy_format');
 NOTICE:  test_copy_format: is_from=false
 NOTICE:  CopyToOutFunc: atttypid=21
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 9406b3be3d4..e805f7cb011 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -2,4 +2,5 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (format 'test_copy_format');
+\.
 COPY public.test TO stdout WITH (format 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index e064f40473b..f6b105659ab 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -18,6 +18,40 @@
 
 PG_MODULE_MAGIC;
 
+static void
+CopyFromInFunc(CopyFromState cstate, Oid atttypid,
+               FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%d", atttypid)));
+}
+
+static void
+CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts)));
+}
+
+static bool
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+CopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .type = T_CopyFromRoutine,
+    .CopyFromInFunc = CopyFromInFunc,
+    .CopyFromStart = CopyFromStart,
+    .CopyFromOneRow = CopyFromOneRow,
+    .CopyFromEnd = CopyFromEnd,
+};
+
 static void
 CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
 {
@@ -59,5 +93,8 @@ test_copy_format(PG_FUNCTION_ARGS)
     ereport(NOTICE,
             (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
 
-    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
 }
-- 
2.45.2
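For illustration only, here is a minimal standalone sketch of the dispatch that the patch above adds: one handler function serves both directions, and the caller checks the tag of the returned struct against the direction it asked for. All Mock* names are hypothetical stand-ins and are not part of the patch; the real code uses NodeTag, IsA() and ereport(ERROR) where this sketch returns false.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the NodeTag values of the two routine structs. */
typedef enum MockNodeTag
{
    T_MockCopyToRoutine = 1,
    T_MockCopyFromRoutine
} MockNodeTag;

/* Generic view of a routine: only the leading tag is known. */
typedef struct MockNode
{
    MockNodeTag type;
} MockNode;

/* Both routine structs start with the tag, like the NodeTag added to CopyFromRoutine. */
typedef struct MockCopyToRoutine
{
    MockNode    node;
} MockCopyToRoutine;

typedef struct MockCopyFromRoutine
{
    MockNode    node;
} MockCopyFromRoutine;

static const MockCopyToRoutine mock_to_routine = {{T_MockCopyToRoutine}};
static const MockCopyFromRoutine mock_from_routine = {{T_MockCopyFromRoutine}};

/* Plays the role of the format handler: one function, two possible results. */
static const MockNode *
mock_format_handler(bool is_from)
{
    if (is_from)
        return &mock_from_routine.node;
    return &mock_to_routine.node;
}

/*
 * Mirrors the direction-specific check in ProcessCopyOptionFormat().
 * Returns false in the cases where the patch reports an error.
 */
static bool
mock_routine_is_valid(const MockNode *routine, bool is_from)
{
    MockNodeTag expected = is_from ? T_MockCopyFromRoutine : T_MockCopyToRoutine;

    return routine != NULL && routine->type == expected;
}
```

The point of the tag is that a handler returning the wrong struct for the requested direction is detected before the pointer is cast and used.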
From 0c4550533ccd44febf576c144cfd5425002a9a41 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sun, 29 Sep 2024 00:28:02 +0900
Subject: [PATCH v21 09/10] Export CopyFromStateData
It's for custom COPY FROM format handlers implemented as extensions.
This only moves code. Nothing changes except the CopySource enum
values: their COPY_ prefix becomes COPY_SOURCE_, matching the CopyDest
enum values prefix change. For example, COPY_FILE in CopySource is
renamed to COPY_SOURCE_FILE.
Note that this isn't enough to implement custom COPY FROM format
handlers as extensions. We'll do the following in a subsequent commit:
1. Add an opaque space for custom COPY FROM format handlers
2. Export CopyReadBinaryData() to read the next data
---
 src/backend/commands/copyfrom.c          |   4 +-
 src/backend/commands/copyfromparse.c     |  10 +-
 src/include/commands/copy.h              |   5 -
 src/include/commands/copyapi.h           | 168 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h | 165 ----------------------
 5 files changed, 174 insertions(+), 178 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 7ecd9a1ad2c..dd4342f8b9c 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1713,7 +1713,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;    /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;    /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1841,7 +1841,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 5f63b683d17..ec86a17b3b3 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -180,7 +180,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows it can send. */
     pq_flush();
@@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
     switch (cstate->copy_src)
     {
-        case COPY_FILE:
+        case COPY_SOURCE_FILE:
             bytesread = fread(databuf, 1, maxread, cstate->copy_file);
             if (ferror(cstate->copy_file))
                 ereport(ERROR,
@@ -257,7 +257,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
             if (bytesread == 0)
                 cstate->raw_reached_eof = true;
             break;
-        case COPY_FRONTEND:
+        case COPY_SOURCE_FRONTEND:
             while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
             {
                 int            avail;
@@ -340,7 +340,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
                 bytesread += avail;
             }
             break;
-        case COPY_CALLBACK:
+        case COPY_SOURCE_CALLBACK:
             bytesread = cstate->data_source_cb(databuf, minread, maxread);
             break;
     }
@@ -1188,7 +1188,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
          * after \. up to the protocol end of copy data.  (XXX maybe better
          * not to treat \. as special?)
          */
-        if (cstate->copy_src == COPY_FRONTEND)
+        if (cstate->copy_src == COPY_SOURCE_FRONTEND)
         {
             int            inbytes;
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index e5696839637..e2411848e9f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -19,11 +19,6 @@
 #include "parser/parse_node.h"
 #include "tcop/dest.h"
 
-/* This is private in commands/copyfrom.c */
-typedef struct CopyFromStateData *CopyFromState;
-
-typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
-
 extern void DoCopy(ParseState *pstate, const CopyStmt *stmt,
                    int stmt_location, int stmt_len,
                    uint64 *processed);
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 744d432f387..c118558ee71 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -88,7 +88,6 @@ typedef struct CopyFormatOptions
                                  * NULL) */
 } CopyFormatOptions;
 
-/* This is private in commands/copyfrom.c */
 typedef struct CopyFromStateData *CopyFromState;
 
 /*
@@ -135,6 +134,173 @@ typedef struct CopyFromRoutine
     void        (*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
+/*
+ * Represents the different source cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopySource
+{
+    COPY_SOURCE_FILE,            /* from file (or a piped program) */
+    COPY_SOURCE_FRONTEND,        /* from frontend */
+    COPY_SOURCE_CALLBACK,        /* from callback function */
+} CopySource;
+
+/*
+ * Represents the end-of-line terminator type of the input
+ */
+typedef enum EolType
+{
+    EOL_UNKNOWN,
+    EOL_NL,
+    EOL_CR,
+    EOL_CRNL,
+} EolType;
+
+/*
+ * Represents the insert method to be used during COPY FROM.
+ */
+typedef enum CopyInsertMethod
+{
+    CIM_SINGLE,                    /* use table_tuple_insert or ExecForeignInsert */
+    CIM_MULTI,                    /* always use table_multi_insert or
+                                 * ExecForeignBatchInsert */
+    CIM_MULTI_CONDITIONAL,        /* use table_multi_insert or
+                                 * ExecForeignBatchInsert only if valid */
+} CopyInsertMethod;
+
+typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+
+/*
+ * This struct contains all the state variables used throughout a COPY FROM
+ * operation.
+ */
+typedef struct CopyFromStateData
+{
+    /* format routine */
+    const CopyFromRoutine *routine;
+
+    /* low-level state data */
+    CopySource    copy_src;        /* type of copy source */
+    FILE       *copy_file;        /* used if copy_src == COPY_SOURCE_FILE */
+    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_SOURCE_FRONTEND */
+
+    EolType        eol_type;        /* EOL type of input */
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    Oid            conversion_proc;    /* encoding conversion function */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDIN */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_source_cb data_source_cb; /* function for reading data */
+
+    CopyFormatOptions opts;
+    bool       *convert_select_flags;    /* per-column CSV/TEXT CS flags */
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /* these are just for error messages, see CopyFromErrorCallback */
+    const char *cur_relname;    /* table name for error messages */
+    uint64        cur_lineno;        /* line number for error messages */
+    const char *cur_attname;    /* current att for error messages */
+    const char *cur_attval;        /* current att value for error messages */
+    bool        relname_only;    /* don't output line number, att, etc. */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    AttrNumber    num_defaults;    /* count of att that are missing and have
+                                 * default value */
+    FmgrInfo   *in_functions;    /* array of input functions for each attrs */
+    Oid           *typioparams;    /* array of element types for in_functions */
+    ErrorSaveContext *escontext;    /* soft error trapper during in_functions
+                                     * execution */
+    uint64        num_errors;        /* total number of rows which contained soft
+                                 * errors */
+    int           *defmap;            /* array of default att numbers related to
+                                 * missing att */
+    ExprState **defexprs;        /* array of default att expressions for all
+                                 * att */
+    bool       *defaults;        /* if DEFAULT marker was found for
+                                 * corresponding att */
+    bool        volatile_defexprs;    /* is any of defexprs volatile? */
+    List       *range_table;    /* single element list of RangeTblEntry */
+    List       *rteperminfos;    /* single element list of RTEPermissionInfo */
+    ExprState  *qualexpr;
+
+    TransitionCaptureState *transition_capture;
+
+    /*
+     * These variables are used to reduce overhead in COPY FROM.
+     *
+     * attribute_buf holds the separated, de-escaped text for each field of
+     * the current line.  The CopyReadAttributes functions return arrays of
+     * pointers into this buffer.  We avoid palloc/pfree overhead by re-using
+     * the buffer on each cycle.
+     *
+     * In binary COPY FROM, attribute_buf holds the binary data for the
+     * current field, but the usage is otherwise similar.
+     */
+    StringInfoData attribute_buf;
+
+    /* field raw data pointers found by COPY FROM */
+
+    int            max_fields;
+    char      **raw_fields;
+
+    /*
+     * Similarly, line_buf holds the whole input line being processed. The
+     * input cycle is first to read the whole line into line_buf, and then
+     * extract the individual attribute fields into attribute_buf.  line_buf
+     * is preserved unmodified so that we can display it in error messages if
+     * appropriate.  (In binary mode, line_buf is not used.)
+     */
+    StringInfoData line_buf;
+    bool        line_buf_valid; /* contains the row being processed? */
+
+    /*
+     * input_buf holds input data, already converted to database encoding.
+     *
+     * In text mode, CopyReadLine parses this data sufficiently to locate line
+     * boundaries, then transfers the data to line_buf. We guarantee that
+     * there is a \0 at input_buf[input_buf_len] at all times.  (In binary
+     * mode, input_buf is not used.)
+     *
+     * If encoding conversion is not required, input_buf is not a separate
+     * buffer but points directly to raw_buf.  In that case, input_buf_len
+     * tracks the number of bytes that have been verified as valid in the
+     * database encoding, and raw_buf_len is the total number of bytes stored
+     * in the buffer.
+     */
+#define INPUT_BUF_SIZE 65536    /* we palloc INPUT_BUF_SIZE+1 bytes */
+    char       *input_buf;
+    int            input_buf_index;    /* next byte to process */
+    int            input_buf_len;    /* total # of bytes stored */
+    bool        input_reached_eof;    /* true if we reached EOF */
+    bool        input_reached_error;    /* true if a conversion error happened */
+    /* Shorthand for number of unconsumed bytes available in input_buf */
+#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index)
+
+    /*
+     * raw_buf holds raw input data read from the data source (file or client
+     * connection), not yet converted to the database encoding.  Like with
+     * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len].
+     */
+#define RAW_BUF_SIZE 65536        /* we palloc RAW_BUF_SIZE+1 bytes */
+    char       *raw_buf;
+    int            raw_buf_index;    /* next byte to process */
+    int            raw_buf_len;    /* total # of bytes stored */
+    bool        raw_reached_eof;    /* true if we reached EOF */
+
+    /* Shorthand for number of unconsumed bytes available in raw_buf */
+#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
+
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyFromStateData;
+
 typedef struct CopyToStateData *CopyToState;
 
 /*
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c11b5ff3cc0..3863d26d5b7 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -19,171 +19,6 @@
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
-/*
- * Represents the different source cases we need to worry about at
- * the bottom level
- */
-typedef enum CopySource
-{
-    COPY_FILE,                    /* from file (or a piped program) */
-    COPY_FRONTEND,                /* from frontend */
-    COPY_CALLBACK,                /* from callback function */
-} CopySource;
-
-/*
- *    Represents the end-of-line terminator type of the input
- */
-typedef enum EolType
-{
-    EOL_UNKNOWN,
-    EOL_NL,
-    EOL_CR,
-    EOL_CRNL,
-} EolType;
-
-/*
- * Represents the insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-    CIM_SINGLE,                    /* use table_tuple_insert or ExecForeignInsert */
-    CIM_MULTI,                    /* always use table_multi_insert or
-                                 * ExecForeignBatchInsert */
-    CIM_MULTI_CONDITIONAL,        /* use table_multi_insert or
-                                 * ExecForeignBatchInsert only if valid */
-} CopyInsertMethod;
-
-/*
- * This struct contains all the state variables used throughout a COPY FROM
- * operation.
- */
-typedef struct CopyFromStateData
-{
-    /* format routine */
-    const CopyFromRoutine *routine;
-
-    /* low-level state data */
-    CopySource    copy_src;        /* type of copy source */
-    FILE       *copy_file;        /* used if copy_src == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_FRONTEND */
-
-    EolType        eol_type;        /* EOL type of input */
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    Oid            conversion_proc;    /* encoding conversion function */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDIN */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_source_cb data_source_cb; /* function for reading data */
-
-    CopyFormatOptions opts;
-    bool       *convert_select_flags;    /* per-column CSV/TEXT CS flags */
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /* these are just for error messages, see CopyFromErrorCallback */
-    const char *cur_relname;    /* table name for error messages */
-    uint64        cur_lineno;        /* line number for error messages */
-    const char *cur_attname;    /* current att for error messages */
-    const char *cur_attval;        /* current att value for error messages */
-    bool        relname_only;    /* don't output line number, att, etc. */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    AttrNumber    num_defaults;    /* count of att that are missing and have
-                                 * default value */
-    FmgrInfo   *in_functions;    /* array of input functions for each attrs */
-    Oid           *typioparams;    /* array of element types for in_functions */
-    ErrorSaveContext *escontext;    /* soft error trapper during in_functions
-                                     * execution */
-    uint64        num_errors;        /* total number of rows which contained soft
-                                 * errors */
-    int           *defmap;            /* array of default att numbers related to
-                                 * missing att */
-    ExprState **defexprs;        /* array of default att expressions for all
-                                 * att */
-    bool       *defaults;        /* if DEFAULT marker was found for
-                                 * corresponding att */
-    bool        volatile_defexprs;    /* is any of defexprs volatile? */
-    List       *range_table;    /* single element list of RangeTblEntry */
-    List       *rteperminfos;    /* single element list of RTEPermissionInfo */
-    ExprState  *qualexpr;
-
-    TransitionCaptureState *transition_capture;
-
-    /*
-     * These variables are used to reduce overhead in COPY FROM.
-     *
-     * attribute_buf holds the separated, de-escaped text for each field of
-     * the current line.  The CopyReadAttributes functions return arrays of
-     * pointers into this buffer.  We avoid palloc/pfree overhead by re-using
-     * the buffer on each cycle.
-     *
-     * In binary COPY FROM, attribute_buf holds the binary data for the
-     * current field, but the usage is otherwise similar.
-     */
-    StringInfoData attribute_buf;
-
-    /* field raw data pointers found by COPY FROM */
-
-    int            max_fields;
-    char      **raw_fields;
-
-    /*
-     * Similarly, line_buf holds the whole input line being processed. The
-     * input cycle is first to read the whole line into line_buf, and then
-     * extract the individual attribute fields into attribute_buf.  line_buf
-     * is preserved unmodified so that we can display it in error messages if
-     * appropriate.  (In binary mode, line_buf is not used.)
-     */
-    StringInfoData line_buf;
-    bool        line_buf_valid; /* contains the row being processed? */
-
-    /*
-     * input_buf holds input data, already converted to database encoding.
-     *
-     * In text mode, CopyReadLine parses this data sufficiently to locate line
-     * boundaries, then transfers the data to line_buf. We guarantee that
-     * there is a \0 at input_buf[input_buf_len] at all times.  (In binary
-     * mode, input_buf is not used.)
-     *
-     * If encoding conversion is not required, input_buf is not a separate
-     * buffer but points directly to raw_buf.  In that case, input_buf_len
-     * tracks the number of bytes that have been verified as valid in the
-     * database encoding, and raw_buf_len is the total number of bytes stored
-     * in the buffer.
-     */
-#define INPUT_BUF_SIZE 65536    /* we palloc INPUT_BUF_SIZE+1 bytes */
-    char       *input_buf;
-    int            input_buf_index;    /* next byte to process */
-    int            input_buf_len;    /* total # of bytes stored */
-    bool        input_reached_eof;    /* true if we reached EOF */
-    bool        input_reached_error;    /* true if a conversion error happened */
-    /* Shorthand for number of unconsumed bytes available in input_buf */
-#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index)
-
-    /*
-     * raw_buf holds raw input data read from the data source (file or client
-     * connection), not yet converted to the database encoding.  Like with
-     * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len].
-     */
-#define RAW_BUF_SIZE 65536        /* we palloc RAW_BUF_SIZE+1 bytes */
-    char       *raw_buf;
-    int            raw_buf_index;    /* next byte to process */
-    int            raw_buf_len;    /* total # of bytes stored */
-    bool        raw_reached_eof;    /* true if we reached EOF */
-
-    /* Shorthand for number of unconsumed bytes available in raw_buf */
-#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
-
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyFromStateData;
-
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
-- 
2.45.2
From 0c31ea16da2fe11004443e3fc1646d4f0dca0247 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sun, 29 Sep 2024 00:32:31 +0900
Subject: [PATCH v21 10/10] Add support for implementing custom COPY FROM
 format as extension
* Add CopyFromStateData::opaque that can be used to keep data for
  a custom COPY FROM format implementation
* Export CopyReadBinaryData() to read the next data as
  CopyFromStateRead()
---
 src/backend/commands/copyfromparse.c | 14 ++++++++++++++
 src/include/commands/copyapi.h       |  6 ++++++
 2 files changed, 20 insertions(+)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index ec86a17b3b3..64772877b0f 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -739,6 +739,20 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
     return copied_bytes;
 }
 
+/*
+ * CopyFromStateRead
+ *
+ * Export CopyReadBinaryData() for extensions. We want to keep
+ * CopyReadBinaryData() as a static function for
+ * optimization. CopyReadBinaryData() calls in this file may be optimized by
+ * a compiler.
+ */
+int
+CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes)
+{
+    return CopyReadBinaryData(cstate, dest, nbytes);
+}
+
 /*
  * Read raw fields in the next line for COPY FROM in text or csv mode.
  * Return false if no more lines.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index c118558ee71..c1e9fe366f3 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -299,8 +299,14 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyFromStateData;
 
+extern int    CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes);
+
+
 typedef struct CopyToStateData *CopyToState;
 
 /*
-- 
2.45.2
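As a rough sketch of how an extension might combine the two additions above, the following standalone C code keeps per-COPY private state behind an opaque pointer and pulls raw bytes through a read function shaped like CopyFromStateRead(). Everything prefixed with Mock or mock_ is a hypothetical stand-in so that the example compiles without PostgreSQL headers. A real handler would receive a CopyFromState, and would presumably allocate with palloc in cstate->copycontext rather than calloc; that is an assumption, not something the patch states.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for CopyFromStateData. */
typedef struct MockCopyFromState
{
    const char *input;          /* pretend data source */
    size_t      input_len;      /* total bytes available */
    size_t      input_pos;      /* next byte to hand out */
    void       *opaque;         /* private space for the format */
} MockCopyFromState;

/* Stand-in for CopyFromStateRead(): copies up to nbytes, returns the count. */
static int
mock_copy_from_state_read(MockCopyFromState *cstate, char *dest, int nbytes)
{
    size_t      avail = cstate->input_len - cstate->input_pos;
    size_t      n = (size_t) nbytes < avail ? (size_t) nbytes : avail;

    memcpy(dest, cstate->input + cstate->input_pos, n);
    cstate->input_pos += n;
    return (int) n;
}

/* Per-COPY private state of the imaginary format. */
typedef struct MockFormatState
{
    uint64_t    rows_read;
} MockFormatState;

/* Start callback: allocate the private state and park it in opaque. */
static void
mock_from_start(MockCopyFromState *cstate)
{
    cstate->opaque = calloc(1, sizeof(MockFormatState));
}

/* Row callback: one row is a single 4-byte value; false means end of input. */
static bool
mock_from_one_row(MockCopyFromState *cstate, int32_t *value)
{
    MockFormatState *fmt = cstate->opaque;
    int         want = (int) sizeof(*value);

    if (fmt == NULL)
        return false;           /* start callback failed to allocate */
    if (mock_copy_from_state_read(cstate, (char *) value, want) != want)
        return false;
    fmt->rows_read++;
    return true;
}

/* End callback: release the private state. */
static void
mock_from_end(MockCopyFromState *cstate)
{
    free(cstate->opaque);
    cstate->opaque = NULL;
}
```

Keeping the format's state behind a void pointer means the core struct does not need to know anything about the extension's data layout.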
			
On Tue, Oct 8, 2024 at 8:34 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Oct 07, 2024 at 03:23:08PM -0700, Masahiko Sawada wrote:
> > In the benchmark, I've applied the v20 patch set and 'master' in the
> > result refers to a19f83f87966. And I disabled CPU turbo boost where
> > possible. Overall, v20 patch got a similar or better performance in
> > both COPY FROM and COPY TO compared to master except for on MacOS.
> > I'm not sure that changes made to master since the last benchmark run by
> > Tomas and Suto-san might contribute to these results.
>
> Don't think so. FWIW, I have been looking at the set of tests with
> previous patch versions around v7 and v10 I have done, and did notice
> a similar pattern where COPY FROM was getting slightly better for text
> and binary. It did not look like only noise involved, and it was
> kind of reproducible. As long as we avoid the function pointer
> redirection for the per-row processing when dealing with in-core
> formats, we should be fine as far as I understand. That's what the
> latest patch set is doing based on a read of v21.
Yeah, what v21 patch is doing makes sense to me.
>
> > I'll try to investigate the performance regression that happened on MacOS.
>
> I don't have a good explanation for this one. Did you mount the data
> folder on a tmpfs and made sure that all the workloads were
> CPU-bounded?
Yes, I used tmpfs and workloads were CPU-bound.
>
> > I think that other performance differences in my results seem to be within
> > noises and could be acceptable. Of course, it would be great if others
> > also could try to run benchmark tests.
>
> Yeah. At 1~2% it could be noise, but there are reproducible 1~2%
> evolutions. In the good sense here, it means.
In real workloads, COPY FROM/TO operations would be more disk I/O
bound. I think that 1~2% performance differences that were shown in
CPU-bound workload would not be a problem in practice.
Regards,
-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
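The point about avoiding function pointer redirection for in-core formats can be pictured with a small standalone sketch. It is a simplified model rather than the code in the patch set: all Mock* and mock_* names are hypothetical, and the arithmetic in the row functions only stands in for real parsing work.

```c
#include <stddef.h>

typedef struct MockRow
{
    int         value;
} MockRow;

typedef int (*MockOneRowFn) (const MockRow *row);

typedef struct MockCopyState
{
    int         csv_mode;       /* non-zero when built-in CSV is selected */
    MockOneRowFn custom_fn;     /* non-NULL only for a custom format */
} MockCopyState;

/* Built-in row processors: visible to the compiler, so they can be inlined. */
static inline int
mock_text_one_row(const MockRow *row)
{
    return row->value + 1;
}

static inline int
mock_csv_one_row(const MockRow *row)
{
    return row->value + 2;
}

/* A custom format's row processor, reachable only through the pointer. */
static int
mock_custom_one_row(const MockRow *row)
{
    return row->value * 100;
}

/*
 * Per-row entry point: built-in formats take a direct branch, and only
 * custom formats pay for the indirect call.
 */
static int
mock_process_row(const MockCopyState *cstate, const MockRow *row)
{
    if (cstate->custom_fn != NULL)
        return cstate->custom_fn(row);
    if (cstate->csv_mode)
        return mock_csv_one_row(row);
    return mock_text_one_row(row);
}
```

With this shape the hot path for text and CSV contains no indirect call, which is one plausible reason the built-in formats would not slow down.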
Hi,
In <CAD21AoAdj-EJOH1o2fTLke-uskSvuenT--fKW9nkLzYcLwU_eg@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 4 Nov 2024 22:19:07 -0800,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I've further investigated the performance regression, and found out it
> might be relevant that the compiler doesn't inline the
> CopyFromTextLikeOneRow() function. It might be worth testing with
> pg_attribute_always_inline instead of 'inline' as below:
Wow! Good catch!
I've rebased on the current master and updated the v20 and v21
patch sets to use "pg_attribute_always_inline" instead of "inline".
The v22 patch set is based on the v20 patch set.
(TO/FROM changes are in one commit.)
The v23 patch set is based on the v21 patch set.
(TO/FROM changes are separated so that only the FROM or the TO part
can be merged easily.)
I'll run the benchmark on my environment again.
Thanks,
-- 
kou
From 960414f4d256b0d250a70156aac50f88e07de19a Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 4 Mar 2024 13:52:34 +0900
Subject: [PATCH v22 1/5] Add CopyToRoutine/CopyFromRoutine
They are for implementing custom COPY TO/FROM formats. But this is not
enough to implement a custom COPY TO/FROM format yet. We'll export
some APIs to receive/send data and add a "format" option to COPY
TO/FROM later.
Existing text/csv/binary format implementations don't use
CopyToRoutine/CopyFromRoutine for now. We have a patch for it but we
defer it because there are some mysterious profile results even
though we get faster runtimes. See [1] for details.
[1] https://www.postgresql.org/message-id/ZdbtQJ-p5H1_EDwE%40paquier.xyz
Note that this doesn't change existing text/csv/binary format
implementations.
---
 src/backend/commands/copyfrom.c          |  24 +++++-
 src/backend/commands/copyfromparse.c     |   5 ++
 src/backend/commands/copyto.c            |  31 ++++++-
 src/include/commands/copyapi.h           | 101 +++++++++++++++++++++++
 src/include/commands/copyfrom_internal.h |   4 +
 src/tools/pgindent/typedefs.list         |   2 +
 6 files changed, 159 insertions(+), 8 deletions(-)
 create mode 100644 src/include/commands/copyapi.h

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b8..909375e81b7 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1635,12 +1635,22 @@ BeginCopyFrom(ParseState *pstate,
 
 		/* Fetch the input function and typioparam info */
 		if (cstate->opts.binary)
+		{
 			getTypeBinaryInputInfo(att->atttypid,
 								   &in_func_oid, &typioparams[attnum - 1]);
+			fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+		}
+		else if (cstate->routine)
+			cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+											&in_functions[attnum - 1],
+											&typioparams[attnum - 1]);
+
 		else
+		{
 			getTypeInputInfo(att->atttypid,
 							 &in_func_oid, &typioparams[attnum - 1]);
-		fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+			fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+		}
 
 		/* Get default info if available */
 		defexprs[attnum - 1] = NULL;
@@ -1780,10 +1790,13 @@ BeginCopyFrom(ParseState *pstate,
 		/* Read and verify binary header */
 		ReceiveCopyBinaryHeader(cstate);
 	}
-
-	/* create workspace for CopyReadAttributes results */
-	if (!cstate->opts.binary)
+	else if (cstate->routine)
 	{
+		cstate->routine->CopyFromStart(cstate, tupDesc);
+	}
+	else
+	{
+		/* create workspace for CopyReadAttributes results */
 		AttrNumber	attr_count = list_length(cstate->attnumlist);
 
 		cstate->max_fields = attr_count;
@@ -1801,6 +1814,9 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+	if (cstate->routine)
+		cstate->routine->CopyFromEnd(cstate);
+
 	/* No COPY FROM related resources except memory. */
 	if (cstate->is_program)
 	{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d1d43b53d83..b104e4a9114 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -1003,6 +1003,11 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 
 		Assert(fieldno == attr_count);
 	}
+	else if (cstate->routine)
+	{
+		if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+			return false;
+	}
 	else
 	{
 		/* binary */
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f55e6d96751..405e1782685 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -20,6 +20,7 @@
 
 #include "access/tableam.h"
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -64,6 +65,9 @@ typedef enum CopyDest
  */
 typedef struct CopyToStateData
 {
+	/* format routine */
+	const CopyToRoutine *routine;
+
 	/* low-level state data */
 	CopyDest	copy_dest;		/* type of copy source/destination */
 	FILE	   *copy_file;		/* used if copy_dest == COPY_FILE */
@@ -776,14 +780,22 @@ DoCopyTo(CopyToState cstate)
 		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
 		if (cstate->opts.binary)
+		{
 			getTypeBinaryOutputInfo(attr->atttypid,
 									&out_func_oid,
 									&isvarlena);
+			fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+		}
+		else if (cstate->routine)
+			cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
+										   &cstate->out_functions[attnum - 1]);
 		else
+		{
 			getTypeOutputInfo(attr->atttypid,
 							  &out_func_oid,
 							  &isvarlena);
-		fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+			fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+		}
 	}
 
 	/*
@@ -810,6 +822,8 @@ DoCopyTo(CopyToState cstate)
 		tmp = 0;
 		CopySendInt32(cstate, tmp);
 	}
+	else if (cstate->routine)
+		cstate->routine->CopyToStart(cstate, tupDesc);
 	else
 	{
 		/*
@@ -891,6 +905,8 @@ DoCopyTo(CopyToState cstate)
 		/* Need to flush out the trailer */
 		CopySendEndOfRow(cstate);
 	}
+	else if (cstate->routine)
+		cstate->routine->CopyToEnd(cstate);
 
 	MemoryContextDelete(cstate->rowcontext);
 
@@ -912,15 +928,22 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 	MemoryContextReset(cstate->rowcontext);
 	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
+	/* Make sure the tuple is fully deconstructed */
+	slot_getallattrs(slot);
+
+	if (cstate->routine)
+	{
+		cstate->routine->CopyToOneRow(cstate, slot);
+		MemoryContextSwitchTo(oldcontext);
+		return;
+	}
+
 	if (cstate->opts.binary)
 	{
 		/* Binary per-tuple header */
 		CopySendInt16(cstate, list_length(cstate->attnumlist));
 	}
 
-	/* Make sure the tuple is fully deconstructed */
-	slot_getallattrs(slot);
-
 	if (!cstate->opts.binary)
 	{
 		bool		need_delim = false;
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 00000000000..d1289424c67
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,101 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *	  API for COPY TO/FROM handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/* These are private in commands/copy[from|to].c */
+typedef struct CopyFromStateData *CopyFromState;
+typedef struct CopyToStateData *CopyToState;
+
+/*
+ * API structure for a COPY FROM format implementation. Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyFromRoutine
+{
+	/*
+	 * Called when COPY FROM is started to set up the input functions
+	 * associated with the relation's attributes writing to. `finfo` can be
+	 * optionally filled to provide the catalog information of the input
+	 * function. `typioparam` can be optionally filled to define the OID of
+	 * the type to pass to the input function. `atttypid` is the OID of data
+	 * type used by the relation's attribute.
+	 */
+	void		(*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
+								   FmgrInfo *finfo, Oid *typioparam);
+
+	/*
+	 * Called when COPY FROM is started.
+	 *
+	 * `tupDesc` is the tuple descriptor of the relation where the data needs
+	 * to be copied. This can be used for any initialization steps required
+	 * by a format.
+	 */
+	void		(*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
+
+	/*
+	 * Copy one row to a set of `values` and `nulls` of size tupDesc->natts.
+	 *
+	 * 'econtext' is used to evaluate default expression for each column that
+	 * is either not read from the file or is using the DEFAULT option of COPY
+	 * FROM. It is NULL if no default values are used.
+	 *
+	 * Returns false if there are no more tuples to copy.
+	 */
+	bool		(*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
+								   Datum *values, bool *nulls);
+
+	/* Called when COPY FROM has ended. */
+	void		(*CopyFromEnd) (CopyFromState cstate);
+} CopyFromRoutine;
+
+/*
+ * API structure for a COPY TO format implementation. Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyToRoutine
+{
+	/*
+	 * Called when COPY TO is started to set up the output functions
+	 * associated with the relation's attributes reading from. `finfo` can be
+	 * optionally filled to provide the catalog information of the output
+	 * function. `atttypid` is the OID of data type used by the relation's
+	 * attribute.
+	 */
+	void		(*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
+								  FmgrInfo *finfo);
+
+	/*
+	 * Called when COPY TO is started.
+	 *
+	 * `tupDesc` is the tuple descriptor of the relation from where the data
+	 * is read.
+	 */
+	void		(*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
+
+	/*
+	 * Copy one row for COPY TO.
+	 *
+	 * `slot` is the tuple slot where the data is emitted.
+	 */
+	void		(*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
+
+	/* Called when COPY TO has ended */
+	void		(*CopyToEnd) (CopyToState cstate);
+} CopyToRoutine;
+
+#endif							/* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc783..509b9e92a18 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -15,6 +15,7 @@
 #define COPYFROM_INTERNAL_H
 
 #include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
@@ -58,6 +59,9 @@ typedef enum CopyInsertMethod
  */
 typedef struct CopyFromStateData
 {
+	/* format routine */
+	const CopyFromRoutine *routine;
+
 	/* low-level state data */
 	CopySource	copy_src;		/* type of copy source */
 	FILE	   *copy_file;		/* used if copy_src == COPY_FILE */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1847bbfa95c..a8422fa4d35 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -492,6 +492,7 @@ ConvertRowtypeExpr
 CookedConstraint
 CopyDest
 CopyFormatOptions
+CopyFromRoutine
 CopyFromState
 CopyFromStateData
 CopyHeaderChoice
@@ -503,6 +504,7 @@ CopyMultiInsertInfo
 CopyOnErrorChoice
 CopySource
 CopyStmt
+CopyToRoutine
 CopyToState
 CopyToStateData
 Cost
-- 
2.45.2

From 78ed1bf847051f09f417980931c031cfa5d93e4c Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jul 2024 16:44:44 +0900
Subject: [PATCH v22 2/5] Use CopyToRoutine/CopyFromRoutine for the existing
 formats

The existing formats are text, csv and binary. If we find any
performance regression by this, we will not merge this to master.

This will increase indirect function call costs but this will reduce
runtime "if (cstate->opts.binary)" and "if (cstate->opts.csv_mode)"
branch costs.

This uses an optimization based on a static inline function called
with a constant argument for cstate->opts.csv_mode. For example,
CopyFromTextLikeOneRow() uses this optimization. It accepts the "bool
is_csv" argument instead of using cstate->opts.csv_mode in it.
CopyFromTextOneRow() calls CopyFromTextLikeOneRow() with false
(constant) for "bool is_csv". The compiler will remove the
"if (is_csv)" branch in it by this optimization.

This doesn't change existing logic. This just moves existing code.
---
 src/backend/commands/copyfrom.c          | 215 ++++++---
 src/backend/commands/copyfromparse.c     | 530 +++++++++++++----------
 src/backend/commands/copyto.c            | 477 +++++++++++++-------
 src/include/commands/copy.h              |   2 -
 src/include/commands/copyfrom_internal.h |   8 +
 5 files changed, 790 insertions(+), 442 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 909375e81b7..e6ea9ce1602 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -106,6 +106,157 @@ typedef struct CopyMultiInsertInfo
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
 
+
+/*
+ * CopyFromRoutine implementations for text and CSV.
+ */
+
+/*
+ * CopyFromTextLikeInFunc
+ *
+ * Assign input function data for a relation's attribute in text/CSV format.
+ */
+static void
+CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid,
+					   FmgrInfo *finfo, Oid *typioparam)
+{
+	Oid			func_oid;
+
+	getTypeInputInfo(atttypid, &func_oid, typioparam);
+	fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyFromTextLikeStart
+ *
+ * Start of COPY FROM for text/CSV format.
+ */
+static void
+CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+	AttrNumber	attr_count;
+
+	/*
+	 * If encoding conversion is needed, we need another buffer to hold the
+	 * converted input data. Otherwise, we can just point input_buf to the
+	 * same buffer as raw_buf.
+	 */
+	if (cstate->need_transcoding)
+	{
+		cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
+		cstate->input_buf_index = cstate->input_buf_len = 0;
+	}
+	else
+		cstate->input_buf = cstate->raw_buf;
+	cstate->input_reached_eof = false;
+
+	initStringInfo(&cstate->line_buf);
+
+	/*
+	 * Create workspace for CopyReadAttributes results; used by CSV and text
+	 * format.
+	 */
+	attr_count = list_length(cstate->attnumlist);
+	cstate->max_fields = attr_count;
+	cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+}
+
+/*
+ * CopyFromTextLikeEnd
+ *
+ * End of COPY FROM for text/CSV format.
+ */
+static void
+CopyFromTextLikeEnd(CopyFromState cstate)
+{
+	/* nothing to do */
+}
+
+/*
+ * CopyFromRoutine implementation for "binary".
+ */
+
+/*
+ * CopyFromBinaryInFunc
+ *
+ * Assign input function data for a relation's attribute in binary format.
+ */
+static void
+CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+					 FmgrInfo *finfo, Oid *typioparam)
+{
+	Oid			func_oid;
+
+	getTypeBinaryInputInfo(atttypid, &func_oid, typioparam);
+	fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyFromBinaryStart
+ *
+ * Start of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+	/* Read and verify binary header */
+	ReceiveCopyBinaryHeader(cstate);
+}
+
+/*
+ * CopyFromBinaryEnd
+ *
+ * End of COPY FROM for binary format.
+ */
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+	/* nothing to do */
+}
+
+/*
+ * Routines assigned to each format.
++
+ * CSV and text share the same implementation, at the exception of the
+ * per-row callback.
+ */
+static const CopyFromRoutine CopyFromRoutineText = {
+	.CopyFromInFunc = CopyFromTextLikeInFunc,
+	.CopyFromStart = CopyFromTextLikeStart,
+	.CopyFromOneRow = CopyFromTextOneRow,
+	.CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+static const CopyFromRoutine CopyFromRoutineCSV = {
+	.CopyFromInFunc = CopyFromTextLikeInFunc,
+	.CopyFromStart = CopyFromTextLikeStart,
+	.CopyFromOneRow = CopyFromCSVOneRow,
+	.CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+static const CopyFromRoutine CopyFromRoutineBinary = {
+	.CopyFromInFunc = CopyFromBinaryInFunc,
+	.CopyFromStart = CopyFromBinaryStart,
+	.CopyFromOneRow = CopyFromBinaryOneRow,
+	.CopyFromEnd = CopyFromBinaryEnd,
+};
+
+/*
+ * Define the COPY FROM routines to use for a format.
+ */
+static const CopyFromRoutine *
+CopyFromGetRoutine(CopyFormatOptions opts)
+{
+	if (opts.csv_mode)
+		return &CopyFromRoutineCSV;
+	else if (opts.binary)
+		return &CopyFromRoutineBinary;
+
+	/* default is text */
+	return &CopyFromRoutineText;
+}
+
+
 /*
  * error context callback for COPY FROM
  *
@@ -1396,7 +1547,6 @@ BeginCopyFrom(ParseState *pstate,
 				num_defaults;
 	FmgrInfo   *in_functions;
 	Oid		   *typioparams;
-	Oid			in_func_oid;
 	int		   *defmap;
 	ExprState **defexprs;
 	MemoryContext oldcontext;
@@ -1428,6 +1578,9 @@ BeginCopyFrom(ParseState *pstate,
 	/* Extract options from the statement node tree */
 	ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
+	/* Set format routine */
+	cstate->routine = CopyFromGetRoutine(cstate->opts);
+
 	/* Process the target relation */
 	cstate->rel = rel;
 
@@ -1583,25 +1736,6 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->raw_buf_index = cstate->raw_buf_len = 0;
 	cstate->raw_reached_eof = false;
 
-	if (!cstate->opts.binary)
-	{
-		/*
-		 * If encoding conversion is needed, we need another buffer to hold
-		 * the converted input data. Otherwise, we can just point input_buf
-		 * to the same buffer as raw_buf.
-		 */
-		if (cstate->need_transcoding)
-		{
-			cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
-			cstate->input_buf_index = cstate->input_buf_len = 0;
-		}
-		else
-			cstate->input_buf = cstate->raw_buf;
-		cstate->input_reached_eof = false;
-
-		initStringInfo(&cstate->line_buf);
-	}
-
 	initStringInfo(&cstate->attribute_buf);
 
 	/* Assign range table and rteperminfos, we'll need them in CopyFrom. */
@@ -1634,23 +1768,9 @@ BeginCopyFrom(ParseState *pstate,
 			continue;
 
 		/* Fetch the input function and typioparam info */
-		if (cstate->opts.binary)
-		{
-			getTypeBinaryInputInfo(att->atttypid,
-								   &in_func_oid, &typioparams[attnum - 1]);
-			fmgr_info(in_func_oid, &in_functions[attnum - 1]);
-		}
-		else if (cstate->routine)
-			cstate->routine->CopyFromInFunc(cstate, att->atttypid,
-											&in_functions[attnum - 1],
-											&typioparams[attnum - 1]);
-
-		else
-		{
-			getTypeInputInfo(att->atttypid,
-							 &in_func_oid, &typioparams[attnum - 1]);
-			fmgr_info(in_func_oid, &in_functions[attnum - 1]);
-		}
+		cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+										&in_functions[attnum - 1],
+										&typioparams[attnum - 1]);
 
 		/* Get default info if available */
 		defexprs[attnum - 1] = NULL;
@@ -1785,23 +1905,7 @@ BeginCopyFrom(ParseState *pstate,
 
 	pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-	if (cstate->opts.binary)
-	{
-		/* Read and verify binary header */
-		ReceiveCopyBinaryHeader(cstate);
-	}
-	else if (cstate->routine)
-	{
-		cstate->routine->CopyFromStart(cstate, tupDesc);
-	}
-	else
-	{
-		/* create workspace for CopyReadAttributes results */
-		AttrNumber	attr_count = list_length(cstate->attnumlist);
-
-		cstate->max_fields = attr_count;
-		cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-	}
+	cstate->routine->CopyFromStart(cstate, tupDesc);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -1814,8 +1918,7 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
-	if (cstate->routine)
-		cstate->routine->CopyFromEnd(cstate);
+	cstate->routine->CopyFromEnd(cstate);
 
 	/* No COPY FROM related resources except memory. */
 	if (cstate->is_program)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index b104e4a9114..0447c4df7e0 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -140,8 +140,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLine(CopyFromState cstate, bool is_csv);
+static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static int	CopyReadAttributesText(CopyFromState cstate);
 static int	CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -741,8 +741,8 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  *
  * NOTE: force_not_null option are not applied to the returned fields.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static pg_attribute_always_inline bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
 {
 	int			fldct;
 	bool		done;
@@ -759,13 +759,17 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 		tupDesc = RelationGetDescr(cstate->rel);
 
 		cstate->cur_lineno++;
-		done = CopyReadLine(cstate);
+		done = CopyReadLine(cstate, is_csv);
 
 		if (cstate->opts.header_line == COPY_HEADER_MATCH)
 		{
 			int			fldnum;
 
-			if (cstate->opts.csv_mode)
+			/*
+			 * is_csv will be optimized away by compiler, as argument is
+			 * constant at caller.
+			 */
+			if (is_csv)
 				fldct = CopyReadAttributesCSV(cstate);
 			else
 				fldct = CopyReadAttributesText(cstate);
@@ -809,7 +813,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	cstate->cur_lineno++;
 
 	/* Actually read the line into memory here */
-	done = CopyReadLine(cstate);
+	done = CopyReadLine(cstate, is_csv);
 
 	/*
 	 * EOF at start of line means we're done. If we see EOF after some
@@ -819,8 +823,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	if (done && cstate->line_buf.len == 0)
 		return false;
 
-	/* Parse the line into de-escaped field values */
-	if (cstate->opts.csv_mode)
+	/*
+	 * Parse the line into de-escaped field values
+	 *
+	 * is_csv will be optimized away by compiler, as argument is constant at
+	 * caller.
+	 */
+	if (is_csv)
 		fldct = CopyReadAttributesCSV(cstate);
 	else
 		fldct = CopyReadAttributesText(cstate);
@@ -830,6 +839,267 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	return true;
 }
 
+/*
+ * CopyFromTextLikeOneRow
+ *
+ * Copy one row to a set of `values` and `nulls` for the text and CSV
+ * formats.
+ *
+ * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
+ */
+static pg_attribute_always_inline bool
+CopyFromTextLikeOneRow(CopyFromState cstate,
+					   ExprContext *econtext,
+					   Datum *values,
+					   bool *nulls,
+					   bool is_csv)
+{
+	TupleDesc	tupDesc;
+	AttrNumber	attr_count;
+	FmgrInfo   *in_functions = cstate->in_functions;
+	Oid		   *typioparams = cstate->typioparams;
+	ExprState **defexprs = cstate->defexprs;
+	char	  **field_strings;
+	ListCell   *cur;
+	int			fldct;
+	int			fieldno;
+	char	   *string;
+
+	tupDesc = RelationGetDescr(cstate->rel);
+	attr_count = list_length(cstate->attnumlist);
+
+	/* read raw fields in the next line */
+	if (!NextCopyFromRawFields(cstate, &field_strings, &fldct, is_csv))
+		return false;
+
+	/* check for overflowing fields */
+	if (attr_count > 0 && fldct > attr_count)
+		ereport(ERROR,
+				(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+				 errmsg("extra data after last expected column")));
+
+	fieldno = 0;
+
+	/* Loop to read the user attributes on the line. */
+	foreach(cur, cstate->attnumlist)
+	{
+		int			attnum = lfirst_int(cur);
+		int			m = attnum - 1;
+		Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+		if (fieldno >= fldct)
+			ereport(ERROR,
+					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+					 errmsg("missing data for column \"%s\"",
+							NameStr(att->attname))));
+		string = field_strings[fieldno++];
+
+		if (cstate->convert_select_flags &&
+			!cstate->convert_select_flags[m])
+		{
+			/* ignore input field, leaving column as NULL */
+			continue;
+		}
+
+		if (is_csv)
+		{
+			if (string == NULL &&
+				cstate->opts.force_notnull_flags[m])
+			{
+				/*
+				 * FORCE_NOT_NULL option is set and column is NULL - convert
+				 * it to the NULL string.
+				 */
+				string = cstate->opts.null_print;
+			}
+			else if (string != NULL && cstate->opts.force_null_flags[m]
+					 && strcmp(string, cstate->opts.null_print) == 0)
+			{
+				/*
+				 * FORCE_NULL option is set and column matches the NULL
+				 * string. It must have been quoted, or otherwise the string
+				 * would already have been set to NULL. Convert it to NULL as
+				 * specified.
+				 */
+				string = NULL;
+			}
+		}
+
+		cstate->cur_attname = NameStr(att->attname);
+		cstate->cur_attval = string;
+
+		if (string != NULL)
+			nulls[m] = false;
+
+		if (cstate->defaults[m])
+		{
+			/*
+			 * The caller must supply econtext and have switched into the
+			 * per-tuple memory context in it.
+			 */
+			Assert(econtext != NULL);
+			Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
+
+			values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
+		}
+
+		/*
+		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+		 */
+		else if (!InputFunctionCallSafe(&in_functions[m],
+										string,
+										typioparams[m],
+										att->atttypmod,
+										(Node *) cstate->escontext,
+										&values[m]))
+		{
+			Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+			cstate->num_errors++;
+
+			if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+			{
+				/*
+				 * Since we emit line number and column info in the below
+				 * notice message, we suppress error context information other
+				 * than the relation name.
+				 */
+				Assert(!cstate->relname_only);
+				cstate->relname_only = true;
+
+				if (cstate->cur_attval)
+				{
+					char	   *attval;
+
+					attval = CopyLimitPrintoutLength(cstate->cur_attval);
+					ereport(NOTICE,
+							errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+								   (unsigned long long) cstate->cur_lineno,
+								   cstate->cur_attname,
+								   attval));
+					pfree(attval);
+				}
+				else
+					ereport(NOTICE,
+							errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+								   (unsigned long long) cstate->cur_lineno,
+								   cstate->cur_attname));
+
+				/* reset relname_only */
+				cstate->relname_only = false;
+			}
+
+			return true;
+		}
+
+		cstate->cur_attname = NULL;
+		cstate->cur_attval = NULL;
+	}
+
+	Assert(fieldno == attr_count);
+
+	return true;
+}
+
+
+/*
+ * CopyFromTextOneRow
+ *
+ * Per-row callback for COPY FROM with text format.
+ */
+bool
+CopyFromTextOneRow(CopyFromState cstate,
+				   ExprContext *econtext,
+				   Datum *values,
+				   bool *nulls)
+{
+	return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false);
+}
+
+/*
+ * CopyFromCSVOneRow
+ *
+ * Per-row callback for COPY FROM with CSV format.
+ */
+bool
+CopyFromCSVOneRow(CopyFromState cstate,
+				  ExprContext *econtext,
+				  Datum *values,
+				  bool *nulls)
+{
+	return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
+}
+
+/*
+ * CopyFromBinaryOneRow
+ *
+ * Copy one row to a set of `values` and `nulls` for the binary format.
+ */
+bool
+CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+					 Datum *values, bool *nulls)
+{
+	TupleDesc	tupDesc;
+	AttrNumber	attr_count;
+	FmgrInfo   *in_functions = cstate->in_functions;
+	Oid		   *typioparams = cstate->typioparams;
+	int16		fld_count;
+	ListCell   *cur;
+
+	tupDesc = RelationGetDescr(cstate->rel);
+	attr_count = list_length(cstate->attnumlist);
+
+	cstate->cur_lineno++;
+
+	if (!CopyGetInt16(cstate, &fld_count))
+	{
+		/* EOF detected (end of file, or protocol-level EOF) */
+		return false;
+	}
+
+	if (fld_count == -1)
+	{
+		/*
+		 * Received EOF marker. Wait for the protocol-level EOF, and complain
+		 * if it doesn't come immediately. In COPY FROM STDIN, this ensures
+		 * that we correctly handle CopyFail, if client chooses to send that
+		 * now. When copying from file, we could ignore the rest of the file
+		 * like in text mode, but we choose to be consistent with the COPY
+		 * FROM STDIN case.
+		 */
+		char		dummy;
+
+		if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+					 errmsg("received copy data after EOF marker")));
+		return false;
+	}
+
+	if (fld_count != attr_count)
+		ereport(ERROR,
+				(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+				 errmsg("row field count is %d, expected %d",
+						(int) fld_count, attr_count)));
+
+	foreach(cur, cstate->attnumlist)
+	{
+		int			attnum = lfirst_int(cur);
+		int			m = attnum - 1;
+		Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+		cstate->cur_attname = NameStr(att->attname);
+		values[m] = CopyReadBinaryAttribute(cstate,
+											&in_functions[m],
+											typioparams[m],
+											att->atttypmod,
+											&nulls[m]);
+		cstate->cur_attname = NULL;
+	}
+
+	return true;
+}
+
 /*
  * Read next tuple from file for COPY FROM. Return false if no more tuples.
  *
@@ -847,221 +1117,21 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 {
 	TupleDesc	tupDesc;
 	AttrNumber	num_phys_attrs,
-				attr_count,
 				num_defaults = cstate->num_defaults;
-	FmgrInfo   *in_functions = cstate->in_functions;
-	Oid		   *typioparams = cstate->typioparams;
 	int			i;
 	int		   *defmap = cstate->defmap;
 	ExprState **defexprs = cstate->defexprs;
 
 	tupDesc = RelationGetDescr(cstate->rel);
 	num_phys_attrs = tupDesc->natts;
-	attr_count = list_length(cstate->attnumlist);
 
 	/* Initialize all values for row to NULL */
 	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
 	MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-	if (!cstate->opts.binary)
-	{
-		char	  **field_strings;
-		ListCell   *cur;
-		int			fldct;
-		int			fieldno;
-		char	   *string;
-
-		/* read raw fields in the next line */
-		if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
-			return false;
-
-		/* check for overflowing fields */
-		if (attr_count > 0 && fldct > attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
-
-		fieldno = 0;
-
-		/* Loop to read the user attributes on the line. */
-		foreach(cur, cstate->attnumlist)
-		{
-			int			attnum = lfirst_int(cur);
-			int			m = attnum - 1;
-			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
-			string = field_strings[fieldno++];
-
-			if (cstate->convert_select_flags &&
-				!cstate->convert_select_flags[m])
-			{
-				/* ignore input field, leaving column as NULL */
-				continue;
-			}
-
-			if (cstate->opts.csv_mode)
-			{
-				if (string == NULL &&
-					cstate->opts.force_notnull_flags[m])
-				{
-					/*
-					 * FORCE_NOT_NULL option is set and column is NULL -
-					 * convert it to the NULL string.
-					 */
-					string = cstate->opts.null_print;
-				}
-				else if (string != NULL && cstate->opts.force_null_flags[m]
-						 && strcmp(string, cstate->opts.null_print) == 0)
-				{
-					/*
-					 * FORCE_NULL option is set and column matches the NULL
-					 * string. It must have been quoted, or otherwise the
-					 * string would already have been set to NULL. Convert it
-					 * to NULL as specified.
-					 */
-					string = NULL;
-				}
-			}
-
-			cstate->cur_attname = NameStr(att->attname);
-			cstate->cur_attval = string;
-
-			if (string != NULL)
-				nulls[m] = false;
-
-			if (cstate->defaults[m])
-			{
-				/*
-				 * The caller must supply econtext and have switched into the
-				 * per-tuple memory context in it.
-				 */
-				Assert(econtext != NULL);
-				Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
-
-				values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
-			}
-
-			/*
-			 * If ON_ERROR is specified with IGNORE, skip rows with soft
-			 * errors
-			 */
-			else if (!InputFunctionCallSafe(&in_functions[m],
-											string,
-											typioparams[m],
-											att->atttypmod,
-											(Node *) cstate->escontext,
-											&values[m]))
-			{
-				Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-				cstate->num_errors++;
-
-				if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-				{
-					/*
-					 * Since we emit line number and column info in the below
-					 * notice message, we suppress error context information
-					 * other than the relation name.
-					 */
-					Assert(!cstate->relname_only);
-					cstate->relname_only = true;
-
-					if (cstate->cur_attval)
-					{
-						char	   *attval;
-
-						attval = CopyLimitPrintoutLength(cstate->cur_attval);
-						ereport(NOTICE,
-								errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-									   (unsigned long long) cstate->cur_lineno,
-									   cstate->cur_attname,
-									   attval));
-						pfree(attval);
-					}
-					else
-						ereport(NOTICE,
-								errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-									   (unsigned long long) cstate->cur_lineno,
-									   cstate->cur_attname));
-
-					/* reset relname_only */
-					cstate->relname_only = false;
-				}
-
-				return true;
-			}
-
-			cstate->cur_attname = NULL;
-			cstate->cur_attval = NULL;
-		}
-
-		Assert(fieldno == attr_count);
-	}
-	else if (cstate->routine)
-	{
-		if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
-			return false;
-	}
-	else
-	{
-		/* binary */
-		int16		fld_count;
-		ListCell   *cur;
-
-		cstate->cur_lineno++;
-
-		if (!CopyGetInt16(cstate, &fld_count))
-		{
-			/* EOF detected (end of file, or protocol-level EOF) */
-			return false;
-		}
-
-		if (fld_count == -1)
-		{
-			/*
-			 * Received EOF marker. Wait for the protocol-level EOF, and
-			 * complain if it doesn't come immediately. In COPY FROM STDIN,
-			 * this ensures that we correctly handle CopyFail, if client
-			 * chooses to send that now. When copying from file, we could
-			 * ignore the rest of the file like in text mode, but we choose to
-			 * be consistent with the COPY FROM STDIN case.
-			 */
-			char		dummy;
-
-			if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("received copy data after EOF marker")));
-			return false;
-		}
-
-		if (fld_count != attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
-
-		foreach(cur, cstate->attnumlist)
-		{
-			int			attnum = lfirst_int(cur);
-			int			m = attnum - 1;
-			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-			cstate->cur_attname = NameStr(att->attname);
-			values[m] = CopyReadBinaryAttribute(cstate,
-												&in_functions[m],
-												typioparams[m],
-												att->atttypmod,
-												&nulls[m]);
-			cstate->cur_attname = NULL;
-		}
-	}
+	if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+		return false;
 
 	/*
 	 * Now compute and insert any defaults available for the columns not
@@ -1092,7 +1162,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * in the final value of line_buf.
*/ static bool -CopyReadLine(CopyFromState cstate) +CopyReadLine(CopyFromState cstate, bool is_csv) { bool result; @@ -1100,7 +1170,7 @@ CopyReadLine(CopyFromState cstate) cstate->line_buf_valid = false; /* Parse data and transfer into line_buf */ - result = CopyReadLineText(cstate); + result = CopyReadLineText(cstate, is_csv); if (result) { @@ -1167,8 +1237,8 @@ CopyReadLine(CopyFromState cstate) /* * CopyReadLineText - inner loop of CopyReadLine for text mode */ -static bool -CopyReadLineText(CopyFromState cstate) +static pg_attribute_always_inline bool +CopyReadLineText(CopyFromState cstate, bool is_csv) { char *copy_input_buf; int input_buf_ptr; @@ -1183,7 +1253,11 @@ CopyReadLineText(CopyFromState cstate) char quotec = '\0'; char escapec = '\0'; - if (cstate->opts.csv_mode) + /* + * is_csv will be optimized away by compiler, as argument is constant at + * caller. + */ + if (is_csv) { quotec = cstate->opts.quote[0]; escapec = cstate->opts.escape[0]; @@ -1260,7 +1334,11 @@ CopyReadLineText(CopyFromState cstate) prev_raw_ptr = input_buf_ptr; c = copy_input_buf[input_buf_ptr++]; - if (cstate->opts.csv_mode) + /* + * is_csv will be optimized away by compiler, as argument is constant + * at caller. + */ + if (is_csv) { /* * If character is '\r', we may need to look ahead below. Force @@ -1299,7 +1377,7 @@ CopyReadLineText(CopyFromState cstate) } /* Process \r */ - if (c == '\r' && (!cstate->opts.csv_mode || !in_quote)) + if (c == '\r' && (!is_csv || !in_quote)) { /* Check for \r\n on first line, _and_ handle \r\n. */ if (cstate->eol_type == EOL_UNKNOWN || @@ -1327,10 +1405,10 @@ CopyReadLineText(CopyFromState cstate) if (cstate->eol_type == EOL_CRNL) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - !cstate->opts.csv_mode ? + !is_csv ? errmsg("literal carriage return found in data") : errmsg("unquoted carriage return found in data"), - !cstate->opts.csv_mode ? + !is_csv ? 
errhint("Use \"\\r\" to represent carriage return.") : errhint("Use quoted CSV field to represent carriage return."))); @@ -1344,10 +1422,10 @@ CopyReadLineText(CopyFromState cstate) else if (cstate->eol_type == EOL_NL) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - !cstate->opts.csv_mode ? + !is_csv ? errmsg("literal carriage return found in data") : errmsg("unquoted carriage return found in data"), - !cstate->opts.csv_mode ? + !is_csv ? errhint("Use \"\\r\" to represent carriage return.") : errhint("Use quoted CSV field to represent carriage return."))); /* If reach here, we have found the line terminator */ @@ -1355,15 +1433,15 @@ CopyReadLineText(CopyFromState cstate) } /* Process \n */ - if (c == '\n' && (!cstate->opts.csv_mode || !in_quote)) + if (c == '\n' && (!is_csv || !in_quote)) { if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - !cstate->opts.csv_mode ? + !is_csv ? errmsg("literal newline found in data") : errmsg("unquoted newline found in data"), - !cstate->opts.csv_mode ? + !is_csv ? errhint("Use \"\\n\" to represent newline.") : errhint("Use quoted CSV field to represent newline."))); cstate->eol_type = EOL_NL; /* in case not set yet */ @@ -1375,7 +1453,7 @@ CopyReadLineText(CopyFromState cstate) * Process backslash, except in CSV mode where backslash is a normal * character. */ - if (c == '\\' && !cstate->opts.csv_mode) + if (c == '\\' && !is_csv) { char c2; diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 405e1782685..46f3507a8b5 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -128,6 +128,317 @@ static void CopySendEndOfRow(CopyToState cstate); static void CopySendInt32(CopyToState cstate, int32 val); static void CopySendInt16(CopyToState cstate, int16 val); +/* + * CopyToRoutine implementations. 
+ */ + +/* + * CopyToTextLikeSendEndOfRow + * + * Apply line terminations for a line sent in text or CSV format depending + * on the destination, then send the end of a row. + */ +static pg_attribute_always_inline void +CopyToTextLikeSendEndOfRow(CopyToState cstate) +{ + switch (cstate->copy_dest) + { + case COPY_FILE: + /* Default line termination depends on platform */ +#ifndef WIN32 + CopySendChar(cstate, '\n'); +#else + CopySendString(cstate, "\r\n"); +#endif + break; + case COPY_FRONTEND: + /* The FE/BE protocol uses \n as newline for all platforms */ + CopySendChar(cstate, '\n'); + break; + default: + break; + } + + /* Now take the actions related to the end of a row */ + CopySendEndOfRow(cstate); +} + +/* + * CopyToTextLikeStart + * + * Start of COPY TO for text and CSV format. + */ +static void +CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc) +{ + /* + * For non-binary copy, we need to convert null_print to file encoding, + * because it will be sent directly with CopySendString. + */ + if (cstate->need_transcoding) + cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, + cstate->opts.null_print_len, + cstate->file_encoding); + + /* if a header has been requested send the line */ + if (cstate->opts.header_line) + { + ListCell *cur; + bool hdr_delim = false; + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + char *colname; + + if (hdr_delim) + CopySendChar(cstate, cstate->opts.delim[0]); + hdr_delim = true; + + colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); + + if (cstate->opts.csv_mode) + CopyAttributeOutCSV(cstate, colname, false); + else + CopyAttributeOutText(cstate, colname); + } + + CopyToTextLikeSendEndOfRow(cstate); + } +} + +/* + * CopyToTextLikeOutFunc + * + * Assign output function data for a relation's attribute in text/CSV format. 
+ */
+static void
+CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+	Oid			func_oid;
+	bool		is_varlena;
+
+	/* Set output function for an attribute */
+	getTypeOutputInfo(atttypid, &func_oid, &is_varlena);
+	fmgr_info(func_oid, finfo);
+}
+
+
+/*
+ * CopyToTextLikeOneRow
+ *
+ * Process one row for text/CSV format.
+ *
+ * Workhorse for CopyToTextOneRow() and CopyToCSVOneRow().
+ */
+static pg_attribute_always_inline void
+CopyToTextLikeOneRow(CopyToState cstate,
+					 TupleTableSlot *slot,
+					 bool is_csv)
+{
+	bool		need_delim = false;
+	FmgrInfo   *out_functions = cstate->out_functions;
+
+	foreach_int(attnum, cstate->attnumlist)
+	{
+		Datum		value = slot->tts_values[attnum - 1];
+		bool		isnull = slot->tts_isnull[attnum - 1];
+
+		if (need_delim)
+			CopySendChar(cstate, cstate->opts.delim[0]);
+		need_delim = true;
+
+		if (isnull)
+		{
+			CopySendString(cstate, cstate->opts.null_print_client);
+		}
+		else
+		{
+			char	   *string;
+
+			string = OutputFunctionCall(&out_functions[attnum - 1],
+										value);
+
+			/*
+			 * is_csv will be optimized away by compiler, as argument is
+			 * constant at caller.
+			 */
+			if (is_csv)
+				CopyAttributeOutCSV(cstate, string,
+									cstate->opts.force_quote_flags[attnum - 1]);
+			else
+				CopyAttributeOutText(cstate, string);
+		}
+	}
+
+	CopyToTextLikeSendEndOfRow(cstate);
+}
+
+/*
+ * CopyToTextOneRow
+ *
+ * Per-row callback for COPY TO with text format.
+ */
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	CopyToTextLikeOneRow(cstate, slot, false);
+}
+
+/*
+ * CopyToCSVOneRow
+ *
+ * Per-row callback for COPY TO with CSV format.
+ */
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	CopyToTextLikeOneRow(cstate, slot, true);
+}
+
+/*
+ * CopyToTextLikeEnd
+ *
+ * End of COPY TO for text/CSV format.
+ */
+static void
+CopyToTextLikeEnd(CopyToState cstate)
+{
+	/* Nothing to do here */
+}
+
+/*
+ * CopyToRoutine implementation for "binary".
+ */
+
+/*
+ * CopyToBinaryStart
+ *
+ * Start of COPY TO for binary format.
+ */
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+	/* Generate header for a binary copy */
+	int32		tmp;
+
+	/* Signature */
+	CopySendData(cstate, BinarySignature, 11);
+	/* Flags field */
+	tmp = 0;
+	CopySendInt32(cstate, tmp);
+	/* No header extension */
+	tmp = 0;
+	CopySendInt32(cstate, tmp);
+}
+
+/*
+ * CopyToBinaryOutFunc
+ *
+ * Assign output function data for a relation's attribute in binary format.
+ */
+static void
+CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+	Oid			func_oid;
+	bool		is_varlena;
+
+	/* Set output function for an attribute */
+	getTypeBinaryOutputInfo(atttypid, &func_oid, &is_varlena);
+	fmgr_info(func_oid, finfo);
+}
+
+/*
+ * CopyToBinaryOneRow
+ *
+ * Process one row for binary format.
+ */
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+	FmgrInfo   *out_functions = cstate->out_functions;
+
+	/* Binary per-tuple header */
+	CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+	foreach_int(attnum, cstate->attnumlist)
+	{
+		Datum		value = slot->tts_values[attnum - 1];
+		bool		isnull = slot->tts_isnull[attnum - 1];
+
+		if (isnull)
+		{
+			CopySendInt32(cstate, -1);
+		}
+		else
+		{
+			bytea	   *outputbytes;
+
+			outputbytes = SendFunctionCall(&out_functions[attnum - 1],
+										   value);
+			CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+			CopySendData(cstate, VARDATA(outputbytes),
+						 VARSIZE(outputbytes) - VARHDRSZ);
+		}
+	}
+
+	CopySendEndOfRow(cstate);
+}
+
+/*
+ * CopyToBinaryEnd
+ *
+ * End of COPY TO for binary format.
+ */
+static void
+CopyToBinaryEnd(CopyToState cstate)
+{
+	/* Generate trailer for a binary copy */
+	CopySendInt16(cstate, -1);
+	/* Need to flush out the trailer */
+	CopySendEndOfRow(cstate);
+}
+
+/*
+ * CSV and text share the same implementation, with the exception of the
+ * output representation and per-row callbacks.
+ */ +static const CopyToRoutine CopyToRoutineText = { + .CopyToStart = CopyToTextLikeStart, + .CopyToOutFunc = CopyToTextLikeOutFunc, + .CopyToOneRow = CopyToTextOneRow, + .CopyToEnd = CopyToTextLikeEnd, +}; + +static const CopyToRoutine CopyToRoutineCSV = { + .CopyToStart = CopyToTextLikeStart, + .CopyToOutFunc = CopyToTextLikeOutFunc, + .CopyToOneRow = CopyToCSVOneRow, + .CopyToEnd = CopyToTextLikeEnd, +}; + +static const CopyToRoutine CopyToRoutineBinary = { + .CopyToStart = CopyToBinaryStart, + .CopyToOutFunc = CopyToBinaryOutFunc, + .CopyToOneRow = CopyToBinaryOneRow, + .CopyToEnd = CopyToBinaryEnd, +}; + +/* + * Define the COPY TO routines to use for a format. This should be called + * after options are parsed. + */ +static const CopyToRoutine * +CopyToGetRoutine(CopyFormatOptions opts) +{ + if (opts.csv_mode) + return &CopyToRoutineCSV; + else if (opts.binary) + return &CopyToRoutineBinary; + + /* default is text */ + return &CopyToRoutineText; +} /* * Send copy start/stop messages for frontend copies. 
These have changed @@ -195,16 +506,6 @@ CopySendEndOfRow(CopyToState cstate) switch (cstate->copy_dest) { case COPY_FILE: - if (!cstate->opts.binary) - { - /* Default line termination depends on platform */ -#ifndef WIN32 - CopySendChar(cstate, '\n'); -#else - CopySendString(cstate, "\r\n"); -#endif - } - if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, cstate->copy_file) != 1 || ferror(cstate->copy_file)) @@ -239,10 +540,6 @@ CopySendEndOfRow(CopyToState cstate) } break; case COPY_FRONTEND: - /* The FE/BE protocol uses \n as newline for all platforms */ - if (!cstate->opts.binary) - CopySendChar(cstate, '\n'); - /* Dump the accumulated row as one CopyData message */ (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len); break; @@ -430,6 +727,9 @@ BeginCopyTo(ParseState *pstate, /* Extract options from the statement node tree */ ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options); + /* Set format routine */ + cstate->routine = CopyToGetRoutine(cstate->opts); + /* Process the source/target relation or query */ if (rel) { @@ -775,27 +1075,10 @@ DoCopyTo(CopyToState cstate) foreach(cur, cstate->attnumlist) { int attnum = lfirst_int(cur); - Oid out_func_oid; - bool isvarlena; Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1); - if (cstate->opts.binary) - { - getTypeBinaryOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); - } - else if (cstate->routine) - cstate->routine->CopyToOutFunc(cstate, attr->atttypid, - &cstate->out_functions[attnum - 1]); - else - { - getTypeOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); - } + cstate->routine->CopyToOutFunc(cstate, attr->atttypid, + &cstate->out_functions[attnum - 1]); } /* @@ -808,58 +1091,7 @@ DoCopyTo(CopyToState cstate) "COPY TO", ALLOCSET_DEFAULT_SIZES); - if (cstate->opts.binary) - { - /* Generate header for a binary copy */ - int32 
tmp; - - /* Signature */ - CopySendData(cstate, BinarySignature, 11); - /* Flags field */ - tmp = 0; - CopySendInt32(cstate, tmp); - /* No header extension */ - tmp = 0; - CopySendInt32(cstate, tmp); - } - else if (cstate->routine) - cstate->routine->CopyToStart(cstate, tupDesc); - else - { - /* - * For non-binary copy, we need to convert null_print to file - * encoding, because it will be sent directly with CopySendString. - */ - if (cstate->need_transcoding) - cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, - cstate->opts.null_print_len, - cstate->file_encoding); - - /* if a header has been requested send the line */ - if (cstate->opts.header_line) - { - bool hdr_delim = false; - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - char *colname; - - if (hdr_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - hdr_delim = true; - - colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); - - if (cstate->opts.csv_mode) - CopyAttributeOutCSV(cstate, colname, false); - else - CopyAttributeOutText(cstate, colname); - } - - CopySendEndOfRow(cstate); - } - } + cstate->routine->CopyToStart(cstate, tupDesc); if (cstate->rel) { @@ -898,15 +1130,7 @@ DoCopyTo(CopyToState cstate) processed = ((DR_copy *) cstate->queryDesc->dest)->processed; } - if (cstate->opts.binary) - { - /* Generate trailer for a binary copy */ - CopySendInt16(cstate, -1); - /* Need to flush out the trailer */ - CopySendEndOfRow(cstate); - } - else if (cstate->routine) - cstate->routine->CopyToEnd(cstate); + cstate->routine->CopyToEnd(cstate); MemoryContextDelete(cstate->rowcontext); @@ -922,7 +1146,6 @@ DoCopyTo(CopyToState cstate) static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) { - FmgrInfo *out_functions = cstate->out_functions; MemoryContext oldcontext; MemoryContextReset(cstate->rowcontext); @@ -931,69 +1154,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) /* Make sure the tuple is fully deconstructed */ 
slot_getallattrs(slot); - if (cstate->routine) - { - cstate->routine->CopyToOneRow(cstate, slot); - MemoryContextSwitchTo(oldcontext); - return; - } - - if (cstate->opts.binary) - { - /* Binary per-tuple header */ - CopySendInt16(cstate, list_length(cstate->attnumlist)); - } - - if (!cstate->opts.binary) - { - bool need_delim = false; - - foreach_int(attnum, cstate->attnumlist) - { - Datum value = slot->tts_values[attnum - 1]; - bool isnull = slot->tts_isnull[attnum - 1]; - char *string; - - if (need_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - need_delim = true; - - if (isnull) - CopySendString(cstate, cstate->opts.null_print_client); - else - { - string = OutputFunctionCall(&out_functions[attnum - 1], - value); - if (cstate->opts.csv_mode) - CopyAttributeOutCSV(cstate, string, - cstate->opts.force_quote_flags[attnum - 1]); - else - CopyAttributeOutText(cstate, string); - } - } - } - else - { - foreach_int(attnum, cstate->attnumlist) - { - Datum value = slot->tts_values[attnum - 1]; - bool isnull = slot->tts_isnull[attnum - 1]; - bytea *outputbytes; - - if (isnull) - CopySendInt32(cstate, -1); - else - { - outputbytes = SendFunctionCall(&out_functions[attnum - 1], - value); - CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); - CopySendData(cstate, VARDATA(outputbytes), - VARSIZE(outputbytes) - VARHDRSZ); - } - } - } - - CopySendEndOfRow(cstate); + cstate->routine->CopyToOneRow(cstate, slot); MemoryContextSwitchTo(oldcontext); } diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 4002a7f5382..f2409013fba 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where extern void EndCopyFrom(CopyFromState cstate); extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); -extern bool NextCopyFromRawFields(CopyFromState cstate, - char ***fields, int *nfields); extern void 
CopyFromErrorCallback(void *arg);
 extern char *CopyLimitPrintoutLength(const char *str);
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 509b9e92a18..c11b5ff3cc0 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -187,4 +187,12 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
+/* Callbacks for CopyFromRoutine->CopyFromOneRow */
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+							   Datum *values, bool *nulls);
+extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
+							  Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+								 Datum *values, bool *nulls);
+
 #endif							/* COPYFROM_INTERNAL_H */
-- 
2.45.2

From 328bb34d626fdbcc2cb2e2013c46e24e6123faef Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jul 2024 17:39:41 +0900
Subject: [PATCH v22 3/5] Add support for adding custom COPY TO/FROM format

This uses the handler approach like tablesample. The approach
creates an internal function that returns an internal
struct. In this case, a COPY TO handler returns a
CopyToRoutine and a COPY FROM handler returns a
CopyFromRoutine.

The same handler is used for both COPY TO and COPY FROM.
PostgreSQL calls the handler with an "is_from" argument,
which is true for COPY FROM and false for COPY TO:

    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine

This also adds a test module for custom COPY TO/FROM handlers.
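
The handler dispatch can be sketched outside PostgreSQL as
follows. This is only an illustration under stated
assumptions, not the patch's code: every Sketch*/sketch_*
name is hypothetical, the enum stands in for NodeTag, and the
tag comparison stands in for the IsA() checks that
ProcessCopyOptionFormat() performs. It shows one handler
serving both directions, with is_from == true expected to
yield the COPY FROM routine and is_from == false the COPY TO
routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's NodeTag values. */
typedef enum SketchTag
{
	SKETCH_T_COPY_TO = 1,
	SKETCH_T_COPY_FROM = 2
} SketchTag;

/* Each routine struct starts with its tag, like "NodeTag type" in the patch. */
typedef struct SketchCopyToRoutine
{
	SketchTag	type;
	const char *name;
} SketchCopyToRoutine;

typedef struct SketchCopyFromRoutine
{
	SketchTag	type;
	const char *name;
} SketchCopyFromRoutine;

static const SketchCopyToRoutine sketch_to = {SKETCH_T_COPY_TO, "to"};
static const SketchCopyFromRoutine sketch_from = {SKETCH_T_COPY_FROM, "from"};

/* Hypothetical handler signature: one function for both directions. */
typedef const void *(*SketchHandler) (bool is_from);

/* A well-behaved handler: the direction selects the routine. */
static const void *
sketch_handler(bool is_from)
{
	if (is_from)
		return &sketch_from;
	return &sketch_to;
}

/* A deliberately broken handler that ignores is_from. */
static const void *
sketch_bad_handler(bool is_from)
{
	(void) is_from;
	return &sketch_to;
}

/*
 * Call the handler and accept the result only if its leading tag matches
 * the requested direction; return NULL otherwise.  Reading the tag through
 * a pointer to the first member is valid C for any of the structs above.
 */
static const void *
sketch_resolve(SketchHandler handler, bool is_from)
{
	const void *routine;
	SketchTag	expected = is_from ? SKETCH_T_COPY_FROM : SKETCH_T_COPY_TO;

	assert(handler != NULL);
	routine = handler(is_from);
	if (routine == NULL || *(const SketchTag *) routine != expected)
		return NULL;
	return routine;
}
```

With this shape, a handler that ignores is_from is rejected
for COPY FROM instead of having its result cast blindly to
the wrong struct, which is the reason the routine structs
carry a tag as their first member.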
--- src/backend/commands/copy.c | 96 ++++++++++++++--- src/backend/commands/copyfrom.c | 4 +- src/backend/commands/copyto.c | 4 +- src/backend/nodes/Makefile | 1 + src/backend/nodes/gen_node_support.pl | 2 + src/backend/utils/adt/pseudotypes.c | 1 + src/include/catalog/pg_proc.dat | 6 ++ src/include/catalog/pg_type.dat | 6 ++ src/include/commands/copy.h | 2 + src/include/commands/copyapi.h | 4 + src/include/nodes/meson.build | 1 + src/test/modules/Makefile | 1 + src/test/modules/meson.build | 1 + src/test/modules/test_copy_format/.gitignore | 4 + src/test/modules/test_copy_format/Makefile | 23 ++++ .../expected/test_copy_format.out | 21 ++++ src/test/modules/test_copy_format/meson.build | 33 ++++++ .../test_copy_format/sql/test_copy_format.sql | 6 ++ .../test_copy_format--1.0.sql | 8 ++ .../test_copy_format/test_copy_format.c | 100 ++++++++++++++++++ .../test_copy_format/test_copy_format.control | 4 + 21 files changed, 313 insertions(+), 15 deletions(-) create mode 100644 src/test/modules/test_copy_format/.gitignore create mode 100644 src/test/modules/test_copy_format/Makefile create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out create mode 100644 src/test/modules/test_copy_format/meson.build create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql create mode 100644 src/test/modules/test_copy_format/test_copy_format.c create mode 100644 src/test/modules/test_copy_format/test_copy_format.control diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 3485ba8663f..c8643b2dee7 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -32,6 +32,7 @@ #include "parser/parse_coerce.h" #include "parser/parse_collate.h" #include "parser/parse_expr.h" +#include "parser/parse_func.h" #include "parser/parse_relation.h" #include "utils/acl.h" #include "utils/builtins.h" @@ -462,6 +463,87 @@ 
defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
 	return COPY_LOG_VERBOSITY_DEFAULT;	/* keep compiler quiet */
 }
 
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY
+ * format handler is found, this function reports an error.
+ */
+static void
+ProcessCopyOptionFormat(ParseState *pstate,
+						CopyFormatOptions *opts_out,
+						bool is_from,
+						DefElem *defel)
+{
+	char	   *format;
+	Oid			funcargtypes[1];
+	Oid			handlerOid = InvalidOid;
+	Datum		datum;
+	Node	   *routine;
+
+	format = defGetString(defel);
+
+	/* built-in formats */
+	if (strcmp(format, "text") == 0)
+		 /* default format */ return;
+	else if (strcmp(format, "csv") == 0)
+	{
+		opts_out->csv_mode = true;
+		return;
+	}
+	else if (strcmp(format, "binary") == 0)
+	{
+		opts_out->binary = true;
+		return;
+	}
+
+	/* custom format */
+	funcargtypes[0] = INTERNALOID;
+	handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+								funcargtypes, true);
+	if (!OidIsValid(handlerOid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY format \"%s\" not recognized", format),
+				 parser_errposition(pstate, defel->location)));
+
+	datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
+	routine = (Node *) DatumGetPointer(datum);
+	if (is_from)
+	{
+		if (routine == NULL || !IsA(routine, CopyFromRoutine))
+			ereport(
+				ERROR,
+				(errcode(
+					 ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY handler function "
+						"%s(%u) did not return a "
+						"CopyFromRoutine struct",
+						format, handlerOid),
+				 parser_errposition(
+					 pstate, defel->location)));
+	}
+	else
+	{
+		if (routine == NULL || !IsA(routine, CopyToRoutine))
+			ereport(
+				ERROR,
+				(errcode(
+					 ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY handler 
function " + "%s(%u) did not return a " + "CopyToRoutine struct", + format, handlerOid), + parser_errposition( + pstate, defel->location))); + } + + opts_out->routine = routine; +} + /* * Process the statement option list for COPY. * @@ -505,22 +587,10 @@ ProcessCopyOptions(ParseState *pstate, if (strcmp(defel->defname, "format") == 0) { - char *fmt = defGetString(defel); - if (format_specified) errorConflictingDefElem(defel, pstate); format_specified = true; - if (strcmp(fmt, "text") == 0) - /* default format */ ; - else if (strcmp(fmt, "csv") == 0) - opts_out->csv_mode = true; - else if (strcmp(fmt, "binary") == 0) - opts_out->binary = true; - else - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("COPY format \"%s\" not recognized", fmt), - parser_errposition(pstate, defel->location))); + ProcessCopyOptionFormat(pstate, opts_out, is_from, defel); } else if (strcmp(defel->defname, "freeze") == 0) { diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index e6ea9ce1602..932f1ff4f6e 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -247,7 +247,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = { static const CopyFromRoutine * CopyFromGetRoutine(CopyFormatOptions opts) { - if (opts.csv_mode) + if (opts.routine) + return (const CopyFromRoutine *) opts.routine; + else if (opts.csv_mode) return &CopyFromRoutineCSV; else if (opts.binary) return &CopyFromRoutineBinary; diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 46f3507a8b5..1f1d2baf9be 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -431,7 +431,9 @@ static const CopyToRoutine CopyToRoutineBinary = { static const CopyToRoutine * CopyToGetRoutine(CopyFormatOptions opts) { - if (opts.csv_mode) + if (opts.routine) + return (const CopyToRoutine *) opts.routine; + else if (opts.csv_mode) return &CopyToRoutineCSV; else if (opts.binary) return &CopyToRoutineBinary; diff 
--git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile index 66bbad8e6e0..173ee11811c 100644 --- a/src/backend/nodes/Makefile +++ b/src/backend/nodes/Makefile @@ -49,6 +49,7 @@ node_headers = \ access/sdir.h \ access/tableam.h \ access/tsmapi.h \ + commands/copyapi.h \ commands/event_trigger.h \ commands/trigger.h \ executor/tuptable.h \ diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl index 81df3bdf95f..428ab4f0d93 100644 --- a/src/backend/nodes/gen_node_support.pl +++ b/src/backend/nodes/gen_node_support.pl @@ -61,6 +61,7 @@ my @all_input_files = qw( access/sdir.h access/tableam.h access/tsmapi.h + commands/copyapi.h commands/event_trigger.h commands/trigger.h executor/tuptable.h @@ -85,6 +86,7 @@ my @nodetag_only_files = qw( access/sdir.h access/tableam.h access/tsmapi.h + commands/copyapi.h commands/event_trigger.h commands/trigger.h executor/tuptable.h diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c index e189e9b79d2..25f24ab95d2 100644 --- a/src/backend/utils/adt/pseudotypes.c +++ b/src/backend/utils/adt/pseudotypes.c @@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler); +PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(internal); PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement); PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray); diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index f23321a41f1..6af90a26374 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -7761,6 +7761,12 @@ { oid => '3312', descr => 'I/O', proname => 'tsm_handler_out', prorettype => 'cstring', proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' }, +{ oid => '8753', descr => 'I/O', + proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler', + proargtypes => 'cstring', prosrc => 
'copy_handler_in' }, +{ oid => '8754', descr => 'I/O', + proname => 'copy_handler_out', prorettype => 'cstring', + proargtypes => 'copy_handler', prosrc => 'copy_handler_out' }, { oid => '267', descr => 'I/O', proname => 'table_am_handler_in', proisstrict => 'f', prorettype => 'table_am_handler', proargtypes => 'cstring', diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat index ceff66ccde1..37ebfa0908f 100644 --- a/src/include/catalog/pg_type.dat +++ b/src/include/catalog/pg_type.dat @@ -633,6 +633,12 @@ typcategory => 'P', typinput => 'tsm_handler_in', typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-', typalign => 'i' }, +{ oid => '8752', + descr => 'pseudo-type for the result of a copy to/from method function', + typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p', + typcategory => 'P', typinput => 'copy_handler_in', + typoutput => 'copy_handler_out', typreceive => '-', typsend => '-', + typalign => 'i' }, { oid => '269', descr => 'pseudo-type for the result of a table AM handler function', typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p', diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index f2409013fba..63f3e8e1af7 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -87,6 +87,8 @@ typedef struct CopyFormatOptions CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */ int64 reject_limit; /* maximum tolerable number of errors */ List *convert_select; /* list of column names (can be NIL) */ + Node *routine; /* CopyToRoutine or CopyFromRoutine (can be + * NULL) */ } CopyFormatOptions; /* These are private in commands/copy[from|to].c */ diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index d1289424c67..e049a45a4b1 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -27,6 +27,8 @@ typedef struct CopyToStateData *CopyToState; */ typedef struct 
CopyFromRoutine { + NodeTag type; + /* * Called when COPY FROM is started to set up the input functions * associated with the relation's attributes writing to. `finfo` can be @@ -69,6 +71,8 @@ typedef struct CopyFromRoutine */ typedef struct CopyToRoutine { + NodeTag type; + /* * Called when COPY TO is started to set up the output functions * associated with the relation's attributes reading from. `finfo` can be diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build index b665e55b657..103df1a7873 100644 --- a/src/include/nodes/meson.build +++ b/src/include/nodes/meson.build @@ -11,6 +11,7 @@ node_support_input_i = [ 'access/sdir.h', 'access/tableam.h', 'access/tsmapi.h', + 'commands/copyapi.h', 'commands/event_trigger.h', 'commands/trigger.h', 'executor/tuptable.h', diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile index c0d3cf0e14b..33e3a49a4fb 100644 --- a/src/test/modules/Makefile +++ b/src/test/modules/Makefile @@ -15,6 +15,7 @@ SUBDIRS = \ spgist_name_ops \ test_bloomfilter \ test_copy_callbacks \ + test_copy_format \ test_custom_rmgrs \ test_ddl_deparse \ test_dsa \ diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build index c829b619530..75b6ab1b6a9 100644 --- a/src/test/modules/meson.build +++ b/src/test/modules/meson.build @@ -14,6 +14,7 @@ subdir('spgist_name_ops') subdir('ssl_passphrase_callback') subdir('test_bloomfilter') subdir('test_copy_callbacks') +subdir('test_copy_format') subdir('test_custom_rmgrs') subdir('test_ddl_deparse') subdir('test_dsa') diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore new file mode 100644 index 00000000000..5dcb3ff9723 --- /dev/null +++ b/src/test/modules/test_copy_format/.gitignore @@ -0,0 +1,4 @@ +# Generated subdirectories +/log/ +/results/ +/tmp_check/ diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile new file mode 100644 index 00000000000..8497f91624d --- 
/dev/null +++ b/src/test/modules/test_copy_format/Makefile @@ -0,0 +1,23 @@ +# src/test/modules/test_copy_format/Makefile + +MODULE_big = test_copy_format +OBJS = \ + $(WIN32RES) \ + test_copy_format.o +PGFILEDESC = "test_copy_format - test custom COPY FORMAT" + +EXTENSION = test_copy_format +DATA = test_copy_format--1.0.sql + +REGRESS = test_copy_format + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = src/test/modules/test_copy_format +top_builddir = ../../../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out new file mode 100644 index 00000000000..4ed7c0b12db --- /dev/null +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -0,0 +1,21 @@ +CREATE EXTENSION test_copy_format; +CREATE TABLE public.test (a smallint, b integer, c bigint); +INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +COPY public.test FROM stdin WITH (format 'test_copy_format'); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromEnd +COPY public.test TO stdout WITH (format 'test_copy_format'); +NOTICE: test_copy_format: is_from=false +NOTICE: CopyToOutFunc: atttypid=21 +NOTICE: CopyToOutFunc: atttypid=23 +NOTICE: CopyToOutFunc: atttypid=20 +NOTICE: CopyToStart: natts=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToEnd diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build new file mode 100644 index 00000000000..4cefe7b709a --- /dev/null +++ b/src/test/modules/test_copy_format/meson.build @@ -0,0 +1,33 @@ 
+# Copyright (c) 2024, PostgreSQL Global Development Group + +test_copy_format_sources = files( + 'test_copy_format.c', +) + +if host_system == 'windows' + test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [ + '--NAME', 'test_copy_format', + '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',]) +endif + +test_copy_format = shared_module('test_copy_format', + test_copy_format_sources, + kwargs: pg_test_mod_args, +) +test_install_libs += test_copy_format + +test_install_data += files( + 'test_copy_format.control', + 'test_copy_format--1.0.sql', +) + +tests += { + 'name': 'test_copy_format', + 'sd': meson.current_source_dir(), + 'bd': meson.current_build_dir(), + 'regress': { + 'sql': [ + 'test_copy_format', + ], + }, +} diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql new file mode 100644 index 00000000000..e805f7cb011 --- /dev/null +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -0,0 +1,6 @@ +CREATE EXTENSION test_copy_format; +CREATE TABLE public.test (a smallint, b integer, c bigint); +INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +COPY public.test FROM stdin WITH (format 'test_copy_format'); +\. +COPY public.test TO stdout WITH (format 'test_copy_format'); diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql new file mode 100644 index 00000000000..d24ea03ce99 --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql @@ -0,0 +1,8 @@ +/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */ + +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use "CREATE EXTENSION test_copy_format" to load this file. 
\quit
+
+CREATE FUNCTION test_copy_format(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..f6b105659ab
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,100 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+
+PG_MODULE_MAGIC;
+
+static void
+CopyFromInFunc(CopyFromState cstate, Oid atttypid,
+               FmgrInfo *finfo, Oid *typioparam)
+{
+    /* Oid is unsigned, so print it with %u */
+    ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts)));
+}
+
+static bool
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+CopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .type = T_CopyFromRoutine,
+    .CopyFromInFunc = CopyFromInFunc,
+    .CopyFromStart = CopyFromStart,
+    .CopyFromOneRow = CopyFromOneRow,
+    .CopyFromEnd = CopyFromEnd,
+};
+
+static void
+CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    /* Oid is unsigned, so print it with %u */
+    ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts)));
+}
+
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    /* tts_nvalid is an AttrNumber (signed), so print it with %d */
+    ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%d", slot->tts_nvalid)));
+}
+
+static void
+CopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .type = T_CopyToRoutine,
+    .CopyToOutFunc = CopyToOutFunc,
+    .CopyToStart = CopyToStart,
+    .CopyToOneRow = CopyToOneRow,
+    .CopyToEnd = CopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    ereport(NOTICE,
+            (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 00000000000..f05a6362358
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.45.2

From 682b868c825409d44aec0d3ad32ece63aaed309f Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 23 Jan 2024 14:54:10 +0900
Subject: [PATCH v22 4/5] Export CopyToStateData and CopyFromStateData

This is for custom COPY TO/FROM format handlers implemented as
extensions.

This just moves code. Nothing changes except the CopyDest/CopySource
enum values. Once both enums are declared in the same header, values
such as COPY_FILE conflict with each other. So CopyDest values now use
the COPY_DEST_ prefix instead of COPY_, and CopySource values use the
COPY_SOURCE_ prefix instead of COPY_. For example, COPY_FILE in
CopyDest is renamed to COPY_DEST_FILE and COPY_FILE in CopySource is
renamed to COPY_SOURCE_FILE.
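
The naming conflict described above follows from a plain C rule: enumerators of every enum declared in the same scope share one namespace, so two enums cannot both define COPY_FILE once they sit in one header. The following is only a minimal sketch of the prefixed layout. The enum definitions mirror the renamed ones in this patch, while copy_source_name() and copy_dest_name() are hypothetical helpers added for illustration and are not part of the patch.

```c
#include <stdio.h>

/*
 * Sketch only. If both enums still used COPY_FILE, COPY_FRONTEND and
 * COPY_CALLBACK, this file would fail to compile with a redeclaration
 * error, because C enumerators are ordinary identifiers in file scope.
 */
typedef enum CopySource
{
	COPY_SOURCE_FILE,			/* from file (or a piped program) */
	COPY_SOURCE_FRONTEND,		/* from frontend */
	COPY_SOURCE_CALLBACK,		/* from callback function */
} CopySource;

typedef enum CopyDest
{
	COPY_DEST_FILE,				/* to file (or a piped program) */
	COPY_DEST_FRONTEND,			/* to frontend */
	COPY_DEST_CALLBACK,			/* to callback function */
} CopyDest;

/* Hypothetical helper: human-readable name of a COPY FROM source */
const char *
copy_source_name(CopySource src)
{
	switch (src)
	{
		case COPY_SOURCE_FILE:
			return "file";
		case COPY_SOURCE_FRONTEND:
			return "frontend";
		case COPY_SOURCE_CALLBACK:
			return "callback";
	}
	return "unknown";
}

/* Hypothetical helper: human-readable name of a COPY TO destination */
const char *
copy_dest_name(CopyDest dest)
{
	switch (dest)
	{
		case COPY_DEST_FILE:
			return "file";
		case COPY_DEST_FRONTEND:
			return "frontend";
		case COPY_DEST_CALLBACK:
			return "callback";
	}
	return "unknown";
}
```

The numeric values stay the same as before the rename (both enums still start at 0), so only the spelling at each use site changes.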

Note that this isn't enough to implement custom COPY TO/FROM format
handlers as extensions. We'll do the following in a subsequent commit:

For custom COPY TO format handler:

1. Add an opaque space for custom COPY TO format handler
2. Export CopySendEndOfRow() to flush buffer

For custom COPY FROM format handler:

1. Add an opaque space for custom COPY FROM format handler
2. Export CopyReadBinaryData() to read the next data
---
 src/backend/commands/copyfrom.c          |   4 +-
 src/backend/commands/copyfromparse.c     |  10 +-
 src/backend/commands/copyto.c            |  77 +-----
 src/include/commands/copy.h              |  81 +-----
 src/include/commands/copyapi.h           | 309 ++++++++++++++++++++++-
 src/include/commands/copyfrom_internal.h | 165 ------------
 6 files changed, 323 insertions(+), 323 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 932f1ff4f6e..d758e66c6a1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1716,7 +1716,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;   /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;   /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1844,7 +1844,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 0447c4df7e0..ccfbacb4a37 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -171,7 +171,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows
it can send. */ pq_flush(); @@ -239,7 +239,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) switch (cstate->copy_src) { - case COPY_FILE: + case COPY_SOURCE_FILE: bytesread = fread(databuf, 1, maxread, cstate->copy_file); if (ferror(cstate->copy_file)) ereport(ERROR, @@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) if (bytesread == 0) cstate->raw_reached_eof = true; break; - case COPY_FRONTEND: + case COPY_SOURCE_FRONTEND: while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof) { int avail; @@ -331,7 +331,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) bytesread += avail; } break; - case COPY_CALLBACK: + case COPY_SOURCE_CALLBACK: bytesread = cstate->data_source_cb(databuf, minread, maxread); break; } @@ -1179,7 +1179,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv) * after \. up to the protocol end of copy data. (XXX maybe better * not to treat \. as special?) */ - if (cstate->copy_src == COPY_FRONTEND) + if (cstate->copy_src == COPY_SOURCE_FRONTEND) { int inbytes; diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 1f1d2baf9be..fb68f42ce1e 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -37,67 +37,6 @@ #include "utils/rel.h" #include "utils/snapmgr.h" -/* - * Represents the different dest cases we need to worry about at - * the bottom level - */ -typedef enum CopyDest -{ - COPY_FILE, /* to file (or a piped program) */ - COPY_FRONTEND, /* to frontend */ - COPY_CALLBACK, /* to callback function */ -} CopyDest; - -/* - * This struct contains all the state variables used throughout a COPY TO - * operation. - * - * Multi-byte encodings: all supported client-side encodings encode multi-byte - * characters by having the first byte's high bit set. Subsequent bytes of the - * character can have the high bit not set. 
When scanning data in such an - * encoding to look for a match to a single-byte (ie ASCII) character, we must - * use the full pg_encoding_mblen() machinery to skip over multibyte - * characters, else we might find a false match to a trailing byte. In - * supported server encodings, there is no possibility of a false match, and - * it's faster to make useless comparisons to trailing bytes than it is to - * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true - * when we have to do it the hard way. - */ -typedef struct CopyToStateData -{ - /* format routine */ - const CopyToRoutine *routine; - - /* low-level state data */ - CopyDest copy_dest; /* type of copy source/destination */ - FILE *copy_file; /* used if copy_dest == COPY_FILE */ - StringInfo fe_msgbuf; /* used for all dests during COPY TO */ - - int file_encoding; /* file or remote side's character encoding */ - bool need_transcoding; /* file encoding diff from server? */ - bool encoding_embeds_ascii; /* ASCII can be non-first byte? */ - - /* parameters from the COPY command */ - Relation rel; /* relation to copy to */ - QueryDesc *queryDesc; /* executable query to copy from */ - List *attnumlist; /* integer list of attnums to copy */ - char *filename; /* filename, or NULL for STDOUT */ - bool is_program; /* is 'filename' a program to popen? 
*/ - copy_data_dest_cb data_dest_cb; /* function for writing data */ - - CopyFormatOptions opts; - Node *whereClause; /* WHERE condition (or NULL) */ - - /* - * Working state - */ - MemoryContext copycontext; /* per-copy execution context */ - - FmgrInfo *out_functions; /* lookup info for output functions */ - MemoryContext rowcontext; /* per-row evaluation context */ - uint64 bytes_processed; /* number of bytes processed so far */ -} CopyToStateData; - /* DestReceiver for COPY (query) TO */ typedef struct { @@ -143,7 +82,7 @@ CopyToTextLikeSendEndOfRow(CopyToState cstate) { switch (cstate->copy_dest) { - case COPY_FILE: + case COPY_DEST_FILE: /* Default line termination depends on platform */ #ifndef WIN32 CopySendChar(cstate, '\n'); @@ -151,7 +90,7 @@ CopyToTextLikeSendEndOfRow(CopyToState cstate) CopySendString(cstate, "\r\n"); #endif break; - case COPY_FRONTEND: + case COPY_DEST_FRONTEND: /* The FE/BE protocol uses \n as newline for all platforms */ CopySendChar(cstate, '\n'); break; @@ -460,7 +399,7 @@ SendCopyBegin(CopyToState cstate) for (i = 0; i < natts; i++) pq_sendint16(&buf, format); /* per-column formats */ pq_endmessage(&buf); - cstate->copy_dest = COPY_FRONTEND; + cstate->copy_dest = COPY_DEST_FRONTEND; } static void @@ -507,7 +446,7 @@ CopySendEndOfRow(CopyToState cstate) switch (cstate->copy_dest) { - case COPY_FILE: + case COPY_DEST_FILE: if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, cstate->copy_file) != 1 || ferror(cstate->copy_file)) @@ -541,11 +480,11 @@ CopySendEndOfRow(CopyToState cstate) errmsg("could not write to COPY file: %m"))); } break; - case COPY_FRONTEND: + case COPY_DEST_FRONTEND: /* Dump the accumulated row as one CopyData message */ (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len); break; - case COPY_CALLBACK: + case COPY_DEST_CALLBACK: cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len); break; } @@ -929,12 +868,12 @@ BeginCopyTo(ParseState *pstate, /* See Multibyte encoding comment above */ 
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding); - cstate->copy_dest = COPY_FILE; /* default */ + cstate->copy_dest = COPY_DEST_FILE; /* default */ if (data_dest_cb) { progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK; - cstate->copy_dest = COPY_CALLBACK; + cstate->copy_dest = COPY_DEST_CALLBACK; cstate->data_dest_cb = data_dest_cb; } else if (pipe) diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 63f3e8e1af7..e2411848e9f 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -14,90 +14,11 @@ #ifndef COPY_H #define COPY_H -#include "nodes/execnodes.h" +#include "commands/copyapi.h" #include "nodes/parsenodes.h" #include "parser/parse_node.h" #include "tcop/dest.h" -/* - * Represents whether a header line should be present, and whether it must - * match the actual names (which implies "true"). - */ -typedef enum CopyHeaderChoice -{ - COPY_HEADER_FALSE = 0, - COPY_HEADER_TRUE, - COPY_HEADER_MATCH, -} CopyHeaderChoice; - -/* - * Represents where to save input processing errors. More values to be added - * in the future. - */ -typedef enum CopyOnErrorChoice -{ - COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */ - COPY_ON_ERROR_IGNORE, /* ignore errors */ -} CopyOnErrorChoice; - -/* - * Represents verbosity of logged messages by COPY command. - */ -typedef enum CopyLogVerbosityChoice -{ - COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */ - COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages. As this is - * the default, assign 0 */ - COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */ -} CopyLogVerbosityChoice; - -/* - * A struct to hold COPY options, in a parsed form. All of these are related - * to formatting, except for 'freeze', which doesn't really belong here, but - * it's expedient to parse it along with all the other options. 
- */ -typedef struct CopyFormatOptions -{ - /* parameters from the COPY command */ - int file_encoding; /* file or remote side's character encoding, - * -1 if not specified */ - bool binary; /* binary format? */ - bool freeze; /* freeze rows on loading? */ - bool csv_mode; /* Comma Separated Value format? */ - CopyHeaderChoice header_line; /* header line? */ - char *null_print; /* NULL marker string (server encoding!) */ - int null_print_len; /* length of same */ - char *null_print_client; /* same converted to file encoding */ - char *default_print; /* DEFAULT marker string */ - int default_print_len; /* length of same */ - char *delim; /* column delimiter (must be 1 byte) */ - char *quote; /* CSV quote char (must be 1 byte) */ - char *escape; /* CSV escape char (must be 1 byte) */ - List *force_quote; /* list of column names */ - bool force_quote_all; /* FORCE_QUOTE *? */ - bool *force_quote_flags; /* per-column CSV FQ flags */ - List *force_notnull; /* list of column names */ - bool force_notnull_all; /* FORCE_NOT_NULL *? */ - bool *force_notnull_flags; /* per-column CSV FNN flags */ - List *force_null; /* list of column names */ - bool force_null_all; /* FORCE_NULL *? */ - bool *force_null_flags; /* per-column CSV FN flags */ - bool convert_selectively; /* do selective binary conversion? 
*/ - CopyOnErrorChoice on_error; /* what to do when error happened */ - CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */ - int64 reject_limit; /* maximum tolerable number of errors */ - List *convert_select; /* list of column names (can be NIL) */ - Node *routine; /* CopyToRoutine or CopyFromRoutine (can be - * NULL) */ -} CopyFormatOptions; - -/* These are private in commands/copy[from|to].c */ -typedef struct CopyFromStateData *CopyFromState; -typedef struct CopyToStateData *CopyToState; - -typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread); -typedef void (*copy_data_dest_cb) (void *data, int len); - extern void DoCopy(ParseState *pstate, const CopyStmt *stmt, int stmt_location, int stmt_len, uint64 *processed); diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index e049a45a4b1..206d4c9fac9 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -14,12 +14,84 @@ #ifndef COPYAPI_H #define COPYAPI_H +#include "commands/trigger.h" +#include "executor/execdesc.h" #include "executor/tuptable.h" #include "nodes/execnodes.h" -/* These are private in commands/copy[from|to].c */ +/* + * Represents whether a header line should be present, and whether it must + * match the actual names (which implies "true"). + */ +typedef enum CopyHeaderChoice +{ + COPY_HEADER_FALSE = 0, + COPY_HEADER_TRUE, + COPY_HEADER_MATCH, +} CopyHeaderChoice; + +/* + * Represents where to save input processing errors. More values to be added + * in the future. + */ +typedef enum CopyOnErrorChoice +{ + COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */ + COPY_ON_ERROR_IGNORE, /* ignore errors */ +} CopyOnErrorChoice; + +/* + * Represents verbosity of logged messages by COPY command. + */ +typedef enum CopyLogVerbosityChoice +{ + COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */ + COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages. 
As this is + * the default, assign 0 */ + COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */ +} CopyLogVerbosityChoice; + +/* + * A struct to hold COPY options, in a parsed form. All of these are related + * to formatting, except for 'freeze', which doesn't really belong here, but + * it's expedient to parse it along with all the other options. + */ +typedef struct CopyFormatOptions +{ + /* parameters from the COPY command */ + int file_encoding; /* file or remote side's character encoding, + * -1 if not specified */ + bool binary; /* binary format? */ + bool freeze; /* freeze rows on loading? */ + bool csv_mode; /* Comma Separated Value format? */ + CopyHeaderChoice header_line; /* header line? */ + char *null_print; /* NULL marker string (server encoding!) */ + int null_print_len; /* length of same */ + char *null_print_client; /* same converted to file encoding */ + char *default_print; /* DEFAULT marker string */ + int default_print_len; /* length of same */ + char *delim; /* column delimiter (must be 1 byte) */ + char *quote; /* CSV quote char (must be 1 byte) */ + char *escape; /* CSV escape char (must be 1 byte) */ + List *force_quote; /* list of column names */ + bool force_quote_all; /* FORCE_QUOTE *? */ + bool *force_quote_flags; /* per-column CSV FQ flags */ + List *force_notnull; /* list of column names */ + bool force_notnull_all; /* FORCE_NOT_NULL *? */ + bool *force_notnull_flags; /* per-column CSV FNN flags */ + List *force_null; /* list of column names */ + bool force_null_all; /* FORCE_NULL *? */ + bool *force_null_flags; /* per-column CSV FN flags */ + bool convert_selectively; /* do selective binary conversion? 
*/ + CopyOnErrorChoice on_error; /* what to do when error happened */ + CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */ + int64 reject_limit; /* maximum tolerable number of errors */ + List *convert_select; /* list of column names (can be NIL) */ + Node *routine; /* CopyToRoutine or CopyFromRoutine (can be + * NULL) */ +} CopyFormatOptions; + typedef struct CopyFromStateData *CopyFromState; -typedef struct CopyToStateData *CopyToState; /* * API structure for a COPY FROM format implementation. Note this must be @@ -65,6 +137,176 @@ typedef struct CopyFromRoutine void (*CopyFromEnd) (CopyFromState cstate); } CopyFromRoutine; +/* + * Represents the different source cases we need to worry about at + * the bottom level + */ +typedef enum CopySource +{ + COPY_SOURCE_FILE, /* from file (or a piped program) */ + COPY_SOURCE_FRONTEND, /* from frontend */ + COPY_SOURCE_CALLBACK, /* from callback function */ +} CopySource; + +/* + * Represents the end-of-line terminator type of the input + */ +typedef enum EolType +{ + EOL_UNKNOWN, + EOL_NL, + EOL_CR, + EOL_CRNL, +} EolType; + +/* + * Represents the insert method to be used during COPY FROM. + */ +typedef enum CopyInsertMethod +{ + CIM_SINGLE, /* use table_tuple_insert or ExecForeignInsert */ + CIM_MULTI, /* always use table_multi_insert or + * ExecForeignBatchInsert */ + CIM_MULTI_CONDITIONAL, /* use table_multi_insert or + * ExecForeignBatchInsert only if valid */ +} CopyInsertMethod; + +typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread); + +/* + * This struct contains all the state variables used throughout a COPY FROM + * operation. 
+ */
+typedef struct CopyFromStateData
+{
+    /* format routine */
+    const CopyFromRoutine *routine;
+
+    /* low-level state data */
+    CopySource  copy_src;       /* type of copy source */
+    FILE       *copy_file;      /* used if copy_src == COPY_SOURCE_FILE */
+    StringInfo  fe_msgbuf;      /* used if copy_src == COPY_SOURCE_FRONTEND */
+
+    EolType     eol_type;       /* EOL type of input */
+    int         file_encoding;  /* file or remote side's character encoding */
+    bool        need_transcoding;   /* file encoding diff from server? */
+    Oid         conversion_proc;    /* encoding conversion function */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy from */
+    List       *attnumlist;     /* integer list of attnums to copy */
+    char       *filename;       /* filename, or NULL for STDIN */
+    bool        is_program;     /* is 'filename' a program to popen? */
+    copy_data_source_cb data_source_cb; /* function for reading data */
+
+    CopyFormatOptions opts;
+    bool       *convert_select_flags;   /* per-column CSV/TEXT CS flags */
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /* these are just for error messages, see CopyFromErrorCallback */
+    const char *cur_relname;    /* table name for error messages */
+    uint64      cur_lineno;     /* line number for error messages */
+    const char *cur_attname;    /* current att for error messages */
+    const char *cur_attval;     /* current att value for error messages */
+    bool        relname_only;   /* don't output line number, att, etc.
*/ + + /* + * Working state + */ + MemoryContext copycontext; /* per-copy execution context */ + + AttrNumber num_defaults; /* count of att that are missing and have + * default value */ + FmgrInfo *in_functions; /* array of input functions for each attrs */ + Oid *typioparams; /* array of element types for in_functions */ + ErrorSaveContext *escontext; /* soft error trapper during in_functions + * execution */ + uint64 num_errors; /* total number of rows which contained soft + * errors */ + int *defmap; /* array of default att numbers related to + * missing att */ + ExprState **defexprs; /* array of default att expressions for all + * att */ + bool *defaults; /* if DEFAULT marker was found for + * corresponding att */ + bool volatile_defexprs; /* is any of defexprs volatile? */ + List *range_table; /* single element list of RangeTblEntry */ + List *rteperminfos; /* single element list of RTEPermissionInfo */ + ExprState *qualexpr; + + TransitionCaptureState *transition_capture; + + /* + * These variables are used to reduce overhead in COPY FROM. + * + * attribute_buf holds the separated, de-escaped text for each field of + * the current line. The CopyReadAttributes functions return arrays of + * pointers into this buffer. We avoid palloc/pfree overhead by re-using + * the buffer on each cycle. + * + * In binary COPY FROM, attribute_buf holds the binary data for the + * current field, but the usage is otherwise similar. + */ + StringInfoData attribute_buf; + + /* field raw data pointers found by COPY FROM */ + + int max_fields; + char **raw_fields; + + /* + * Similarly, line_buf holds the whole input line being processed. The + * input cycle is first to read the whole line into line_buf, and then + * extract the individual attribute fields into attribute_buf. line_buf + * is preserved unmodified so that we can display it in error messages if + * appropriate. (In binary mode, line_buf is not used.) 
+ */ + StringInfoData line_buf; + bool line_buf_valid; /* contains the row being processed? */ + + /* + * input_buf holds input data, already converted to database encoding. + * + * In text mode, CopyReadLine parses this data sufficiently to locate line + * boundaries, then transfers the data to line_buf. We guarantee that + * there is a \0 at input_buf[input_buf_len] at all times. (In binary + * mode, input_buf is not used.) + * + * If encoding conversion is not required, input_buf is not a separate + * buffer but points directly to raw_buf. In that case, input_buf_len + * tracks the number of bytes that have been verified as valid in the + * database encoding, and raw_buf_len is the total number of bytes stored + * in the buffer. + */ +#define INPUT_BUF_SIZE 65536 /* we palloc INPUT_BUF_SIZE+1 bytes */ + char *input_buf; + int input_buf_index; /* next byte to process */ + int input_buf_len; /* total # of bytes stored */ + bool input_reached_eof; /* true if we reached EOF */ + bool input_reached_error; /* true if a conversion error happened */ + /* Shorthand for number of unconsumed bytes available in input_buf */ +#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index) + + /* + * raw_buf holds raw input data read from the data source (file or client + * connection), not yet converted to the database encoding. Like with + * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len]. 
+ */ +#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */ + char *raw_buf; + int raw_buf_index; /* next byte to process */ + int raw_buf_len; /* total # of bytes stored */ + bool raw_reached_eof; /* true if we reached EOF */ + + /* Shorthand for number of unconsumed bytes available in raw_buf */ +#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) + + uint64 bytes_processed; /* number of bytes processed so far */ +} CopyFromStateData; + + +typedef struct CopyToStateData *CopyToState; + /* * API structure for a COPY TO format implementation. Note this must be * allocated in a server-lifetime manner, typically as a static const struct. @@ -102,4 +344,67 @@ typedef struct CopyToRoutine void (*CopyToEnd) (CopyToState cstate); } CopyToRoutine; +/* + * Represents the different dest cases we need to worry about at + * the bottom level + */ +typedef enum CopyDest +{ + COPY_DEST_FILE, /* to file (or a piped program) */ + COPY_DEST_FRONTEND, /* to frontend */ + COPY_DEST_CALLBACK, /* to callback function */ +} CopyDest; + +typedef void (*copy_data_dest_cb) (void *data, int len); + +/* + * This struct contains all the state variables used throughout a COPY TO + * operation. + * + * Multi-byte encodings: all supported client-side encodings encode multi-byte + * characters by having the first byte's high bit set. Subsequent bytes of the + * character can have the high bit not set. When scanning data in such an + * encoding to look for a match to a single-byte (ie ASCII) character, we must + * use the full pg_encoding_mblen() machinery to skip over multibyte + * characters, else we might find a false match to a trailing byte. In + * supported server encodings, there is no possibility of a false match, and + * it's faster to make useless comparisons to trailing bytes than it is to + * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true + * when we have to do it the hard way. 
+ */
+typedef struct CopyToStateData
+{
+    /* format routine */
+    const CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;      /* type of copy source/destination */
+    FILE       *copy_file;      /* used if copy_dest == COPY_DEST_FILE */
+    StringInfo  fe_msgbuf;      /* used for all dests during COPY TO */
+
+    int         file_encoding;  /* file or remote side's character encoding */
+    bool        need_transcoding;   /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;  /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;      /* executable query to copy from */
+    List       *attnumlist;     /* integer list of attnums to copy */
+    char       *filename;       /* filename, or NULL for STDOUT */
+    bool        is_program;     /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;  /* per-copy execution context */
+
+    FmgrInfo   *out_functions;  /* lookup info for output functions */
+    MemoryContext rowcontext;   /* per-row evaluation context */
+    uint64      bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
 #endif                          /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c11b5ff3cc0..3863d26d5b7 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -19,171 +19,6 @@
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
-/*
- * Represents the different source cases we need to worry about at
- * the bottom level
- */
-typedef enum CopySource
-{
-    COPY_FILE,                  /* from file (or a piped program) */
-    COPY_FRONTEND,              /* from frontend */
-    COPY_CALLBACK,              /* from callback function */
-} CopySource;
-
-/*
- * Represents the end-of-line terminator type of the input
- */
-typedef enum EolType
-{
-    EOL_UNKNOWN,
-    EOL_NL,
- EOL_CR, - EOL_CRNL, -} EolType; - -/* - * Represents the insert method to be used during COPY FROM. - */ -typedef enum CopyInsertMethod -{ - CIM_SINGLE, /* use table_tuple_insert or ExecForeignInsert */ - CIM_MULTI, /* always use table_multi_insert or - * ExecForeignBatchInsert */ - CIM_MULTI_CONDITIONAL, /* use table_multi_insert or - * ExecForeignBatchInsert only if valid */ -} CopyInsertMethod; - -/* - * This struct contains all the state variables used throughout a COPY FROM - * operation. - */ -typedef struct CopyFromStateData -{ - /* format routine */ - const CopyFromRoutine *routine; - - /* low-level state data */ - CopySource copy_src; /* type of copy source */ - FILE *copy_file; /* used if copy_src == COPY_FILE */ - StringInfo fe_msgbuf; /* used if copy_src == COPY_FRONTEND */ - - EolType eol_type; /* EOL type of input */ - int file_encoding; /* file or remote side's character encoding */ - bool need_transcoding; /* file encoding diff from server? */ - Oid conversion_proc; /* encoding conversion function */ - - /* parameters from the COPY command */ - Relation rel; /* relation to copy from */ - List *attnumlist; /* integer list of attnums to copy */ - char *filename; /* filename, or NULL for STDIN */ - bool is_program; /* is 'filename' a program to popen? */ - copy_data_source_cb data_source_cb; /* function for reading data */ - - CopyFormatOptions opts; - bool *convert_select_flags; /* per-column CSV/TEXT CS flags */ - Node *whereClause; /* WHERE condition (or NULL) */ - - /* these are just for error messages, see CopyFromErrorCallback */ - const char *cur_relname; /* table name for error messages */ - uint64 cur_lineno; /* line number for error messages */ - const char *cur_attname; /* current att for error messages */ - const char *cur_attval; /* current att value for error messages */ - bool relname_only; /* don't output line number, att, etc. 
*/ - - /* - * Working state - */ - MemoryContext copycontext; /* per-copy execution context */ - - AttrNumber num_defaults; /* count of att that are missing and have - * default value */ - FmgrInfo *in_functions; /* array of input functions for each attrs */ - Oid *typioparams; /* array of element types for in_functions */ - ErrorSaveContext *escontext; /* soft error trapper during in_functions - * execution */ - uint64 num_errors; /* total number of rows which contained soft - * errors */ - int *defmap; /* array of default att numbers related to - * missing att */ - ExprState **defexprs; /* array of default att expressions for all - * att */ - bool *defaults; /* if DEFAULT marker was found for - * corresponding att */ - bool volatile_defexprs; /* is any of defexprs volatile? */ - List *range_table; /* single element list of RangeTblEntry */ - List *rteperminfos; /* single element list of RTEPermissionInfo */ - ExprState *qualexpr; - - TransitionCaptureState *transition_capture; - - /* - * These variables are used to reduce overhead in COPY FROM. - * - * attribute_buf holds the separated, de-escaped text for each field of - * the current line. The CopyReadAttributes functions return arrays of - * pointers into this buffer. We avoid palloc/pfree overhead by re-using - * the buffer on each cycle. - * - * In binary COPY FROM, attribute_buf holds the binary data for the - * current field, but the usage is otherwise similar. - */ - StringInfoData attribute_buf; - - /* field raw data pointers found by COPY FROM */ - - int max_fields; - char **raw_fields; - - /* - * Similarly, line_buf holds the whole input line being processed. The - * input cycle is first to read the whole line into line_buf, and then - * extract the individual attribute fields into attribute_buf. line_buf - * is preserved unmodified so that we can display it in error messages if - * appropriate. (In binary mode, line_buf is not used.) 
- */ - StringInfoData line_buf; - bool line_buf_valid; /* contains the row being processed? */ - - /* - * input_buf holds input data, already converted to database encoding. - * - * In text mode, CopyReadLine parses this data sufficiently to locate line - * boundaries, then transfers the data to line_buf. We guarantee that - * there is a \0 at input_buf[input_buf_len] at all times. (In binary - * mode, input_buf is not used.) - * - * If encoding conversion is not required, input_buf is not a separate - * buffer but points directly to raw_buf. In that case, input_buf_len - * tracks the number of bytes that have been verified as valid in the - * database encoding, and raw_buf_len is the total number of bytes stored - * in the buffer. - */ -#define INPUT_BUF_SIZE 65536 /* we palloc INPUT_BUF_SIZE+1 bytes */ - char *input_buf; - int input_buf_index; /* next byte to process */ - int input_buf_len; /* total # of bytes stored */ - bool input_reached_eof; /* true if we reached EOF */ - bool input_reached_error; /* true if a conversion error happened */ - /* Shorthand for number of unconsumed bytes available in input_buf */ -#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index) - - /* - * raw_buf holds raw input data read from the data source (file or client - * connection), not yet converted to the database encoding. Like with - * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len]. 
- */ -#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */ - char *raw_buf; - int raw_buf_index; /* next byte to process */ - int raw_buf_len; /* total # of bytes stored */ - bool raw_reached_eof; /* true if we reached EOF */ - - /* Shorthand for number of unconsumed bytes available in raw_buf */ -#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) - - uint64 bytes_processed; /* number of bytes processed so far */ -} CopyFromStateData; - extern void ReceiveCopyBegin(CopyFromState cstate); extern void ReceiveCopyBinaryHeader(CopyFromState cstate); -- 2.45.2 From 3f9b4a8caa33960fe11512883177a96939186373 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Tue, 23 Jan 2024 15:12:43 +0900 Subject: [PATCH v22 5/5] Add support for implementing custom COPY TO/FROM format as extension For custom COPY TO format implementation: * Add CopyToStateData::opaque that can be used to keep data for custom COPY TO format implementation * Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf as CopyToStateFlush() For custom COPY FROM format implementation: * Add CopyFromStateData::opaque that can be used to keep data for custom COPY From format implementation * Export CopyReadBinaryData() to read the next data as CopyFromStateRead() --- src/backend/commands/copyfromparse.c | 14 ++++++++++++++ src/backend/commands/copyto.c | 14 ++++++++++++++ src/include/commands/copyapi.h | 10 ++++++++++ 3 files changed, 38 insertions(+) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index ccfbacb4a37..4fa23d992f5 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -730,6 +730,20 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes) return copied_bytes; } +/* + * CopyFromStateRead + * + * Export CopyReadBinaryData() for extensions. We want to keep + * CopyReadBinaryData() as a static function for + * optimization. 
CopyReadBinaryData() calls in this file may be optimized by + * a compiler. + */ +int +CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes) +{ + return CopyReadBinaryData(cstate, dest, nbytes); +} + /* * Read raw fields in the next line for COPY FROM in text or csv mode. * Return false if no more lines. diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index fb68f42ce1e..93b041352c5 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -496,6 +496,20 @@ CopySendEndOfRow(CopyToState cstate) resetStringInfo(fe_msgbuf); } +/* + * CopyToStateFlush + * + * Export CopySendEndOfRow() for extensions. We want to keep + * CopySendEndOfRow() as a static function for + * optimization. CopySendEndOfRow() calls in this file may be optimized by a + * compiler. + */ +void +CopyToStateFlush(CopyToState cstate) +{ + CopySendEndOfRow(cstate); +} + /* * These functions do apply some data conversion */ diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 206d4c9fac9..2de610ef729 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -302,8 +302,13 @@ typedef struct CopyFromStateData #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) uint64 bytes_processed; /* number of bytes processed so far */ + + /* For custom format implementation */ + void *opaque; /* private space */ } CopyFromStateData; +extern int CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes); + typedef struct CopyToStateData *CopyToState; @@ -405,6 +410,11 @@ typedef struct CopyToStateData FmgrInfo *out_functions; /* lookup info for output functions */ MemoryContext rowcontext; /* per-row evaluation context */ uint64 bytes_processed; /* number of bytes processed so far */ + + /* For custom format implementation */ + void *opaque; /* private space */ } CopyToStateData; +extern void CopyToStateFlush(CopyToState cstate); + #endif /* COPYAPI_H */ -- 2.45.2 From 
79470eab70ba8df417796cc9e66eca41b97e74b5 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Sat, 28 Sep 2024 23:24:49 +0900 Subject: [PATCH v23 01/10] Add CopyToRoutine This is for implementing custom COPY TO formats, but it is not yet enough to implement one. We'll export some APIs to send data and add a "format" option to COPY TO later. The existing text/csv/binary format implementations don't use CopyToRoutine for now. We have a patch for that but we defer it, because there are some mysterious profile results even though we get faster runtimes. See [1] for details. [1] https://www.postgresql.org/message-id/ZdbtQJ-p5H1_EDwE%40paquier.xyz Note that this doesn't change existing text/csv/binary format implementations. --- src/backend/commands/copyto.c | 31 ++++++++++++++--- src/include/commands/copyapi.h | 58 ++++++++++++++++++++++++++++++++ src/tools/pgindent/typedefs.list | 1 + 3 files changed, 86 insertions(+), 4 deletions(-) create mode 100644 src/include/commands/copyapi.h diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index f55e6d96751..405e1782685 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -20,6 +20,7 @@ #include "access/tableam.h" #include "commands/copy.h" +#include "commands/copyapi.h" #include "commands/progress.h" #include "executor/execdesc.h" #include "executor/executor.h" @@ -64,6 +65,9 @@ typedef enum CopyDest */ typedef struct CopyToStateData { + /* format routine */ + const CopyToRoutine *routine; + /* low-level state data */ CopyDest copy_dest; /* type of copy source/destination */ FILE *copy_file; /* used if copy_dest == COPY_FILE */ @@ -776,14 +780,22 @@ DoCopyTo(CopyToState cstate) Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1); if (cstate->opts.binary) + { getTypeBinaryOutputInfo(attr->atttypid, &out_func_oid, &isvarlena); + fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); + } + else if (cstate->routine) +
cstate->routine->CopyToOutFunc(cstate, attr->atttypid, + &cstate->out_functions[attnum - 1]); else + { getTypeOutputInfo(attr->atttypid, &out_func_oid, &isvarlena); - fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); + fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); + } } /* @@ -810,6 +822,8 @@ DoCopyTo(CopyToState cstate) tmp = 0; CopySendInt32(cstate, tmp); } + else if (cstate->routine) + cstate->routine->CopyToStart(cstate, tupDesc); else { /* @@ -891,6 +905,8 @@ DoCopyTo(CopyToState cstate) /* Need to flush out the trailer */ CopySendEndOfRow(cstate); } + else if (cstate->routine) + cstate->routine->CopyToEnd(cstate); MemoryContextDelete(cstate->rowcontext); @@ -912,15 +928,22 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) MemoryContextReset(cstate->rowcontext); oldcontext = MemoryContextSwitchTo(cstate->rowcontext); + /* Make sure the tuple is fully deconstructed */ + slot_getallattrs(slot); + + if (cstate->routine) + { + cstate->routine->CopyToOneRow(cstate, slot); + MemoryContextSwitchTo(oldcontext); + return; + } + if (cstate->opts.binary) { /* Binary per-tuple header */ CopySendInt16(cstate, list_length(cstate->attnumlist)); } - /* Make sure the tuple is fully deconstructed */ - slot_getallattrs(slot); - if (!cstate->opts.binary) { bool need_delim = false; diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h new file mode 100644 index 00000000000..5ce24f195dc --- /dev/null +++ b/src/include/commands/copyapi.h @@ -0,0 +1,58 @@ +/*------------------------------------------------------------------------- + * + * copyapi.h + * API for COPY TO handlers + * + * + * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/commands/copyapi.h + * + *------------------------------------------------------------------------- + */ +#ifndef COPYAPI_H +#define COPYAPI_H + +#include "executor/tuptable.h" 
+#include "nodes/execnodes.h" + +/* This is private in commands/copyto.c */ +typedef struct CopyToStateData *CopyToState; + +/* + * API structure for a COPY TO format implementation. Note this must be + * allocated in a server-lifetime manner, typically as a static const struct. + */ +typedef struct CopyToRoutine +{ + /* + * Called when COPY TO is started to set up the output functions + * associated with the relation's attributes reading from. `finfo` can be + * optionally filled to provide the catalog information of the output + * function. `atttypid` is the OID of data type used by the relation's + * attribute. + */ + void (*CopyToOutFunc) (CopyToState cstate, Oid atttypid, + FmgrInfo *finfo); + + /* + * Called when COPY TO is started. + * + * `tupDesc` is the tuple descriptor of the relation from where the data + * is read. + */ + void (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc); + + /* + * Copy one row for COPY TO. + * + * `slot` is the tuple slot where the data is emitted. + */ + void (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot); + + /* Called when COPY TO has ended */ + void (*CopyToEnd) (CopyToState cstate); +} CopyToRoutine; + +#endif /* COPYAPI_H */ diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 1847bbfa95c..098e7023486 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -503,6 +503,7 @@ CopyMultiInsertInfo CopyOnErrorChoice CopySource CopyStmt +CopyToRoutine CopyToState CopyToStateData Cost -- 2.45.2 From 8743273a660c75b49862dbb18991c74131ed776e Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Sat, 28 Sep 2024 23:26:29 +0900 Subject: [PATCH v23 02/10] Use CopyToRoutine for the existing formats The existing formats are text, csv and binary. If we find any performance regression from this, we will not merge this to master.
This will increase indirect function call costs, but it will reduce the runtime cost of the "if (cstate->opts.binary)" and "if (cstate->opts.csv_mode)" branches. This uses an optimization based on a static inline function called with a constant argument in place of cstate->opts.csv_mode. For example, CopyToTextLikeOneRow() uses this optimization. It accepts a "bool is_csv" argument instead of reading cstate->opts.csv_mode. CopyToTextOneRow() calls CopyToTextLikeOneRow() with false (a constant) for "bool is_csv", so the compiler can remove the "if (is_csv)" branch in the inlined copy. This doesn't change existing logic. This just moves existing code. --- src/backend/commands/copyto.c | 477 +++++++++++++++++++++++----------- 1 file changed, 319 insertions(+), 158 deletions(-) diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 405e1782685..46f3507a8b5 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -128,6 +128,317 @@ static void CopySendEndOfRow(CopyToState cstate); static void CopySendInt32(CopyToState cstate, int32 val); static void CopySendInt16(CopyToState cstate, int16 val); +/* + * CopyToRoutine implementations. + */ + +/* + * CopyToTextLikeSendEndOfRow + * + * Apply line terminations for a line sent in text or CSV format depending + * on the destination, then send the end of a row. + */ +static pg_attribute_always_inline void +CopyToTextLikeSendEndOfRow(CopyToState cstate) +{ + switch (cstate->copy_dest) + { + case COPY_FILE: + /* Default line termination depends on platform */ +#ifndef WIN32 + CopySendChar(cstate, '\n'); +#else + CopySendString(cstate, "\r\n"); +#endif + break; + case COPY_FRONTEND: + /* The FE/BE protocol uses \n as newline for all platforms */ + CopySendChar(cstate, '\n'); + break; + default: + break; + } + + /* Now take the actions related to the end of a row */ + CopySendEndOfRow(cstate); +} + +/* + * CopyToTextLikeStart + * + * Start of COPY TO for text and CSV format.
+ */ +static void +CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc) +{ + /* + * For non-binary copy, we need to convert null_print to file encoding, + * because it will be sent directly with CopySendString. + */ + if (cstate->need_transcoding) + cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, + cstate->opts.null_print_len, + cstate->file_encoding); + + /* if a header has been requested send the line */ + if (cstate->opts.header_line) + { + ListCell *cur; + bool hdr_delim = false; + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + char *colname; + + if (hdr_delim) + CopySendChar(cstate, cstate->opts.delim[0]); + hdr_delim = true; + + colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); + + if (cstate->opts.csv_mode) + CopyAttributeOutCSV(cstate, colname, false); + else + CopyAttributeOutText(cstate, colname); + } + + CopyToTextLikeSendEndOfRow(cstate); + } +} + +/* + * CopyToTextLikeOutFunc + * + * Assign output function data for a relation's attribute in text/CSV format. + */ +static void +CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) +{ + Oid func_oid; + bool is_varlena; + + /* Set output function for an attribute */ + getTypeOutputInfo(atttypid, &func_oid, &is_varlena); + fmgr_info(func_oid, finfo); +} + + +/* + * CopyToTextLikeOneRow + * + * Process one row for text/CSV format. + * + * Workhorse for CopyToTextOneRow() and CopyToCSVOneRow(). 
+ */ +static pg_attribute_always_inline void +CopyToTextLikeOneRow(CopyToState cstate, + TupleTableSlot *slot, + bool is_csv) +{ + bool need_delim = false; + FmgrInfo *out_functions = cstate->out_functions; + + foreach_int(attnum, cstate->attnumlist) + { + Datum value = slot->tts_values[attnum - 1]; + bool isnull = slot->tts_isnull[attnum - 1]; + + if (need_delim) + CopySendChar(cstate, cstate->opts.delim[0]); + need_delim = true; + + if (isnull) + { + CopySendString(cstate, cstate->opts.null_print_client); + } + else + { + char *string; + + string = OutputFunctionCall(&out_functions[attnum - 1], + value); + + /* + * is_csv will be optimized away by compiler, as argument is + * constant at caller. + */ + if (is_csv) + CopyAttributeOutCSV(cstate, string, + cstate->opts.force_quote_flags[attnum - 1]); + else + CopyAttributeOutText(cstate, string); + } + } + + CopyToTextLikeSendEndOfRow(cstate); +} + +/* + * CopyToTextOneRow + * + * Per-row callback for COPY TO with text format. + */ +static void +CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + CopyToTextLikeOneRow(cstate, slot, false); +} + +/* + * CopyToCSVOneRow + * + * Per-row callback for COPY TO with CSV format. + */ +static void +CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + CopyToTextLikeOneRow(cstate, slot, true); +} + +/* + * CopyToTextLikeEnd + * + * End of COPY TO for text/CSV format. + */ +static void +CopyToTextLikeEnd(CopyToState cstate) +{ + /* Nothing to do here */ +} + +/* + * CopyToRoutine implementation for "binary". + */ + +/* + * CopyToBinaryStart + * + * Start of COPY TO for binary format.
+ */ +static void +CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc) +{ + /* Generate header for a binary copy */ + int32 tmp; + + /* Signature */ + CopySendData(cstate, BinarySignature, 11); + /* Flags field */ + tmp = 0; + CopySendInt32(cstate, tmp); + /* No header extension */ + tmp = 0; + CopySendInt32(cstate, tmp); +} + +/* + * CopyToBinaryOutFunc + * + * Assign output function data for a relation's attribute in binary format. + */ +static void +CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) +{ + Oid func_oid; + bool is_varlena; + + /* Set output function for an attribute */ + getTypeBinaryOutputInfo(atttypid, &func_oid, &is_varlena); + fmgr_info(func_oid, finfo); +} + +/* + * CopyToBinaryOneRow + * + * Process one row for binary format. + */ +static void +CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + FmgrInfo *out_functions = cstate->out_functions; + + /* Binary per-tuple header */ + CopySendInt16(cstate, list_length(cstate->attnumlist)); + + foreach_int(attnum, cstate->attnumlist) + { + Datum value = slot->tts_values[attnum - 1]; + bool isnull = slot->tts_isnull[attnum - 1]; + + if (isnull) + { + CopySendInt32(cstate, -1); + } + else + { + bytea *outputbytes; + + outputbytes = SendFunctionCall(&out_functions[attnum - 1], + value); + CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); + CopySendData(cstate, VARDATA(outputbytes), + VARSIZE(outputbytes) - VARHDRSZ); + } + } + + CopySendEndOfRow(cstate); +} + +/* + * CopyToBinaryEnd + * + * End of COPY TO for binary format. + */ +static void +CopyToBinaryEnd(CopyToState cstate) +{ + /* Generate trailer for a binary copy */ + CopySendInt16(cstate, -1); + /* Need to flush out the trailer */ + CopySendEndOfRow(cstate); +} + +/* + * CSV and text share the same implementation, at the exception of the + * output representation and per-row callbacks. 
+ */ +static const CopyToRoutine CopyToRoutineText = { + .CopyToStart = CopyToTextLikeStart, + .CopyToOutFunc = CopyToTextLikeOutFunc, + .CopyToOneRow = CopyToTextOneRow, + .CopyToEnd = CopyToTextLikeEnd, +}; + +static const CopyToRoutine CopyToRoutineCSV = { + .CopyToStart = CopyToTextLikeStart, + .CopyToOutFunc = CopyToTextLikeOutFunc, + .CopyToOneRow = CopyToCSVOneRow, + .CopyToEnd = CopyToTextLikeEnd, +}; + +static const CopyToRoutine CopyToRoutineBinary = { + .CopyToStart = CopyToBinaryStart, + .CopyToOutFunc = CopyToBinaryOutFunc, + .CopyToOneRow = CopyToBinaryOneRow, + .CopyToEnd = CopyToBinaryEnd, +}; + +/* + * Define the COPY TO routines to use for a format. This should be called + * after options are parsed. + */ +static const CopyToRoutine * +CopyToGetRoutine(CopyFormatOptions opts) +{ + if (opts.csv_mode) + return &CopyToRoutineCSV; + else if (opts.binary) + return &CopyToRoutineBinary; + + /* default is text */ + return &CopyToRoutineText; +} /* * Send copy start/stop messages for frontend copies. 
These have changed @@ -195,16 +506,6 @@ CopySendEndOfRow(CopyToState cstate) switch (cstate->copy_dest) { case COPY_FILE: - if (!cstate->opts.binary) - { - /* Default line termination depends on platform */ -#ifndef WIN32 - CopySendChar(cstate, '\n'); -#else - CopySendString(cstate, "\r\n"); -#endif - } - if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, cstate->copy_file) != 1 || ferror(cstate->copy_file)) @@ -239,10 +540,6 @@ CopySendEndOfRow(CopyToState cstate) } break; case COPY_FRONTEND: - /* The FE/BE protocol uses \n as newline for all platforms */ - if (!cstate->opts.binary) - CopySendChar(cstate, '\n'); - /* Dump the accumulated row as one CopyData message */ (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len); break; @@ -430,6 +727,9 @@ BeginCopyTo(ParseState *pstate, /* Extract options from the statement node tree */ ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options); + /* Set format routine */ + cstate->routine = CopyToGetRoutine(cstate->opts); + /* Process the source/target relation or query */ if (rel) { @@ -775,27 +1075,10 @@ DoCopyTo(CopyToState cstate) foreach(cur, cstate->attnumlist) { int attnum = lfirst_int(cur); - Oid out_func_oid; - bool isvarlena; Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1); - if (cstate->opts.binary) - { - getTypeBinaryOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); - } - else if (cstate->routine) - cstate->routine->CopyToOutFunc(cstate, attr->atttypid, - &cstate->out_functions[attnum - 1]); - else - { - getTypeOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); - } + cstate->routine->CopyToOutFunc(cstate, attr->atttypid, + &cstate->out_functions[attnum - 1]); } /* @@ -808,58 +1091,7 @@ DoCopyTo(CopyToState cstate) "COPY TO", ALLOCSET_DEFAULT_SIZES); - if (cstate->opts.binary) - { - /* Generate header for a binary copy */ - int32 
tmp; - - /* Signature */ - CopySendData(cstate, BinarySignature, 11); - /* Flags field */ - tmp = 0; - CopySendInt32(cstate, tmp); - /* No header extension */ - tmp = 0; - CopySendInt32(cstate, tmp); - } - else if (cstate->routine) - cstate->routine->CopyToStart(cstate, tupDesc); - else - { - /* - * For non-binary copy, we need to convert null_print to file - * encoding, because it will be sent directly with CopySendString. - */ - if (cstate->need_transcoding) - cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, - cstate->opts.null_print_len, - cstate->file_encoding); - - /* if a header has been requested send the line */ - if (cstate->opts.header_line) - { - bool hdr_delim = false; - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - char *colname; - - if (hdr_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - hdr_delim = true; - - colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); - - if (cstate->opts.csv_mode) - CopyAttributeOutCSV(cstate, colname, false); - else - CopyAttributeOutText(cstate, colname); - } - - CopySendEndOfRow(cstate); - } - } + cstate->routine->CopyToStart(cstate, tupDesc); if (cstate->rel) { @@ -898,15 +1130,7 @@ DoCopyTo(CopyToState cstate) processed = ((DR_copy *) cstate->queryDesc->dest)->processed; } - if (cstate->opts.binary) - { - /* Generate trailer for a binary copy */ - CopySendInt16(cstate, -1); - /* Need to flush out the trailer */ - CopySendEndOfRow(cstate); - } - else if (cstate->routine) - cstate->routine->CopyToEnd(cstate); + cstate->routine->CopyToEnd(cstate); MemoryContextDelete(cstate->rowcontext); @@ -922,7 +1146,6 @@ DoCopyTo(CopyToState cstate) static void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) { - FmgrInfo *out_functions = cstate->out_functions; MemoryContext oldcontext; MemoryContextReset(cstate->rowcontext); @@ -931,69 +1154,7 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) /* Make sure the tuple is fully deconstructed */ 
slot_getallattrs(slot); - if (cstate->routine) - { - cstate->routine->CopyToOneRow(cstate, slot); - MemoryContextSwitchTo(oldcontext); - return; - } - - if (cstate->opts.binary) - { - /* Binary per-tuple header */ - CopySendInt16(cstate, list_length(cstate->attnumlist)); - } - - if (!cstate->opts.binary) - { - bool need_delim = false; - - foreach_int(attnum, cstate->attnumlist) - { - Datum value = slot->tts_values[attnum - 1]; - bool isnull = slot->tts_isnull[attnum - 1]; - char *string; - - if (need_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - need_delim = true; - - if (isnull) - CopySendString(cstate, cstate->opts.null_print_client); - else - { - string = OutputFunctionCall(&out_functions[attnum - 1], - value); - if (cstate->opts.csv_mode) - CopyAttributeOutCSV(cstate, string, - cstate->opts.force_quote_flags[attnum - 1]); - else - CopyAttributeOutText(cstate, string); - } - } - } - else - { - foreach_int(attnum, cstate->attnumlist) - { - Datum value = slot->tts_values[attnum - 1]; - bool isnull = slot->tts_isnull[attnum - 1]; - bytea *outputbytes; - - if (isnull) - CopySendInt32(cstate, -1); - else - { - outputbytes = SendFunctionCall(&out_functions[attnum - 1], - value); - CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); - CopySendData(cstate, VARDATA(outputbytes), - VARSIZE(outputbytes) - VARHDRSZ); - } - } - } - - CopySendEndOfRow(cstate); + cstate->routine->CopyToOneRow(cstate, slot); MemoryContextSwitchTo(oldcontext); } -- 2.45.2 From aadcd4bb260d14fc40ab69178fcc85271307b4a9 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Sat, 28 Sep 2024 23:40:54 +0900 Subject: [PATCH v23 03/10] Add support for adding custom COPY TO format This uses the handler approach, like tablesample. The approach creates an internal function that returns an internal struct. In this case, a COPY TO handler returns a CopyToRoutine. This also adds a test module for a custom COPY TO handler.
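
(Illustration only, not part of the patch.) The handler approach described in the commit message above can be sketched in isolation. This is a rough model under stated assumptions: every name prefixed with Sketch, sketch_ or demo_ is hypothetical, and the stand-in types replace NodeTag, CopyToState and TupleTableSlot so the sketch compiles without PostgreSQL headers. It only shows the shape of the idea: a handler function returns a pointer to a static, tagged struct of callbacks, and the caller checks the tag before dispatching through it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for NodeTag (hypothetical): only the tag this sketch needs. */
typedef enum SketchNodeTag
{
	T_SketchInvalid = 0,
	T_SketchCopyToRoutine
} SketchNodeTag;

/* Stand-in for CopyToState (hypothetical): just an output buffer. */
typedef struct SketchCopyToState
{
	char		buf[256];
	size_t		len;
} SketchCopyToState;

/* Mirrors the shape of CopyToRoutine: a tag followed by callbacks. */
typedef struct SketchCopyToRoutine
{
	SketchNodeTag type;
	void		(*CopyToStart) (SketchCopyToState *cstate);
	void		(*CopyToOneRow) (SketchCopyToState *cstate, const char *row);
	void		(*CopyToEnd) (SketchCopyToState *cstate);
} SketchCopyToRoutine;

/* Append a string to the buffer, truncating if it would overflow. */
static void
sketch_append(SketchCopyToState *cstate, const char *s)
{
	size_t		n = strlen(s);

	if (cstate->len + n >= sizeof(cstate->buf))
		n = sizeof(cstate->buf) - 1 - cstate->len;
	memcpy(cstate->buf + cstate->len, s, n);
	cstate->len += n;
	cstate->buf[cstate->len] = '\0';
}

/* Callbacks of a made-up format. */
static void
demo_start(SketchCopyToState *cstate)
{
	sketch_append(cstate, "[start]");
}

static void
demo_one_row(SketchCopyToState *cstate, const char *row)
{
	sketch_append(cstate, "<");
	sketch_append(cstate, row);
	sketch_append(cstate, ">");
}

static void
demo_end(SketchCopyToState *cstate)
{
	sketch_append(cstate, "[end]");
}

/* Lives for the whole program, like a static const routine struct. */
static const SketchCopyToRoutine demo_routine = {
	.type = T_SketchCopyToRoutine,
	.CopyToStart = demo_start,
	.CopyToOneRow = demo_one_row,
	.CopyToEnd = demo_end,
};

/* The "handler": returns the routine as an untyped pointer. */
const void *
demo_copy_handler(int is_from)
{
	(void) is_from;				/* this sketch only models COPY TO */
	return &demo_routine;
}

/* Caller side: call the handler and check the tag before trusting it. */
const SketchCopyToRoutine *
sketch_get_routine(const void *(*handler) (int))
{
	const void *result = handler(0);

	/* The tag is the first member, so it can be read through the pointer. */
	if (result == NULL ||
		*(const SketchNodeTag *) result != T_SketchCopyToRoutine)
		return NULL;
	return (const SketchCopyToRoutine *) result;
}

/* Drive the callbacks in start / per-row / end order. */
void
sketch_copy_to(const SketchCopyToRoutine *routine, SketchCopyToState *cstate,
			   const char *const *rows, size_t nrows)
{
	size_t		i;

	assert(routine != NULL);
	cstate->len = 0;
	cstate->buf[0] = '\0';
	routine->CopyToStart(cstate);
	for (i = 0; i < nrows; i++)
		routine->CopyToOneRow(cstate, rows[i]);
	routine->CopyToEnd(cstate);
}
```

Calling sketch_get_routine(demo_copy_handler) and then sketch_copy_to() with the rows "a" and "b" leaves "[start]<a><b>[end]" in the buffer. The tag check plays the role that IsA(routine, CopyToRoutine) plays in the patch.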
--- src/backend/commands/copy.c | 82 ++++++++++++++++--- src/backend/commands/copyto.c | 4 +- src/backend/nodes/Makefile | 1 + src/backend/nodes/gen_node_support.pl | 2 + src/backend/utils/adt/pseudotypes.c | 1 + src/include/catalog/pg_proc.dat | 6 ++ src/include/catalog/pg_type.dat | 6 ++ src/include/commands/copy.h | 1 + src/include/commands/copyapi.h | 2 + src/include/nodes/meson.build | 1 + src/test/modules/Makefile | 1 + src/test/modules/meson.build | 1 + src/test/modules/test_copy_format/.gitignore | 4 + src/test/modules/test_copy_format/Makefile | 23 ++++++ .../expected/test_copy_format.out | 17 ++++ src/test/modules/test_copy_format/meson.build | 33 ++++++++ .../test_copy_format/sql/test_copy_format.sql | 5 ++ .../test_copy_format--1.0.sql | 8 ++ .../test_copy_format/test_copy_format.c | 63 ++++++++++++++ .../test_copy_format/test_copy_format.control | 4 + 20 files changed, 251 insertions(+), 14 deletions(-) create mode 100644 src/test/modules/test_copy_format/.gitignore create mode 100644 src/test/modules/test_copy_format/Makefile create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out create mode 100644 src/test/modules/test_copy_format/meson.build create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql create mode 100644 src/test/modules/test_copy_format/test_copy_format.c create mode 100644 src/test/modules/test_copy_format/test_copy_format.control diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 3485ba8663f..02528fbcc1f 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -32,6 +32,7 @@ #include "parser/parse_coerce.h" #include "parser/parse_collate.h" #include "parser/parse_expr.h" +#include "parser/parse_func.h" #include "parser/parse_relation.h" #include "utils/acl.h" #include "utils/builtins.h" @@ -462,6 +463,73 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState 
*pstate) return COPY_LOG_VERBOSITY_DEFAULT; /* keep compiler quiet */ } +/* + * Process the "format" option. + * + * This function checks whether the option value is a built-in format such as + * "text" and "csv" or not. If the option value isn't a built-in format, this + * function finds a COPY format handler that returns a CopyToRoutine (for + * is_from == false). If no COPY format handler is found, this function + * reports an error. + */ +static void +ProcessCopyOptionFormat(ParseState *pstate, + CopyFormatOptions *opts_out, + bool is_from, + DefElem *defel) +{ + char *format; + Oid funcargtypes[1]; + Oid handlerOid = InvalidOid; + Datum datum; + Node *routine; + + format = defGetString(defel); + + /* built-in formats */ + if (strcmp(format, "text") == 0) + /* default format */ return; + else if (strcmp(format, "csv") == 0) + { + opts_out->csv_mode = true; + return; + } + else if (strcmp(format, "binary") == 0) + { + opts_out->binary = true; + return; + } + + /* custom format */ + if (!is_from) + { + funcargtypes[0] = INTERNALOID; + handlerOid = LookupFuncName(list_make1(makeString(format)), 1, + funcargtypes, true); + } + if (!OidIsValid(handlerOid)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY format \"%s\" not recognized", format), + parser_errposition(pstate, defel->location))); + + datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from)); + routine = (Node *) DatumGetPointer(datum); + if (routine == NULL || !IsA(routine, CopyToRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%s(%u) did not return a " + "CopyToRoutine struct", + format, handlerOid), + parser_errposition( + pstate, defel->location))); + + opts_out->routine = routine; +} + /* * Process the statement option list for COPY. 
* @@ -505,22 +573,10 @@ ProcessCopyOptions(ParseState *pstate, if (strcmp(defel->defname, "format") == 0) { - char *fmt = defGetString(defel); - if (format_specified) errorConflictingDefElem(defel, pstate); format_specified = true; - if (strcmp(fmt, "text") == 0) - /* default format */ ; - else if (strcmp(fmt, "csv") == 0) - opts_out->csv_mode = true; - else if (strcmp(fmt, "binary") == 0) - opts_out->binary = true; - else - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("COPY format \"%s\" not recognized", fmt), - parser_errposition(pstate, defel->location))); + ProcessCopyOptionFormat(pstate, opts_out, is_from, defel); } else if (strcmp(defel->defname, "freeze") == 0) { diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 46f3507a8b5..1f1d2baf9be 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -431,7 +431,9 @@ static const CopyToRoutine CopyToRoutineBinary = { static const CopyToRoutine * CopyToGetRoutine(CopyFormatOptions opts) { - if (opts.csv_mode) + if (opts.routine) + return (const CopyToRoutine *) opts.routine; + else if (opts.csv_mode) return &CopyToRoutineCSV; else if (opts.binary) return &CopyToRoutineBinary; diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile index 66bbad8e6e0..173ee11811c 100644 --- a/src/backend/nodes/Makefile +++ b/src/backend/nodes/Makefile @@ -49,6 +49,7 @@ node_headers = \ access/sdir.h \ access/tableam.h \ access/tsmapi.h \ + commands/copyapi.h \ commands/event_trigger.h \ commands/trigger.h \ executor/tuptable.h \ diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl index 81df3bdf95f..428ab4f0d93 100644 --- a/src/backend/nodes/gen_node_support.pl +++ b/src/backend/nodes/gen_node_support.pl @@ -61,6 +61,7 @@ my @all_input_files = qw( access/sdir.h access/tableam.h access/tsmapi.h + commands/copyapi.h commands/event_trigger.h commands/trigger.h executor/tuptable.h @@ -85,6 +86,7 @@ my 
@nodetag_only_files = qw( access/sdir.h access/tableam.h access/tsmapi.h + commands/copyapi.h commands/event_trigger.h commands/trigger.h executor/tuptable.h diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c index e189e9b79d2..25f24ab95d2 100644 --- a/src/backend/utils/adt/pseudotypes.c +++ b/src/backend/utils/adt/pseudotypes.c @@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler); +PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(internal); PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement); PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray); diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index f23321a41f1..6af90a26374 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -7761,6 +7761,12 @@ { oid => '3312', descr => 'I/O', proname => 'tsm_handler_out', prorettype => 'cstring', proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' }, +{ oid => '8753', descr => 'I/O', + proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler', + proargtypes => 'cstring', prosrc => 'copy_handler_in' }, +{ oid => '8754', descr => 'I/O', + proname => 'copy_handler_out', prorettype => 'cstring', + proargtypes => 'copy_handler', prosrc => 'copy_handler_out' }, { oid => '267', descr => 'I/O', proname => 'table_am_handler_in', proisstrict => 'f', prorettype => 'table_am_handler', proargtypes => 'cstring', diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat index ceff66ccde1..793dd671935 100644 --- a/src/include/catalog/pg_type.dat +++ b/src/include/catalog/pg_type.dat @@ -633,6 +633,12 @@ typcategory => 'P', typinput => 'tsm_handler_in', typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-', typalign => 'i' }, +{ oid => '8752', + descr => 'pseudo-type for the result of a copy to method function', + 
typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p', + typcategory => 'P', typinput => 'copy_handler_in', + typoutput => 'copy_handler_out', typreceive => '-', typsend => '-', + typalign => 'i' }, { oid => '269', descr => 'pseudo-type for the result of a table AM handler function', typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p', diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 4002a7f5382..7659d8ae32f 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -87,6 +87,7 @@ typedef struct CopyFormatOptions CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */ int64 reject_limit; /* maximum tolerable number of errors */ List *convert_select; /* list of column names (can be NIL) */ + Node *routine; /* CopyToRoutine (can be NULL) */ } CopyFormatOptions; /* These are private in commands/copy[from|to].c */ diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 5ce24f195dc..05b7d92ddba 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -26,6 +26,8 @@ typedef struct CopyToStateData *CopyToState; */ typedef struct CopyToRoutine { + NodeTag type; + /* * Called when COPY TO is started to set up the output functions * associated with the relation's attributes reading from. 
`finfo` can be diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build index b665e55b657..103df1a7873 100644 --- a/src/include/nodes/meson.build +++ b/src/include/nodes/meson.build @@ -11,6 +11,7 @@ node_support_input_i = [ 'access/sdir.h', 'access/tableam.h', 'access/tsmapi.h', + 'commands/copyapi.h', 'commands/event_trigger.h', 'commands/trigger.h', 'executor/tuptable.h', diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile index c0d3cf0e14b..33e3a49a4fb 100644 --- a/src/test/modules/Makefile +++ b/src/test/modules/Makefile @@ -15,6 +15,7 @@ SUBDIRS = \ spgist_name_ops \ test_bloomfilter \ test_copy_callbacks \ + test_copy_format \ test_custom_rmgrs \ test_ddl_deparse \ test_dsa \ diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build index c829b619530..75b6ab1b6a9 100644 --- a/src/test/modules/meson.build +++ b/src/test/modules/meson.build @@ -14,6 +14,7 @@ subdir('spgist_name_ops') subdir('ssl_passphrase_callback') subdir('test_bloomfilter') subdir('test_copy_callbacks') +subdir('test_copy_format') subdir('test_custom_rmgrs') subdir('test_ddl_deparse') subdir('test_dsa') diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore new file mode 100644 index 00000000000..5dcb3ff9723 --- /dev/null +++ b/src/test/modules/test_copy_format/.gitignore @@ -0,0 +1,4 @@ +# Generated subdirectories +/log/ +/results/ +/tmp_check/ diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile new file mode 100644 index 00000000000..8497f91624d --- /dev/null +++ b/src/test/modules/test_copy_format/Makefile @@ -0,0 +1,23 @@ +# src/test/modules/test_copy_format/Makefile + +MODULE_big = test_copy_format +OBJS = \ + $(WIN32RES) \ + test_copy_format.o +PGFILEDESC = "test_copy_format - test custom COPY FORMAT" + +EXTENSION = test_copy_format +DATA = test_copy_format--1.0.sql + +REGRESS = test_copy_format + +ifdef USE_PGXS +PG_CONFIG = pg_config 
+PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = src/test/modules/test_copy_format +top_builddir = ../../../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out new file mode 100644 index 00000000000..606c78f6878 --- /dev/null +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -0,0 +1,17 @@ +CREATE EXTENSION test_copy_format; +CREATE TABLE public.test (a smallint, b integer, c bigint); +INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +COPY public.test FROM stdin WITH (format 'test_copy_format'); +ERROR: COPY format "test_copy_format" not recognized +LINE 1: COPY public.test FROM stdin WITH (format 'test_copy_format')... + ^ +COPY public.test TO stdout WITH (format 'test_copy_format'); +NOTICE: test_copy_format: is_from=false +NOTICE: CopyToOutFunc: atttypid=21 +NOTICE: CopyToOutFunc: atttypid=23 +NOTICE: CopyToOutFunc: atttypid=20 +NOTICE: CopyToStart: natts=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToEnd diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build new file mode 100644 index 00000000000..4cefe7b709a --- /dev/null +++ b/src/test/modules/test_copy_format/meson.build @@ -0,0 +1,33 @@ +# Copyright (c) 2024, PostgreSQL Global Development Group + +test_copy_format_sources = files( + 'test_copy_format.c', +) + +if host_system == 'windows' + test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [ + '--NAME', 'test_copy_format', + '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',]) +endif + +test_copy_format = shared_module('test_copy_format', + test_copy_format_sources, + kwargs: pg_test_mod_args, +) +test_install_libs += test_copy_format + 
+test_install_data += files( + 'test_copy_format.control', + 'test_copy_format--1.0.sql', +) + +tests += { + 'name': 'test_copy_format', + 'sd': meson.current_source_dir(), + 'bd': meson.current_build_dir(), + 'regress': { + 'sql': [ + 'test_copy_format', + ], + }, +} diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql new file mode 100644 index 00000000000..9406b3be3d4 --- /dev/null +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -0,0 +1,5 @@ +CREATE EXTENSION test_copy_format; +CREATE TABLE public.test (a smallint, b integer, c bigint); +INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +COPY public.test FROM stdin WITH (format 'test_copy_format'); +COPY public.test TO stdout WITH (format 'test_copy_format'); diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql new file mode 100644 index 00000000000..d24ea03ce99 --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql @@ -0,0 +1,8 @@ +/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */ + +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit + +CREATE FUNCTION test_copy_format(internal) + RETURNS copy_handler + AS 'MODULE_PATHNAME' LANGUAGE C; diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c new file mode 100644 index 00000000000..e064f40473b --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -0,0 +1,63 @@ +/*-------------------------------------------------------------------------- + * + * test_copy_format.c + * Code for testing custom COPY format. 
+ * + * Portions Copyright (c) 2024, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/test/modules/test_copy_format/test_copy_format.c + * + * ------------------------------------------------------------------------- + */ + +#include "postgres.h" + +#include "commands/copyapi.h" +#include "commands/defrem.h" + +PG_MODULE_MAGIC; + +static void +CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) +{ + ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%d", atttypid))); +} + +static void +CopyToStart(CopyToState cstate, TupleDesc tupDesc) +{ + ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts))); +} + +static void +CopyToOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", slot->tts_nvalid))); +} + +static void +CopyToEnd(CopyToState cstate) +{ + ereport(NOTICE, (errmsg("CopyToEnd"))); +} + +static const CopyToRoutine CopyToRoutineTestCopyFormat = { + .type = T_CopyToRoutine, + .CopyToOutFunc = CopyToOutFunc, + .CopyToStart = CopyToStart, + .CopyToOneRow = CopyToOneRow, + .CopyToEnd = CopyToEnd, +}; + +PG_FUNCTION_INFO_V1(test_copy_format); +Datum +test_copy_format(PG_FUNCTION_ARGS) +{ + bool is_from = PG_GETARG_BOOL(0); + + ereport(NOTICE, + (errmsg("test_copy_format: is_from=%s", is_from ? 
"true" : "false")));
+
+    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 00000000000..f05a6362358
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.45.2

From eedabed5e0e1a9f8f9e5591b806c6f606229874a Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sat, 28 Sep 2024 23:56:36 +0900
Subject: [PATCH v23 04/10] Export CopyToStateData

This is for custom COPY TO format handlers implemented as extensions.

This only moves code. Nothing changes except the CopyDest enum
values. Enum values such as COPY_FILE in CopyDest conflict with the
values of the same name in CopySource (used by COPY FROM), so the
CopyDest values now use a COPY_DEST_ prefix instead of COPY_. For
example, COPY_FILE in CopyDest is renamed to COPY_DEST_FILE.

Note that this isn't enough to implement custom COPY TO format
handlers as extensions. We'll do the following in a subsequent commit:

1. Add an opaque space for custom COPY TO format handler
2.
Export CopySendEndOfRow() to flush buffer --- src/backend/commands/copyto.c | 77 ++---------------- src/include/commands/copy.h | 77 +----------------- src/include/commands/copyapi.h | 137 ++++++++++++++++++++++++++++++++- 3 files changed, 146 insertions(+), 145 deletions(-) diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 1f1d2baf9be..fb68f42ce1e 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -37,67 +37,6 @@ #include "utils/rel.h" #include "utils/snapmgr.h" -/* - * Represents the different dest cases we need to worry about at - * the bottom level - */ -typedef enum CopyDest -{ - COPY_FILE, /* to file (or a piped program) */ - COPY_FRONTEND, /* to frontend */ - COPY_CALLBACK, /* to callback function */ -} CopyDest; - -/* - * This struct contains all the state variables used throughout a COPY TO - * operation. - * - * Multi-byte encodings: all supported client-side encodings encode multi-byte - * characters by having the first byte's high bit set. Subsequent bytes of the - * character can have the high bit not set. When scanning data in such an - * encoding to look for a match to a single-byte (ie ASCII) character, we must - * use the full pg_encoding_mblen() machinery to skip over multibyte - * characters, else we might find a false match to a trailing byte. In - * supported server encodings, there is no possibility of a false match, and - * it's faster to make useless comparisons to trailing bytes than it is to - * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true - * when we have to do it the hard way. 
- */ -typedef struct CopyToStateData -{ - /* format routine */ - const CopyToRoutine *routine; - - /* low-level state data */ - CopyDest copy_dest; /* type of copy source/destination */ - FILE *copy_file; /* used if copy_dest == COPY_FILE */ - StringInfo fe_msgbuf; /* used for all dests during COPY TO */ - - int file_encoding; /* file or remote side's character encoding */ - bool need_transcoding; /* file encoding diff from server? */ - bool encoding_embeds_ascii; /* ASCII can be non-first byte? */ - - /* parameters from the COPY command */ - Relation rel; /* relation to copy to */ - QueryDesc *queryDesc; /* executable query to copy from */ - List *attnumlist; /* integer list of attnums to copy */ - char *filename; /* filename, or NULL for STDOUT */ - bool is_program; /* is 'filename' a program to popen? */ - copy_data_dest_cb data_dest_cb; /* function for writing data */ - - CopyFormatOptions opts; - Node *whereClause; /* WHERE condition (or NULL) */ - - /* - * Working state - */ - MemoryContext copycontext; /* per-copy execution context */ - - FmgrInfo *out_functions; /* lookup info for output functions */ - MemoryContext rowcontext; /* per-row evaluation context */ - uint64 bytes_processed; /* number of bytes processed so far */ -} CopyToStateData; - /* DestReceiver for COPY (query) TO */ typedef struct { @@ -143,7 +82,7 @@ CopyToTextLikeSendEndOfRow(CopyToState cstate) { switch (cstate->copy_dest) { - case COPY_FILE: + case COPY_DEST_FILE: /* Default line termination depends on platform */ #ifndef WIN32 CopySendChar(cstate, '\n'); @@ -151,7 +90,7 @@ CopyToTextLikeSendEndOfRow(CopyToState cstate) CopySendString(cstate, "\r\n"); #endif break; - case COPY_FRONTEND: + case COPY_DEST_FRONTEND: /* The FE/BE protocol uses \n as newline for all platforms */ CopySendChar(cstate, '\n'); break; @@ -460,7 +399,7 @@ SendCopyBegin(CopyToState cstate) for (i = 0; i < natts; i++) pq_sendint16(&buf, format); /* per-column formats */ pq_endmessage(&buf); - cstate->copy_dest = 
COPY_FRONTEND; + cstate->copy_dest = COPY_DEST_FRONTEND; } static void @@ -507,7 +446,7 @@ CopySendEndOfRow(CopyToState cstate) switch (cstate->copy_dest) { - case COPY_FILE: + case COPY_DEST_FILE: if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, cstate->copy_file) != 1 || ferror(cstate->copy_file)) @@ -541,11 +480,11 @@ CopySendEndOfRow(CopyToState cstate) errmsg("could not write to COPY file: %m"))); } break; - case COPY_FRONTEND: + case COPY_DEST_FRONTEND: /* Dump the accumulated row as one CopyData message */ (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len); break; - case COPY_CALLBACK: + case COPY_DEST_CALLBACK: cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len); break; } @@ -929,12 +868,12 @@ BeginCopyTo(ParseState *pstate, /* See Multibyte encoding comment above */ cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding); - cstate->copy_dest = COPY_FILE; /* default */ + cstate->copy_dest = COPY_DEST_FILE; /* default */ if (data_dest_cb) { progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK; - cstate->copy_dest = COPY_CALLBACK; + cstate->copy_dest = COPY_DEST_CALLBACK; cstate->data_dest_cb = data_dest_cb; } else if (pipe) diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 7659d8ae32f..dd645eaa030 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -14,88 +14,15 @@ #ifndef COPY_H #define COPY_H -#include "nodes/execnodes.h" +#include "commands/copyapi.h" #include "nodes/parsenodes.h" #include "parser/parse_node.h" #include "tcop/dest.h" -/* - * Represents whether a header line should be present, and whether it must - * match the actual names (which implies "true"). - */ -typedef enum CopyHeaderChoice -{ - COPY_HEADER_FALSE = 0, - COPY_HEADER_TRUE, - COPY_HEADER_MATCH, -} CopyHeaderChoice; - -/* - * Represents where to save input processing errors. More values to be added - * in the future. 
- */ -typedef enum CopyOnErrorChoice -{ - COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */ - COPY_ON_ERROR_IGNORE, /* ignore errors */ -} CopyOnErrorChoice; - -/* - * Represents verbosity of logged messages by COPY command. - */ -typedef enum CopyLogVerbosityChoice -{ - COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */ - COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages. As this is - * the default, assign 0 */ - COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */ -} CopyLogVerbosityChoice; - -/* - * A struct to hold COPY options, in a parsed form. All of these are related - * to formatting, except for 'freeze', which doesn't really belong here, but - * it's expedient to parse it along with all the other options. - */ -typedef struct CopyFormatOptions -{ - /* parameters from the COPY command */ - int file_encoding; /* file or remote side's character encoding, - * -1 if not specified */ - bool binary; /* binary format? */ - bool freeze; /* freeze rows on loading? */ - bool csv_mode; /* Comma Separated Value format? */ - CopyHeaderChoice header_line; /* header line? */ - char *null_print; /* NULL marker string (server encoding!) */ - int null_print_len; /* length of same */ - char *null_print_client; /* same converted to file encoding */ - char *default_print; /* DEFAULT marker string */ - int default_print_len; /* length of same */ - char *delim; /* column delimiter (must be 1 byte) */ - char *quote; /* CSV quote char (must be 1 byte) */ - char *escape; /* CSV escape char (must be 1 byte) */ - List *force_quote; /* list of column names */ - bool force_quote_all; /* FORCE_QUOTE *? */ - bool *force_quote_flags; /* per-column CSV FQ flags */ - List *force_notnull; /* list of column names */ - bool force_notnull_all; /* FORCE_NOT_NULL *? */ - bool *force_notnull_flags; /* per-column CSV FNN flags */ - List *force_null; /* list of column names */ - bool force_null_all; /* FORCE_NULL *? 
*/ - bool *force_null_flags; /* per-column CSV FN flags */ - bool convert_selectively; /* do selective binary conversion? */ - CopyOnErrorChoice on_error; /* what to do when error happened */ - CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */ - int64 reject_limit; /* maximum tolerable number of errors */ - List *convert_select; /* list of column names (can be NIL) */ - Node *routine; /* CopyToRoutine (can be NULL) */ -} CopyFormatOptions; - -/* These are private in commands/copy[from|to].c */ +/* This is private in commands/copyfrom.c */ typedef struct CopyFromStateData *CopyFromState; -typedef struct CopyToStateData *CopyToState; typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread); -typedef void (*copy_data_dest_cb) (void *data, int len); extern void DoCopy(ParseState *pstate, const CopyStmt *stmt, int stmt_location, int stmt_len, diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 05b7d92ddba..b6ddb5f6216 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -14,10 +14,82 @@ #ifndef COPYAPI_H #define COPYAPI_H +#include "commands/trigger.h" +#include "executor/execdesc.h" #include "executor/tuptable.h" #include "nodes/execnodes.h" -/* This is private in commands/copyto.c */ +/* + * Represents whether a header line should be present, and whether it must + * match the actual names (which implies "true"). + */ +typedef enum CopyHeaderChoice +{ + COPY_HEADER_FALSE = 0, + COPY_HEADER_TRUE, + COPY_HEADER_MATCH, +} CopyHeaderChoice; + +/* + * Represents where to save input processing errors. More values to be added + * in the future. + */ +typedef enum CopyOnErrorChoice +{ + COPY_ON_ERROR_STOP = 0, /* immediately throw errors, default */ + COPY_ON_ERROR_IGNORE, /* ignore errors */ +} CopyOnErrorChoice; + +/* + * Represents verbosity of logged messages by COPY command. 
+ */ +typedef enum CopyLogVerbosityChoice +{ + COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */ + COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages. As this is + * the default, assign 0 */ + COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */ +} CopyLogVerbosityChoice; + +/* + * A struct to hold COPY options, in a parsed form. All of these are related + * to formatting, except for 'freeze', which doesn't really belong here, but + * it's expedient to parse it along with all the other options. + */ +typedef struct CopyFormatOptions +{ + /* parameters from the COPY command */ + int file_encoding; /* file or remote side's character encoding, + * -1 if not specified */ + bool binary; /* binary format? */ + bool freeze; /* freeze rows on loading? */ + bool csv_mode; /* Comma Separated Value format? */ + CopyHeaderChoice header_line; /* header line? */ + char *null_print; /* NULL marker string (server encoding!) */ + int null_print_len; /* length of same */ + char *null_print_client; /* same converted to file encoding */ + char *default_print; /* DEFAULT marker string */ + int default_print_len; /* length of same */ + char *delim; /* column delimiter (must be 1 byte) */ + char *quote; /* CSV quote char (must be 1 byte) */ + char *escape; /* CSV escape char (must be 1 byte) */ + List *force_quote; /* list of column names */ + bool force_quote_all; /* FORCE_QUOTE *? */ + bool *force_quote_flags; /* per-column CSV FQ flags */ + List *force_notnull; /* list of column names */ + bool force_notnull_all; /* FORCE_NOT_NULL *? */ + bool *force_notnull_flags; /* per-column CSV FNN flags */ + List *force_null; /* list of column names */ + bool force_null_all; /* FORCE_NULL *? */ + bool *force_null_flags; /* per-column CSV FN flags */ + bool convert_selectively; /* do selective binary conversion? 
*/ + CopyOnErrorChoice on_error; /* what to do when error happened */ + CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */ + int64 reject_limit; /* maximum tolerable number of errors */ + List *convert_select; /* list of column names (can be NIL) */ + Node *routine; /* CopyToRoutine (can be NULL) */ +} CopyFormatOptions; + typedef struct CopyToStateData *CopyToState; /* @@ -57,4 +129,67 @@ typedef struct CopyToRoutine void (*CopyToEnd) (CopyToState cstate); } CopyToRoutine; +/* + * Represents the different dest cases we need to worry about at + * the bottom level + */ +typedef enum CopyDest +{ + COPY_DEST_FILE, /* to file (or a piped program) */ + COPY_DEST_FRONTEND, /* to frontend */ + COPY_DEST_CALLBACK, /* to callback function */ +} CopyDest; + +typedef void (*copy_data_dest_cb) (void *data, int len); + +/* + * This struct contains all the state variables used throughout a COPY TO + * operation. + * + * Multi-byte encodings: all supported client-side encodings encode multi-byte + * characters by having the first byte's high bit set. Subsequent bytes of the + * character can have the high bit not set. When scanning data in such an + * encoding to look for a match to a single-byte (ie ASCII) character, we must + * use the full pg_encoding_mblen() machinery to skip over multibyte + * characters, else we might find a false match to a trailing byte. In + * supported server encodings, there is no possibility of a false match, and + * it's faster to make useless comparisons to trailing bytes than it is to + * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true + * when we have to do it the hard way. 
+ */
+typedef struct CopyToStateData
+{
+    /* format routine */
+    const CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;      /* type of copy source/destination */
+    FILE       *copy_file;      /* used if copy_dest == COPY_DEST_FILE */
+    StringInfo  fe_msgbuf;      /* used for all dests during COPY TO */
+
+    int         file_encoding;  /* file or remote side's character encoding */
+    bool        need_transcoding;   /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;  /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;      /* executable query to copy from */
+    List       *attnumlist;     /* integer list of attnums to copy */
+    char       *filename;       /* filename, or NULL for STDOUT */
+    bool        is_program;     /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;  /* per-copy execution context */
+
+    FmgrInfo   *out_functions;  /* lookup info for output functions */
+    MemoryContext rowcontext;   /* per-row evaluation context */
+    uint64      bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
 #endif                          /* COPYAPI_H */
-- 
2.45.2

From 90174da9f00b736b7c4900f046fcceae8cce74d5 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sat, 28 Sep 2024 23:59:34 +0900
Subject: [PATCH v23 05/10] Add support for implementing custom COPY TO format
 as extension

* Add CopyToStateData::opaque that can be used to keep data for a
  custom COPY TO format implementation
* Export CopySendEndOfRow() to flush data in
  CopyToStateData::fe_msgbuf as CopyToStateFlush()
---
 src/backend/commands/copyto.c  | 14 ++++++++++++++
 src/include/commands/copyapi.h |  5 +++++
 2 files changed, 19 insertions(+)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index fb68f42ce1e..93b041352c5 100644
---
a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -496,6 +496,20 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * CopyToStateFlush
+ *
+ * Export CopySendEndOfRow() for extensions. We want to keep
+ * CopySendEndOfRow() as a static function for
+ * optimization. CopySendEndOfRow() calls in this file may be optimized by a
+ * compiler.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * These functions do apply some data conversion
  */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index b6ddb5f6216..310a37ba728 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -190,6 +190,11 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;  /* lookup info for output functions */
     MemoryContext rowcontext;   /* per-row evaluation context */
     uint64      bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;         /* private space */
 } CopyToStateData;
 
+extern void CopyToStateFlush(CopyToState cstate);
+
 #endif                          /* COPYAPI_H */
-- 
2.45.2

From e5a3351614bea13c8e5857c5f05bbc656ce5f34d Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sun, 29 Sep 2024 00:06:20 +0900
Subject: [PATCH v23 06/10] Add CopyFromRoutine

This is for implementing custom COPY FROM formats, but it is not yet
enough to do so. We'll export some APIs to receive data and add the
"format" option to COPY FROM later.

The existing text/csv/binary format implementations don't use
CopyFromRoutine for now. We have a patch for that, but we are
deferring it because some profile results are puzzling even though
runtimes get faster. See [1] for details.

[1] https://www.postgresql.org/message-id/ZdbtQJ-p5H1_EDwE%40paquier.xyz

Note that this doesn't change the existing text/csv/binary format
implementations.
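
For readers following along, the order in which this patch invokes the
CopyFromRoutine callbacks can be sketched outside the backend. The
following is only a rough, self-contained model, not PostgreSQL code:
MockCopyFromState, MockCopyFromRoutine and mock_copy_from() are
hypothetical stand-ins, and plain ints replace Datum, TupleDesc and
ExprContext. It assumes the dispatch order shown in the diff: one
CopyFromInFunc call per attribute and then CopyFromStart (both in
BeginCopyFrom()), CopyFromOneRow until it returns false (from
NextCopyFrom()), and finally CopyFromEnd (from EndCopyFrom()).

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_MAX_ATTS 8

/* Hypothetical stand-in for CopyFromStateData, which is backend-private. */
typedef struct MockCopyFromState MockCopyFromState;

/* Same four-callback shape as CopyFromRoutine, minus PostgreSQL types. */
typedef struct MockCopyFromRoutine
{
    void (*InFunc) (MockCopyFromState *cstate, unsigned atttypid);
    void (*Start) (MockCopyFromState *cstate, int natts);
    bool (*OneRow) (MockCopyFromState *cstate, int *values, bool *nulls);
    void (*End) (MockCopyFromState *cstate);
} MockCopyFromRoutine;

struct MockCopyFromState
{
    const MockCopyFromRoutine *routine;
    int  natts;         /* set by Start */
    int  rows_left;     /* pretend input: rows still to be read */
    int  infunc_calls;  /* how many times InFunc ran */
    bool started;
    bool ended;
};

static void
mock_in_func(MockCopyFromState *cstate, unsigned atttypid)
{
    (void) atttypid;            /* a real handler would pick an input function */
    cstate->infunc_calls++;
}

static void
mock_start(MockCopyFromState *cstate, int natts)
{
    cstate->natts = natts;
    cstate->started = true;
}

/* Fills one row; returns false when the pretend input is exhausted. */
static bool
mock_one_row(MockCopyFromState *cstate, int *values, bool *nulls)
{
    if (cstate->rows_left == 0)
        return false;
    for (int i = 0; i < cstate->natts; i++)
    {
        values[i] = cstate->rows_left;
        nulls[i] = false;
    }
    cstate->rows_left--;
    return true;
}

static void
mock_end(MockCopyFromState *cstate)
{
    cstate->ended = true;
}

static const MockCopyFromRoutine mock_routine = {
    .InFunc = mock_in_func,
    .Start = mock_start,
    .OneRow = mock_one_row,
    .End = mock_end,
};

/* Drives the callbacks in the assumed order; returns the rows copied. */
int
mock_copy_from(int natts, int nrows)
{
    MockCopyFromState cstate = {.routine = &mock_routine, .rows_left = nrows};
    int  values[MOCK_MAX_ATTS];
    bool nulls[MOCK_MAX_ATTS];
    int  copied = 0;

    assert(natts >= 0 && natts <= MOCK_MAX_ATTS);

    for (int i = 0; i < natts; i++)     /* per-attribute setup */
        cstate.routine->InFunc(&cstate, 23u);
    cstate.routine->Start(&cstate, natts);
    while (cstate.routine->OneRow(&cstate, values, nulls))
        copied++;
    cstate.routine->End(&cstate);

    assert(cstate.infunc_calls == natts && cstate.started && cstate.ended);
    return copied;
}
```

Calling mock_copy_from(3, 3) from a main() returns 3. A real handler
would use the types from commands/copyapi.h instead of these stand-ins.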
--- src/backend/commands/copyfrom.c | 24 ++++++++++-- src/backend/commands/copyfromparse.c | 5 +++ src/include/commands/copyapi.h | 47 +++++++++++++++++++++++- src/include/commands/copyfrom_internal.h | 4 ++ src/tools/pgindent/typedefs.list | 1 + 5 files changed, 76 insertions(+), 5 deletions(-) diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 07cbd5d22b8..909375e81b7 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1635,12 +1635,22 @@ BeginCopyFrom(ParseState *pstate, /* Fetch the input function and typioparam info */ if (cstate->opts.binary) + { getTypeBinaryInputInfo(att->atttypid, &in_func_oid, &typioparams[attnum - 1]); + fmgr_info(in_func_oid, &in_functions[attnum - 1]); + } + else if (cstate->routine) + cstate->routine->CopyFromInFunc(cstate, att->atttypid, + &in_functions[attnum - 1], + &typioparams[attnum - 1]); + else + { getTypeInputInfo(att->atttypid, &in_func_oid, &typioparams[attnum - 1]); - fmgr_info(in_func_oid, &in_functions[attnum - 1]); + fmgr_info(in_func_oid, &in_functions[attnum - 1]); + } /* Get default info if available */ defexprs[attnum - 1] = NULL; @@ -1780,10 +1790,13 @@ BeginCopyFrom(ParseState *pstate, /* Read and verify binary header */ ReceiveCopyBinaryHeader(cstate); } - - /* create workspace for CopyReadAttributes results */ - if (!cstate->opts.binary) + else if (cstate->routine) { + cstate->routine->CopyFromStart(cstate, tupDesc); + } + else + { + /* create workspace for CopyReadAttributes results */ AttrNumber attr_count = list_length(cstate->attnumlist); cstate->max_fields = attr_count; @@ -1801,6 +1814,9 @@ BeginCopyFrom(ParseState *pstate, void EndCopyFrom(CopyFromState cstate) { + if (cstate->routine) + cstate->routine->CopyFromEnd(cstate); + /* No COPY FROM related resources except memory. 
*/ if (cstate->is_program) { diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index d1d43b53d83..b104e4a9114 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -1003,6 +1003,11 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, Assert(fieldno == attr_count); } + else if (cstate->routine) + { + if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls)) + return false; + } else { /* binary */ diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 310a37ba728..81b2f4e5c1f 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -1,7 +1,7 @@ /*------------------------------------------------------------------------- * * copyapi.h - * API for COPY TO handlers + * API for COPY TO/FROM handlers * * * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group @@ -90,6 +90,51 @@ typedef struct CopyFormatOptions Node *routine; /* CopyToRoutine (can be NULL) */ } CopyFormatOptions; +/* This is private in commands/copyfrom.c */ +typedef struct CopyFromStateData *CopyFromState; + +/* + * API structure for a COPY FROM format implementation. Note this must be + * allocated in a server-lifetime manner, typically as a static const struct. + */ +typedef struct CopyFromRoutine +{ + /* + * Called when COPY FROM is started to set up the input functions + * associated with the relation's attributes writing to. `finfo` can be + * optionally filled to provide the catalog information of the input + * function. `typioparam` can be optionally filled to define the OID of + * the type to pass to the input function. `atttypid` is the OID of data + * type used by the relation's attribute. + */ + void (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam); + + /* + * Called when COPY FROM is started. 
+ * + * `tupDesc` is the tuple descriptor of the relation where the data needs + * to be copied. This can be used for any initialization steps required + * by a format. + */ + void (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc); + + /* + * Copy one row to a set of `values` and `nulls` of size tupDesc->natts. + * + * 'econtext' is used to evaluate default expression for each column that + * is either not read from the file or is using the DEFAULT option of COPY + * FROM. It is NULL if no default values are used. + * + * Returns false if there are no more tuples to copy. + */ + bool (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); + + /* Called when COPY FROM has ended. */ + void (*CopyFromEnd) (CopyFromState cstate); +} CopyFromRoutine; + typedef struct CopyToStateData *CopyToState; /* diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index cad52fcc783..509b9e92a18 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -15,6 +15,7 @@ #define COPYFROM_INTERNAL_H #include "commands/copy.h" +#include "commands/copyapi.h" #include "commands/trigger.h" #include "nodes/miscnodes.h" @@ -58,6 +59,9 @@ typedef enum CopyInsertMethod */ typedef struct CopyFromStateData { + /* format routine */ + const CopyFromRoutine *routine; + /* low-level state data */ CopySource copy_src; /* type of copy source */ FILE *copy_file; /* used if copy_src == COPY_FILE */ diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 098e7023486..a8422fa4d35 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -492,6 +492,7 @@ ConvertRowtypeExpr CookedConstraint CopyDest CopyFormatOptions +CopyFromRoutine CopyFromState CopyFromStateData CopyHeaderChoice -- 2.45.2 From 89a406fde3ea0dc013d8747b2130353dd911c4c0 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Sun, 29 
Sep 2024 00:09:29 +0900 Subject: [PATCH v23 07/10] Use CopyFromRoutine for the existing formats The existing formats are text, csv and binary. If this causes any performance regression, we will not merge this to master. This will increase indirect function call costs but this will reduce runtime "if (cstate->opts.binary)" and "if (cstate->opts.csv_mode)" branch costs. This uses an optimization based on a static inline function and a constant argument call for cstate->opts.csv_mode. For example, CopyFromTextLikeOneRow() uses this optimization. It accepts the "bool is_csv" argument instead of using cstate->opts.csv_mode in it. CopyFromTextOneRow() calls CopyFromTextLikeOneRow() with false (constant) for "bool is_csv". The compiler will remove the "if (is_csv)" branch in it by this optimization. This doesn't change existing logic. This just moves existing code. --- src/backend/commands/copyfrom.c | 215 ++++++--- src/backend/commands/copyfromparse.c | 530 +++++++++++++---------- src/include/commands/copy.h | 2 - src/include/commands/copyfrom_internal.h | 8 + 4 files changed, 471 insertions(+), 284 deletions(-) diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 909375e81b7..e6ea9ce1602 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -106,6 +106,157 @@ typedef struct CopyMultiInsertInfo /* non-export function prototypes */ static void ClosePipeFromProgram(CopyFromState cstate); + +/* + * CopyFromRoutine implementations for text and CSV. + */ + +/* + * CopyFromTextLikeInFunc + * + * Assign input function data for a relation's attribute in text/CSV format. + */ +static void +CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam) +{ + Oid func_oid; + + getTypeInputInfo(atttypid, &func_oid, typioparam); + fmgr_info(func_oid, finfo); +} + +/* + * CopyFromTextLikeStart + * + * Start of COPY FROM for text/CSV format. 
+ */ +static void +CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc) +{ + AttrNumber attr_count; + + /* + * If encoding conversion is needed, we need another buffer to hold the + * converted input data. Otherwise, we can just point input_buf to the + * same buffer as raw_buf. + */ + if (cstate->need_transcoding) + { + cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); + cstate->input_buf_index = cstate->input_buf_len = 0; + } + else + cstate->input_buf = cstate->raw_buf; + cstate->input_reached_eof = false; + + initStringInfo(&cstate->line_buf); + + /* + * Create workspace for CopyReadAttributes results; used by CSV and text + * format. + */ + attr_count = list_length(cstate->attnumlist); + cstate->max_fields = attr_count; + cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *)); +} + +/* + * CopyFromTextLikeEnd + * + * End of COPY FROM for text/CSV format. + */ +static void +CopyFromTextLikeEnd(CopyFromState cstate) +{ + /* nothing to do */ +} + +/* + * CopyFromRoutine implementation for "binary". + */ + +/* + * CopyFromBinaryInFunc + * + * Assign input function data for a relation's attribute in binary format. + */ +static void +CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam) +{ + Oid func_oid; + + getTypeBinaryInputInfo(atttypid, &func_oid, typioparam); + fmgr_info(func_oid, finfo); +} + +/* + * CopyFromBinaryStart + * + * Start of COPY FROM for binary format. + */ +static void +CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc) +{ + /* Read and verify binary header */ + ReceiveCopyBinaryHeader(cstate); +} + +/* + * CopyFromBinaryEnd + * + * End of COPY FROM for binary format. + */ +static void +CopyFromBinaryEnd(CopyFromState cstate) +{ + /* nothing to do */ +} + +/* + * Routines assigned to each format. + * + * CSV and text share the same implementation, with the exception of the + * per-row callback. 
+ */ +static const CopyFromRoutine CopyFromRoutineText = { + .CopyFromInFunc = CopyFromTextLikeInFunc, + .CopyFromStart = CopyFromTextLikeStart, + .CopyFromOneRow = CopyFromTextOneRow, + .CopyFromEnd = CopyFromTextLikeEnd, +}; + +static const CopyFromRoutine CopyFromRoutineCSV = { + .CopyFromInFunc = CopyFromTextLikeInFunc, + .CopyFromStart = CopyFromTextLikeStart, + .CopyFromOneRow = CopyFromCSVOneRow, + .CopyFromEnd = CopyFromTextLikeEnd, +}; + +static const CopyFromRoutine CopyFromRoutineBinary = { + .CopyFromInFunc = CopyFromBinaryInFunc, + .CopyFromStart = CopyFromBinaryStart, + .CopyFromOneRow = CopyFromBinaryOneRow, + .CopyFromEnd = CopyFromBinaryEnd, +}; + +/* + * Define the COPY FROM routines to use for a format. + */ +static const CopyFromRoutine * +CopyFromGetRoutine(CopyFormatOptions opts) +{ + if (opts.csv_mode) + return &CopyFromRoutineCSV; + else if (opts.binary) + return &CopyFromRoutineBinary; + + /* default is text */ + return &CopyFromRoutineText; +} + + /* * error context callback for COPY FROM * @@ -1396,7 +1547,6 @@ BeginCopyFrom(ParseState *pstate, num_defaults; FmgrInfo *in_functions; Oid *typioparams; - Oid in_func_oid; int *defmap; ExprState **defexprs; MemoryContext oldcontext; @@ -1428,6 +1578,9 @@ BeginCopyFrom(ParseState *pstate, /* Extract options from the statement node tree */ ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options); + /* Set format routine */ + cstate->routine = CopyFromGetRoutine(cstate->opts); + /* Process the target relation */ cstate->rel = rel; @@ -1583,25 +1736,6 @@ BeginCopyFrom(ParseState *pstate, cstate->raw_buf_index = cstate->raw_buf_len = 0; cstate->raw_reached_eof = false; - if (!cstate->opts.binary) - { - /* - * If encoding conversion is needed, we need another buffer to hold - * the converted input data. Otherwise, we can just point input_buf - * to the same buffer as raw_buf. 
- */ - if (cstate->need_transcoding) - { - cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); - cstate->input_buf_index = cstate->input_buf_len = 0; - } - else - cstate->input_buf = cstate->raw_buf; - cstate->input_reached_eof = false; - - initStringInfo(&cstate->line_buf); - } - initStringInfo(&cstate->attribute_buf); /* Assign range table and rteperminfos, we'll need them in CopyFrom. */ @@ -1634,23 +1768,9 @@ BeginCopyFrom(ParseState *pstate, continue; /* Fetch the input function and typioparam info */ - if (cstate->opts.binary) - { - getTypeBinaryInputInfo(att->atttypid, - &in_func_oid, &typioparams[attnum - 1]); - fmgr_info(in_func_oid, &in_functions[attnum - 1]); - } - else if (cstate->routine) - cstate->routine->CopyFromInFunc(cstate, att->atttypid, - &in_functions[attnum - 1], - &typioparams[attnum - 1]); - - else - { - getTypeInputInfo(att->atttypid, - &in_func_oid, &typioparams[attnum - 1]); - fmgr_info(in_func_oid, &in_functions[attnum - 1]); - } + cstate->routine->CopyFromInFunc(cstate, att->atttypid, + &in_functions[attnum - 1], + &typioparams[attnum - 1]); /* Get default info if available */ defexprs[attnum - 1] = NULL; @@ -1785,23 +1905,7 @@ BeginCopyFrom(ParseState *pstate, pgstat_progress_update_multi_param(3, progress_cols, progress_vals); - if (cstate->opts.binary) - { - /* Read and verify binary header */ - ReceiveCopyBinaryHeader(cstate); - } - else if (cstate->routine) - { - cstate->routine->CopyFromStart(cstate, tupDesc); - } - else - { - /* create workspace for CopyReadAttributes results */ - AttrNumber attr_count = list_length(cstate->attnumlist); - - cstate->max_fields = attr_count; - cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *)); - } + cstate->routine->CopyFromStart(cstate, tupDesc); MemoryContextSwitchTo(oldcontext); @@ -1814,8 +1918,7 @@ BeginCopyFrom(ParseState *pstate, void EndCopyFrom(CopyFromState cstate) { - if (cstate->routine) - cstate->routine->CopyFromEnd(cstate); + 
cstate->routine->CopyFromEnd(cstate); /* No COPY FROM related resources except memory. */ if (cstate->is_program) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index b104e4a9114..0447c4df7e0 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -140,8 +140,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0"; /* non-export function prototypes */ -static bool CopyReadLine(CopyFromState cstate); -static bool CopyReadLineText(CopyFromState cstate); +static bool CopyReadLine(CopyFromState cstate, bool is_csv); +static pg_attribute_always_inline bool CopyReadLineText(CopyFromState cstate, bool is_csv); static int CopyReadAttributesText(CopyFromState cstate); static int CopyReadAttributesCSV(CopyFromState cstate); static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo, @@ -741,8 +741,8 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes) * * NOTE: force_not_null option are not applied to the returned fields. */ -bool -NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) +static pg_attribute_always_inline bool +NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv) { int fldct; bool done; @@ -759,13 +759,17 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) tupDesc = RelationGetDescr(cstate->rel); cstate->cur_lineno++; - done = CopyReadLine(cstate); + done = CopyReadLine(cstate, is_csv); if (cstate->opts.header_line == COPY_HEADER_MATCH) { int fldnum; - if (cstate->opts.csv_mode) + /* + * is_csv will be optimized away by compiler, as argument is + * constant at caller. 
+ */ + if (is_csv) fldct = CopyReadAttributesCSV(cstate); else fldct = CopyReadAttributesText(cstate); @@ -809,7 +813,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) cstate->cur_lineno++; /* Actually read the line into memory here */ - done = CopyReadLine(cstate); + done = CopyReadLine(cstate, is_csv); /* * EOF at start of line means we're done. If we see EOF after some @@ -819,8 +823,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) if (done && cstate->line_buf.len == 0) return false; - /* Parse the line into de-escaped field values */ - if (cstate->opts.csv_mode) + /* + * Parse the line into de-escaped field values + * + * is_csv will be optimized away by compiler, as argument is constant at + * caller. + */ + if (is_csv) fldct = CopyReadAttributesCSV(cstate); else fldct = CopyReadAttributesText(cstate); @@ -830,6 +839,267 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) return true; } +/* + * CopyFromTextLikeOneRow + * + * Copy one row to a set of `values` and `nulls` for the text and CSV + * formats. + * + * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow(). 
+ */ +static pg_attribute_always_inline bool +CopyFromTextLikeOneRow(CopyFromState cstate, + ExprContext *econtext, + Datum *values, + bool *nulls, + bool is_csv) +{ + TupleDesc tupDesc; + AttrNumber attr_count; + FmgrInfo *in_functions = cstate->in_functions; + Oid *typioparams = cstate->typioparams; + ExprState **defexprs = cstate->defexprs; + char **field_strings; + ListCell *cur; + int fldct; + int fieldno; + char *string; + + tupDesc = RelationGetDescr(cstate->rel); + attr_count = list_length(cstate->attnumlist); + + /* read raw fields in the next line */ + if (!NextCopyFromRawFields(cstate, &field_strings, &fldct, is_csv)) + return false; + + /* check for overflowing fields */ + if (attr_count > 0 && fldct > attr_count) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("extra data after last expected column"))); + + fieldno = 0; + + /* Loop to read the user attributes on the line. */ + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + if (fieldno >= fldct) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("missing data for column \"%s\"", + NameStr(att->attname)))); + string = field_strings[fieldno++]; + + if (cstate->convert_select_flags && + !cstate->convert_select_flags[m]) + { + /* ignore input field, leaving column as NULL */ + continue; + } + + if (is_csv) + { + if (string == NULL && + cstate->opts.force_notnull_flags[m]) + { + /* + * FORCE_NOT_NULL option is set and column is NULL - convert + * it to the NULL string. + */ + string = cstate->opts.null_print; + } + else if (string != NULL && cstate->opts.force_null_flags[m] + && strcmp(string, cstate->opts.null_print) == 0) + { + /* + * FORCE_NULL option is set and column matches the NULL + * string. It must have been quoted, or otherwise the string + * would already have been set to NULL. Convert it to NULL as + * specified. 
+ */ + string = NULL; + } + } + + cstate->cur_attname = NameStr(att->attname); + cstate->cur_attval = string; + + if (string != NULL) + nulls[m] = false; + + if (cstate->defaults[m]) + { + /* + * The caller must supply econtext and have switched into the + * per-tuple memory context in it. + */ + Assert(econtext != NULL); + Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); + + values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); + } + + /* + * If ON_ERROR is specified with IGNORE, skip rows with soft errors + */ + else if (!InputFunctionCallSafe(&in_functions[m], + string, + typioparams[m], + att->atttypmod, + (Node *) cstate->escontext, + &values[m])) + { + Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); + + cstate->num_errors++; + + if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) + { + /* + * Since we emit line number and column info in the below + * notice message, we suppress error context information other + * than the relation name. + */ + Assert(!cstate->relname_only); + cstate->relname_only = true; + + if (cstate->cur_attval) + { + char *attval; + + attval = CopyLimitPrintoutLength(cstate->cur_attval); + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"", + (unsigned long long) cstate->cur_lineno, + cstate->cur_attname, + attval)); + pfree(attval); + } + else + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input", + (unsigned long long) cstate->cur_lineno, + cstate->cur_attname)); + + /* reset relname_only */ + cstate->relname_only = false; + } + + return true; + } + + cstate->cur_attname = NULL; + cstate->cur_attval = NULL; + } + + Assert(fieldno == attr_count); + + return true; +} + + +/* + * CopyFromTextOneRow + * + * Per-row callback for COPY FROM with text format. 
+ */ +bool +CopyFromTextOneRow(CopyFromState cstate, + ExprContext *econtext, + Datum *values, + bool *nulls) +{ + return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false); +} + +/* + * CopyFromCSVOneRow + * + * Per-row callback for COPY FROM with CSV format. + */ +bool +CopyFromCSVOneRow(CopyFromState cstate, + ExprContext *econtext, + Datum *values, + bool *nulls) +{ + return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true); +} + +/* + * CopyFromBinaryOneRow + * + * Copy one row to a set of `values` and `nulls` for the binary format. + */ +bool +CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls) +{ + TupleDesc tupDesc; + AttrNumber attr_count; + FmgrInfo *in_functions = cstate->in_functions; + Oid *typioparams = cstate->typioparams; + int16 fld_count; + ListCell *cur; + + tupDesc = RelationGetDescr(cstate->rel); + attr_count = list_length(cstate->attnumlist); + + cstate->cur_lineno++; + + if (!CopyGetInt16(cstate, &fld_count)) + { + /* EOF detected (end of file, or protocol-level EOF) */ + return false; + } + + if (fld_count == -1) + { + /* + * Received EOF marker. Wait for the protocol-level EOF, and complain + * if it doesn't come immediately. In COPY FROM STDIN, this ensures + * that we correctly handle CopyFail, if client chooses to send that + * now. When copying from file, we could ignore the rest of the file + * like in text mode, but we choose to be consistent with the COPY + * FROM STDIN case. 
+ */ + char dummy; + + if (CopyReadBinaryData(cstate, &dummy, 1) > 0) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("received copy data after EOF marker"))); + return false; + } + + if (fld_count != attr_count) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("row field count is %d, expected %d", + (int) fld_count, attr_count))); + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + cstate->cur_attname = NameStr(att->attname); + values[m] = CopyReadBinaryAttribute(cstate, + &in_functions[m], + typioparams[m], + att->atttypmod, + &nulls[m]); + cstate->cur_attname = NULL; + } + + return true; +} + /* * Read next tuple from file for COPY FROM. Return false if no more tuples. * @@ -847,221 +1117,21 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, { TupleDesc tupDesc; AttrNumber num_phys_attrs, - attr_count, num_defaults = cstate->num_defaults; - FmgrInfo *in_functions = cstate->in_functions; - Oid *typioparams = cstate->typioparams; int i; int *defmap = cstate->defmap; ExprState **defexprs = cstate->defexprs; tupDesc = RelationGetDescr(cstate->rel); num_phys_attrs = tupDesc->natts; - attr_count = list_length(cstate->attnumlist); /* Initialize all values for row to NULL */ MemSet(values, 0, num_phys_attrs * sizeof(Datum)); MemSet(nulls, true, num_phys_attrs * sizeof(bool)); MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool)); - if (!cstate->opts.binary) - { - char **field_strings; - ListCell *cur; - int fldct; - int fieldno; - char *string; - - /* read raw fields in the next line */ - if (!NextCopyFromRawFields(cstate, &field_strings, &fldct)) - return false; - - /* check for overflowing fields */ - if (attr_count > 0 && fldct > attr_count) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("extra data after last expected column"))); - - fieldno = 0; - - /* Loop to read the user attributes on 
the line. */ - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - int m = attnum - 1; - Form_pg_attribute att = TupleDescAttr(tupDesc, m); - - if (fieldno >= fldct) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("missing data for column \"%s\"", - NameStr(att->attname)))); - string = field_strings[fieldno++]; - - if (cstate->convert_select_flags && - !cstate->convert_select_flags[m]) - { - /* ignore input field, leaving column as NULL */ - continue; - } - - if (cstate->opts.csv_mode) - { - if (string == NULL && - cstate->opts.force_notnull_flags[m]) - { - /* - * FORCE_NOT_NULL option is set and column is NULL - - * convert it to the NULL string. - */ - string = cstate->opts.null_print; - } - else if (string != NULL && cstate->opts.force_null_flags[m] - && strcmp(string, cstate->opts.null_print) == 0) - { - /* - * FORCE_NULL option is set and column matches the NULL - * string. It must have been quoted, or otherwise the - * string would already have been set to NULL. Convert it - * to NULL as specified. - */ - string = NULL; - } - } - - cstate->cur_attname = NameStr(att->attname); - cstate->cur_attval = string; - - if (string != NULL) - nulls[m] = false; - - if (cstate->defaults[m]) - { - /* - * The caller must supply econtext and have switched into the - * per-tuple memory context in it. 
- */ - Assert(econtext != NULL); - Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); - - values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); - } - - /* - * If ON_ERROR is specified with IGNORE, skip rows with soft - * errors - */ - else if (!InputFunctionCallSafe(&in_functions[m], - string, - typioparams[m], - att->atttypmod, - (Node *) cstate->escontext, - &values[m])) - { - Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); - - cstate->num_errors++; - - if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) - { - /* - * Since we emit line number and column info in the below - * notice message, we suppress error context information - * other than the relation name. - */ - Assert(!cstate->relname_only); - cstate->relname_only = true; - - if (cstate->cur_attval) - { - char *attval; - - attval = CopyLimitPrintoutLength(cstate->cur_attval); - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"", - (unsigned long long) cstate->cur_lineno, - cstate->cur_attname, - attval)); - pfree(attval); - } - else - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input", - (unsigned long long) cstate->cur_lineno, - cstate->cur_attname)); - - /* reset relname_only */ - cstate->relname_only = false; - } - - return true; - } - - cstate->cur_attname = NULL; - cstate->cur_attval = NULL; - } - - Assert(fieldno == attr_count); - } - else if (cstate->routine) - { - if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls)) - return false; - } - else - { - /* binary */ - int16 fld_count; - ListCell *cur; - - cstate->cur_lineno++; - - if (!CopyGetInt16(cstate, &fld_count)) - { - /* EOF detected (end of file, or protocol-level EOF) */ - return false; - } - - if (fld_count == -1) - { - /* - * Received EOF marker. Wait for the protocol-level EOF, and - * complain if it doesn't come immediately. 
In COPY FROM STDIN, - * this ensures that we correctly handle CopyFail, if client - * chooses to send that now. When copying from file, we could - * ignore the rest of the file like in text mode, but we choose to - * be consistent with the COPY FROM STDIN case. - */ - char dummy; - - if (CopyReadBinaryData(cstate, &dummy, 1) > 0) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("received copy data after EOF marker"))); - return false; - } - - if (fld_count != attr_count) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("row field count is %d, expected %d", - (int) fld_count, attr_count))); - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - int m = attnum - 1; - Form_pg_attribute att = TupleDescAttr(tupDesc, m); - - cstate->cur_attname = NameStr(att->attname); - values[m] = CopyReadBinaryAttribute(cstate, - &in_functions[m], - typioparams[m], - att->atttypmod, - &nulls[m]); - cstate->cur_attname = NULL; - } - } + if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls)) + return false; /* * Now compute and insert any defaults available for the columns not @@ -1092,7 +1162,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, * in the final value of line_buf. 
*/ static bool -CopyReadLine(CopyFromState cstate) +CopyReadLine(CopyFromState cstate, bool is_csv) { bool result; @@ -1100,7 +1170,7 @@ CopyReadLine(CopyFromState cstate) cstate->line_buf_valid = false; /* Parse data and transfer into line_buf */ - result = CopyReadLineText(cstate); + result = CopyReadLineText(cstate, is_csv); if (result) { @@ -1167,8 +1237,8 @@ CopyReadLine(CopyFromState cstate) /* * CopyReadLineText - inner loop of CopyReadLine for text mode */ -static bool -CopyReadLineText(CopyFromState cstate) +static pg_attribute_always_inline bool +CopyReadLineText(CopyFromState cstate, bool is_csv) { char *copy_input_buf; int input_buf_ptr; @@ -1183,7 +1253,11 @@ CopyReadLineText(CopyFromState cstate) char quotec = '\0'; char escapec = '\0'; - if (cstate->opts.csv_mode) + /* + * is_csv will be optimized away by compiler, as argument is constant at + * caller. + */ + if (is_csv) { quotec = cstate->opts.quote[0]; escapec = cstate->opts.escape[0]; @@ -1260,7 +1334,11 @@ CopyReadLineText(CopyFromState cstate) prev_raw_ptr = input_buf_ptr; c = copy_input_buf[input_buf_ptr++]; - if (cstate->opts.csv_mode) + /* + * is_csv will be optimized away by compiler, as argument is constant + * at caller. + */ + if (is_csv) { /* * If character is '\r', we may need to look ahead below. Force @@ -1299,7 +1377,7 @@ CopyReadLineText(CopyFromState cstate) } /* Process \r */ - if (c == '\r' && (!cstate->opts.csv_mode || !in_quote)) + if (c == '\r' && (!is_csv || !in_quote)) { /* Check for \r\n on first line, _and_ handle \r\n. */ if (cstate->eol_type == EOL_UNKNOWN || @@ -1327,10 +1405,10 @@ CopyReadLineText(CopyFromState cstate) if (cstate->eol_type == EOL_CRNL) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - !cstate->opts.csv_mode ? + !is_csv ? errmsg("literal carriage return found in data") : errmsg("unquoted carriage return found in data"), - !cstate->opts.csv_mode ? + !is_csv ? 
errhint("Use \"\\r\" to represent carriage return.") : errhint("Use quoted CSV field to represent carriage return."))); @@ -1344,10 +1422,10 @@ CopyReadLineText(CopyFromState cstate) else if (cstate->eol_type == EOL_NL) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - !cstate->opts.csv_mode ? + !is_csv ? errmsg("literal carriage return found in data") : errmsg("unquoted carriage return found in data"), - !cstate->opts.csv_mode ? + !is_csv ? errhint("Use \"\\r\" to represent carriage return.") : errhint("Use quoted CSV field to represent carriage return."))); /* If reach here, we have found the line terminator */ @@ -1355,15 +1433,15 @@ CopyReadLineText(CopyFromState cstate) } /* Process \n */ - if (c == '\n' && (!cstate->opts.csv_mode || !in_quote)) + if (c == '\n' && (!is_csv || !in_quote)) { if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - !cstate->opts.csv_mode ? + !is_csv ? errmsg("literal newline found in data") : errmsg("unquoted newline found in data"), - !cstate->opts.csv_mode ? + !is_csv ? errhint("Use \"\\n\" to represent newline.") : errhint("Use quoted CSV field to represent newline."))); cstate->eol_type = EOL_NL; /* in case not set yet */ @@ -1375,7 +1453,7 @@ CopyReadLineText(CopyFromState cstate) * Process backslash, except in CSV mode where backslash is a normal * character. 
*/ - if (c == '\\' && !cstate->opts.csv_mode) + if (c == '\\' && !is_csv) { char c2; diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index dd645eaa030..e5696839637 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -35,8 +35,6 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where extern void EndCopyFrom(CopyFromState cstate); extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); -extern bool NextCopyFromRawFields(CopyFromState cstate, - char ***fields, int *nfields); extern void CopyFromErrorCallback(void *arg); extern char *CopyLimitPrintoutLength(const char *str); diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index 509b9e92a18..c11b5ff3cc0 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -187,4 +187,12 @@ typedef struct CopyFromStateData extern void ReceiveCopyBegin(CopyFromState cstate); extern void ReceiveCopyBinaryHeader(CopyFromState cstate); +/* Callbacks for CopyFromRoutine->CopyFromOneRow */ +extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); +extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); +extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); + #endif /* COPYFROM_INTERNAL_H */ -- 2.45.2 From de95d4bd606519df51c8d68b5ab2f3009dfd0cf7 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Sun, 29 Sep 2024 00:17:53 +0900 Subject: [PATCH v23 08/10] Add support for adding custom COPY FROM format This uses the same handler for COPY TO and COPY FROM but uses different routine. This uses CopyToRoutine for COPY TO and CopyFromRoutine for COPY FROM. PostgreSQL calls a COPY TO/FROM handler with "is_from" argument. 
It's true for COPY FROM and false for COPY TO: copy_handler(true) returns CopyFromRoutine; copy_handler(false) returns CopyToRoutine. This also adds a test module for a custom COPY FROM handler. --- src/backend/commands/copy.c | 52 ++++++++++++------- src/backend/commands/copyfrom.c | 4 +- src/include/catalog/pg_type.dat | 2 +- src/include/commands/copyapi.h | 5 +- .../expected/test_copy_format.out | 10 ++-- .../test_copy_format/sql/test_copy_format.sql | 1 + .../test_copy_format/test_copy_format.c | 39 +++++++++++++- 7 files changed, 87 insertions(+), 26 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 02528fbcc1f..c8643b2dee7 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -469,8 +469,8 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate) * This function checks whether the option value is a built-in format such as * "text" and "csv" or not. If the option value isn't a built-in format, this * function finds a COPY format handler that returns a CopyToRoutine (for - * is_from == false). If no COPY format handler is found, this function - * reports an error. + * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY + * format handler is found, this function reports an error. 
*/ static void ProcessCopyOptionFormat(ParseState *pstate, @@ -501,12 +501,9 @@ ProcessCopyOptionFormat(ParseState *pstate, } /* custom format */ - if (!is_from) - { - funcargtypes[0] = INTERNALOID; - handlerOid = LookupFuncName(list_make1(makeString(format)), 1, - funcargtypes, true); - } + funcargtypes[0] = INTERNALOID; + handlerOid = LookupFuncName(list_make1(makeString(format)), 1, + funcargtypes, true); if (!OidIsValid(handlerOid)) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), @@ -515,17 +512,34 @@ ProcessCopyOptionFormat(ParseState *pstate, datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from)); routine = (Node *) DatumGetPointer(datum); - if (routine == NULL || !IsA(routine, CopyToRoutine)) - ereport( - ERROR, - (errcode( - ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("COPY handler function " - "%s(%u) did not return a " - "CopyToRoutine struct", - format, handlerOid), - parser_errposition( - pstate, defel->location))); + if (is_from) + { + if (routine == NULL || !IsA(routine, CopyFromRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%s(%u) did not return a " + "CopyFromRoutine struct", + format, handlerOid), + parser_errposition( + pstate, defel->location))); + } + else + { + if (routine == NULL || !IsA(routine, CopyToRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%s(%u) did not return a " + "CopyToRoutine struct", + format, handlerOid), + parser_errposition( + pstate, defel->location))); + } opts_out->routine = routine; } diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index e6ea9ce1602..932f1ff4f6e 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -247,7 +247,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = { static const CopyFromRoutine * CopyFromGetRoutine(CopyFormatOptions opts) { - if (opts.csv_mode) + if (opts.routine) + return 
(const CopyFromRoutine *) opts.routine; + else if (opts.csv_mode) return &CopyFromRoutineCSV; else if (opts.binary) return &CopyFromRoutineBinary; diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat index 793dd671935..37ebfa0908f 100644 --- a/src/include/catalog/pg_type.dat +++ b/src/include/catalog/pg_type.dat @@ -634,7 +634,7 @@ typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-', typalign => 'i' }, { oid => '8752', - descr => 'pseudo-type for the result of a copy to method function', + descr => 'pseudo-type for the result of a copy to/from method function', typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p', typcategory => 'P', typinput => 'copy_handler_in', typoutput => 'copy_handler_out', typreceive => '-', typsend => '-', diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 81b2f4e5c1f..e9c01492797 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -87,7 +87,8 @@ typedef struct CopyFormatOptions CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */ int64 reject_limit; /* maximum tolerable number of errors */ List *convert_select; /* list of column names (can be NIL) */ - Node *routine; /* CopyToRoutine (can be NULL) */ + Node *routine; /* CopyToRoutine or CopyFromRoutine (can be + * NULL) */ } CopyFormatOptions; /* This is private in commands/copyfrom.c */ @@ -99,6 +100,8 @@ typedef struct CopyFromStateData *CopyFromState; */ typedef struct CopyFromRoutine { + NodeTag type; + /* * Called when COPY FROM is started to set up the input functions * associated with the relation's attributes writing to. 
`finfo` can be diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out index 606c78f6878..4ed7c0b12db 100644 --- a/src/test/modules/test_copy_format/expected/test_copy_format.out +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -2,9 +2,13 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); COPY public.test FROM stdin WITH (format 'test_copy_format'); -ERROR: COPY format "test_copy_format" not recognized -LINE 1: COPY public.test FROM stdin WITH (format 'test_copy_format')... - ^ +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromEnd COPY public.test TO stdout WITH (format 'test_copy_format'); NOTICE: test_copy_format: is_from=false NOTICE: CopyToOutFunc: atttypid=21 diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql index 9406b3be3d4..e805f7cb011 100644 --- a/src/test/modules/test_copy_format/sql/test_copy_format.sql +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -2,4 +2,5 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); COPY public.test FROM stdin WITH (format 'test_copy_format'); +\. 
COPY public.test TO stdout WITH (format 'test_copy_format'); diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c index e064f40473b..f6b105659ab 100644 --- a/src/test/modules/test_copy_format/test_copy_format.c +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -18,6 +18,40 @@ PG_MODULE_MAGIC; +static void +CopyFromInFunc(CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam) +{ + ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%d", atttypid))); +} + +static void +CopyFromStart(CopyFromState cstate, TupleDesc tupDesc) +{ + ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts))); +} + +static bool +CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +{ + ereport(NOTICE, (errmsg("CopyFromOneRow"))); + return false; +} + +static void +CopyFromEnd(CopyFromState cstate) +{ + ereport(NOTICE, (errmsg("CopyFromEnd"))); +} + +static const CopyFromRoutine CopyFromRoutineTestCopyFormat = { + .type = T_CopyFromRoutine, + .CopyFromInFunc = CopyFromInFunc, + .CopyFromStart = CopyFromStart, + .CopyFromOneRow = CopyFromOneRow, + .CopyFromEnd = CopyFromEnd, +}; + static void CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) { @@ -59,5 +93,8 @@ test_copy_format(PG_FUNCTION_ARGS) ereport(NOTICE, (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false"))); - PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat); + if (is_from) + PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat); + else + PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat); } -- 2.45.2 From 0f4d74f1750467f4ac3b87b24e03271579793651 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Sun, 29 Sep 2024 00:28:02 +0900 Subject: [PATCH v23 09/10] Export CopyFromStateData It's for custom COPY FROM format handlers implemented as extension. This just moves codes. This doesn't change codes except CopySource enum values. 
This changes COPY_ prefix of CopySource enum values to COPY_SOURCE_ prefix like the CopyDest enum values prefix change. For example, COPY_FILE in CopySource is renamed to COPY_SOURCE_FILE. Note that this isn't enough to implement custom COPY FROM format handlers as extension. We'll do the followings in a subsequent commit: 1. Add an opaque space for custom COPY FROM format handler 2. Export CopyReadBinaryData() to read the next data --- src/backend/commands/copyfrom.c | 4 +- src/backend/commands/copyfromparse.c | 10 +- src/include/commands/copy.h | 5 - src/include/commands/copyapi.h | 168 ++++++++++++++++++++++- src/include/commands/copyfrom_internal.h | 165 ---------------------- 5 files changed, 174 insertions(+), 178 deletions(-) diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 932f1ff4f6e..d758e66c6a1 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1716,7 +1716,7 @@ BeginCopyFrom(ParseState *pstate, pg_encoding_to_char(GetDatabaseEncoding())))); } - cstate->copy_src = COPY_FILE; /* default */ + cstate->copy_src = COPY_SOURCE_FILE; /* default */ cstate->whereClause = whereClause; @@ -1844,7 +1844,7 @@ BeginCopyFrom(ParseState *pstate, if (data_source_cb) { progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK; - cstate->copy_src = COPY_CALLBACK; + cstate->copy_src = COPY_SOURCE_CALLBACK; cstate->data_source_cb = data_source_cb; } else if (pipe) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 0447c4df7e0..ccfbacb4a37 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -171,7 +171,7 @@ ReceiveCopyBegin(CopyFromState cstate) for (i = 0; i < natts; i++) pq_sendint16(&buf, format); /* per-column formats */ pq_endmessage(&buf); - cstate->copy_src = COPY_FRONTEND; + cstate->copy_src = COPY_SOURCE_FRONTEND; cstate->fe_msgbuf = makeStringInfo(); /* We *must* flush here to ensure FE knows it can send. 
*/ pq_flush(); @@ -239,7 +239,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) switch (cstate->copy_src) { - case COPY_FILE: + case COPY_SOURCE_FILE: bytesread = fread(databuf, 1, maxread, cstate->copy_file); if (ferror(cstate->copy_file)) ereport(ERROR, @@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) if (bytesread == 0) cstate->raw_reached_eof = true; break; - case COPY_FRONTEND: + case COPY_SOURCE_FRONTEND: while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof) { int avail; @@ -331,7 +331,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) bytesread += avail; } break; - case COPY_CALLBACK: + case COPY_SOURCE_CALLBACK: bytesread = cstate->data_source_cb(databuf, minread, maxread); break; } @@ -1179,7 +1179,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv) * after \. up to the protocol end of copy data. (XXX maybe better * not to treat \. as special?) */ - if (cstate->copy_src == COPY_FRONTEND) + if (cstate->copy_src == COPY_SOURCE_FRONTEND) { int inbytes; diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index e5696839637..e2411848e9f 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -19,11 +19,6 @@ #include "parser/parse_node.h" #include "tcop/dest.h" -/* This is private in commands/copyfrom.c */ -typedef struct CopyFromStateData *CopyFromState; - -typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread); - extern void DoCopy(ParseState *pstate, const CopyStmt *stmt, int stmt_location, int stmt_len, uint64 *processed); diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index e9c01492797..0274e3487c3 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -91,7 +91,6 @@ typedef struct CopyFormatOptions * NULL) */ } CopyFormatOptions; -/* This is private in commands/copyfrom.c */ typedef struct CopyFromStateData 
*CopyFromState; /* @@ -138,6 +137,173 @@ typedef struct CopyFromRoutine void (*CopyFromEnd) (CopyFromState cstate); } CopyFromRoutine; +/* + * Represents the different source cases we need to worry about at + * the bottom level + */ +typedef enum CopySource +{ + COPY_SOURCE_FILE, /* from file (or a piped program) */ + COPY_SOURCE_FRONTEND, /* from frontend */ + COPY_SOURCE_CALLBACK, /* from callback function */ +} CopySource; + +/* + * Represents the end-of-line terminator type of the input + */ +typedef enum EolType +{ + EOL_UNKNOWN, + EOL_NL, + EOL_CR, + EOL_CRNL, +} EolType; + +/* + * Represents the insert method to be used during COPY FROM. + */ +typedef enum CopyInsertMethod +{ + CIM_SINGLE, /* use table_tuple_insert or ExecForeignInsert */ + CIM_MULTI, /* always use table_multi_insert or + * ExecForeignBatchInsert */ + CIM_MULTI_CONDITIONAL, /* use table_multi_insert or + * ExecForeignBatchInsert only if valid */ +} CopyInsertMethod; + +typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread); + +/* + * This struct contains all the state variables used throughout a COPY FROM + * operation. + */ +typedef struct CopyFromStateData +{ + /* format routine */ + const CopyFromRoutine *routine; + + /* low-level state data */ + CopySource copy_src; /* type of copy source */ + FILE *copy_file; /* used if copy_src == COPY_SOURCE_FILE */ + StringInfo fe_msgbuf; /* used if copy_src == COPY_SOURCE_FRONTEND */ + + EolType eol_type; /* EOL type of input */ + int file_encoding; /* file or remote side's character encoding */ + bool need_transcoding; /* file encoding diff from server? */ + Oid conversion_proc; /* encoding conversion function */ + + /* parameters from the COPY command */ + Relation rel; /* relation to copy from */ + List *attnumlist; /* integer list of attnums to copy */ + char *filename; /* filename, or NULL for STDIN */ + bool is_program; /* is 'filename' a program to popen? 
*/ + copy_data_source_cb data_source_cb; /* function for reading data */ + + CopyFormatOptions opts; + bool *convert_select_flags; /* per-column CSV/TEXT CS flags */ + Node *whereClause; /* WHERE condition (or NULL) */ + + /* these are just for error messages, see CopyFromErrorCallback */ + const char *cur_relname; /* table name for error messages */ + uint64 cur_lineno; /* line number for error messages */ + const char *cur_attname; /* current att for error messages */ + const char *cur_attval; /* current att value for error messages */ + bool relname_only; /* don't output line number, att, etc. */ + + /* + * Working state + */ + MemoryContext copycontext; /* per-copy execution context */ + + AttrNumber num_defaults; /* count of att that are missing and have + * default value */ + FmgrInfo *in_functions; /* array of input functions for each attrs */ + Oid *typioparams; /* array of element types for in_functions */ + ErrorSaveContext *escontext; /* soft error trapper during in_functions + * execution */ + uint64 num_errors; /* total number of rows which contained soft + * errors */ + int *defmap; /* array of default att numbers related to + * missing att */ + ExprState **defexprs; /* array of default att expressions for all + * att */ + bool *defaults; /* if DEFAULT marker was found for + * corresponding att */ + bool volatile_defexprs; /* is any of defexprs volatile? */ + List *range_table; /* single element list of RangeTblEntry */ + List *rteperminfos; /* single element list of RTEPermissionInfo */ + ExprState *qualexpr; + + TransitionCaptureState *transition_capture; + + /* + * These variables are used to reduce overhead in COPY FROM. + * + * attribute_buf holds the separated, de-escaped text for each field of + * the current line. The CopyReadAttributes functions return arrays of + * pointers into this buffer. We avoid palloc/pfree overhead by re-using + * the buffer on each cycle. 
+ * + * In binary COPY FROM, attribute_buf holds the binary data for the + * current field, but the usage is otherwise similar. + */ + StringInfoData attribute_buf; + + /* field raw data pointers found by COPY FROM */ + + int max_fields; + char **raw_fields; + + /* + * Similarly, line_buf holds the whole input line being processed. The + * input cycle is first to read the whole line into line_buf, and then + * extract the individual attribute fields into attribute_buf. line_buf + * is preserved unmodified so that we can display it in error messages if + * appropriate. (In binary mode, line_buf is not used.) + */ + StringInfoData line_buf; + bool line_buf_valid; /* contains the row being processed? */ + + /* + * input_buf holds input data, already converted to database encoding. + * + * In text mode, CopyReadLine parses this data sufficiently to locate line + * boundaries, then transfers the data to line_buf. We guarantee that + * there is a \0 at input_buf[input_buf_len] at all times. (In binary + * mode, input_buf is not used.) + * + * If encoding conversion is not required, input_buf is not a separate + * buffer but points directly to raw_buf. In that case, input_buf_len + * tracks the number of bytes that have been verified as valid in the + * database encoding, and raw_buf_len is the total number of bytes stored + * in the buffer. + */ +#define INPUT_BUF_SIZE 65536 /* we palloc INPUT_BUF_SIZE+1 bytes */ + char *input_buf; + int input_buf_index; /* next byte to process */ + int input_buf_len; /* total # of bytes stored */ + bool input_reached_eof; /* true if we reached EOF */ + bool input_reached_error; /* true if a conversion error happened */ + /* Shorthand for number of unconsumed bytes available in input_buf */ +#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index) + + /* + * raw_buf holds raw input data read from the data source (file or client + * connection), not yet converted to the database encoding. 
Like with + * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len]. + */ +#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */ + char *raw_buf; + int raw_buf_index; /* next byte to process */ + int raw_buf_len; /* total # of bytes stored */ + bool raw_reached_eof; /* true if we reached EOF */ + + /* Shorthand for number of unconsumed bytes available in raw_buf */ +#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) + + uint64 bytes_processed; /* number of bytes processed so far */ +} CopyFromStateData; + typedef struct CopyToStateData *CopyToState; /* diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index c11b5ff3cc0..3863d26d5b7 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -19,171 +19,6 @@ #include "commands/trigger.h" #include "nodes/miscnodes.h" -/* - * Represents the different source cases we need to worry about at - * the bottom level - */ -typedef enum CopySource -{ - COPY_FILE, /* from file (or a piped program) */ - COPY_FRONTEND, /* from frontend */ - COPY_CALLBACK, /* from callback function */ -} CopySource; - -/* - * Represents the end-of-line terminator type of the input - */ -typedef enum EolType -{ - EOL_UNKNOWN, - EOL_NL, - EOL_CR, - EOL_CRNL, -} EolType; - -/* - * Represents the insert method to be used during COPY FROM. - */ -typedef enum CopyInsertMethod -{ - CIM_SINGLE, /* use table_tuple_insert or ExecForeignInsert */ - CIM_MULTI, /* always use table_multi_insert or - * ExecForeignBatchInsert */ - CIM_MULTI_CONDITIONAL, /* use table_multi_insert or - * ExecForeignBatchInsert only if valid */ -} CopyInsertMethod; - -/* - * This struct contains all the state variables used throughout a COPY FROM - * operation. 
- */ -typedef struct CopyFromStateData -{ - /* format routine */ - const CopyFromRoutine *routine; - - /* low-level state data */ - CopySource copy_src; /* type of copy source */ - FILE *copy_file; /* used if copy_src == COPY_FILE */ - StringInfo fe_msgbuf; /* used if copy_src == COPY_FRONTEND */ - - EolType eol_type; /* EOL type of input */ - int file_encoding; /* file or remote side's character encoding */ - bool need_transcoding; /* file encoding diff from server? */ - Oid conversion_proc; /* encoding conversion function */ - - /* parameters from the COPY command */ - Relation rel; /* relation to copy from */ - List *attnumlist; /* integer list of attnums to copy */ - char *filename; /* filename, or NULL for STDIN */ - bool is_program; /* is 'filename' a program to popen? */ - copy_data_source_cb data_source_cb; /* function for reading data */ - - CopyFormatOptions opts; - bool *convert_select_flags; /* per-column CSV/TEXT CS flags */ - Node *whereClause; /* WHERE condition (or NULL) */ - - /* these are just for error messages, see CopyFromErrorCallback */ - const char *cur_relname; /* table name for error messages */ - uint64 cur_lineno; /* line number for error messages */ - const char *cur_attname; /* current att for error messages */ - const char *cur_attval; /* current att value for error messages */ - bool relname_only; /* don't output line number, att, etc. 
*/ - - /* - * Working state - */ - MemoryContext copycontext; /* per-copy execution context */ - - AttrNumber num_defaults; /* count of att that are missing and have - * default value */ - FmgrInfo *in_functions; /* array of input functions for each attrs */ - Oid *typioparams; /* array of element types for in_functions */ - ErrorSaveContext *escontext; /* soft error trapper during in_functions - * execution */ - uint64 num_errors; /* total number of rows which contained soft - * errors */ - int *defmap; /* array of default att numbers related to - * missing att */ - ExprState **defexprs; /* array of default att expressions for all - * att */ - bool *defaults; /* if DEFAULT marker was found for - * corresponding att */ - bool volatile_defexprs; /* is any of defexprs volatile? */ - List *range_table; /* single element list of RangeTblEntry */ - List *rteperminfos; /* single element list of RTEPermissionInfo */ - ExprState *qualexpr; - - TransitionCaptureState *transition_capture; - - /* - * These variables are used to reduce overhead in COPY FROM. - * - * attribute_buf holds the separated, de-escaped text for each field of - * the current line. The CopyReadAttributes functions return arrays of - * pointers into this buffer. We avoid palloc/pfree overhead by re-using - * the buffer on each cycle. - * - * In binary COPY FROM, attribute_buf holds the binary data for the - * current field, but the usage is otherwise similar. - */ - StringInfoData attribute_buf; - - /* field raw data pointers found by COPY FROM */ - - int max_fields; - char **raw_fields; - - /* - * Similarly, line_buf holds the whole input line being processed. The - * input cycle is first to read the whole line into line_buf, and then - * extract the individual attribute fields into attribute_buf. line_buf - * is preserved unmodified so that we can display it in error messages if - * appropriate. (In binary mode, line_buf is not used.) 
- */ - StringInfoData line_buf; - bool line_buf_valid; /* contains the row being processed? */ - - /* - * input_buf holds input data, already converted to database encoding. - * - * In text mode, CopyReadLine parses this data sufficiently to locate line - * boundaries, then transfers the data to line_buf. We guarantee that - * there is a \0 at input_buf[input_buf_len] at all times. (In binary - * mode, input_buf is not used.) - * - * If encoding conversion is not required, input_buf is not a separate - * buffer but points directly to raw_buf. In that case, input_buf_len - * tracks the number of bytes that have been verified as valid in the - * database encoding, and raw_buf_len is the total number of bytes stored - * in the buffer. - */ -#define INPUT_BUF_SIZE 65536 /* we palloc INPUT_BUF_SIZE+1 bytes */ - char *input_buf; - int input_buf_index; /* next byte to process */ - int input_buf_len; /* total # of bytes stored */ - bool input_reached_eof; /* true if we reached EOF */ - bool input_reached_error; /* true if a conversion error happened */ - /* Shorthand for number of unconsumed bytes available in input_buf */ -#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index) - - /* - * raw_buf holds raw input data read from the data source (file or client - * connection), not yet converted to the database encoding. Like with - * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len]. 
- */ -#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */ - char *raw_buf; - int raw_buf_index; /* next byte to process */ - int raw_buf_len; /* total # of bytes stored */ - bool raw_reached_eof; /* true if we reached EOF */ - - /* Shorthand for number of unconsumed bytes available in raw_buf */ -#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) - - uint64 bytes_processed; /* number of bytes processed so far */ -} CopyFromStateData; - extern void ReceiveCopyBegin(CopyFromState cstate); extern void ReceiveCopyBinaryHeader(CopyFromState cstate); -- 2.45.2 From f9908c010646fdf182615a2e3632395ae9d9c4f3 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Sun, 29 Sep 2024 00:32:31 +0900 Subject: [PATCH v23 10/10] Add support for implementing custom COPY FROM format as extension * Add CopyFromStateData::opaque that can be used to keep data for custom COPY From format implementation * Export CopyReadBinaryData() to read the next data as CopyFromStateRead() --- src/backend/commands/copyfromparse.c | 14 ++++++++++++++ src/include/commands/copyapi.h | 6 ++++++ 2 files changed, 20 insertions(+) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index ccfbacb4a37..4fa23d992f5 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -730,6 +730,20 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes) return copied_bytes; } +/* + * CopyFromStateRead + * + * Export CopyReadBinaryData() for extensions. We want to keep + * CopyReadBinaryData() as a static function for + * optimization. CopyReadBinaryData() calls in this file may be optimized by + * a compiler. + */ +int +CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes) +{ + return CopyReadBinaryData(cstate, dest, nbytes); +} + /* * Read raw fields in the next line for COPY FROM in text or csv mode. * Return false if no more lines. 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 0274e3487c3..2de610ef729 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -302,8 +302,14 @@ typedef struct CopyFromStateData #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) uint64 bytes_processed; /* number of bytes processed so far */ + + /* For custom format implementation */ + void *opaque; /* private space */ } CopyFromStateData; +extern int CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes); + + typedef struct CopyToStateData *CopyToState; /* -- 2.45.2
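The two pieces added by the 10/10 patch (a private `opaque` pointer and a thin exported wrapper around a file-local reader) can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `DemoCopyFromState`, `DemoFormatState`, `demo_read_binary_data()`, `demo_state_read()` and `demo_one_row()` are hypothetical names invented for illustration, and the struct is a simplified stand-in, not the real `CopyFromStateData` or the real `CopyFromStateRead()`.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for CopyFromStateData; NOT the real PostgreSQL struct. */
typedef struct DemoCopyFromState
{
	const char *raw_buf;		/* pending input bytes */
	int			raw_buf_len;	/* total number of bytes in raw_buf */
	int			raw_buf_index;	/* next byte to hand out */
	void	   *opaque;			/* private space for the format handler */
} DemoCopyFromState;

/* File-local reader: kept static so calls inside this file may be inlined. */
static int
demo_read_binary_data(DemoCopyFromState *cstate, char *dest, int nbytes)
{
	int			avail = cstate->raw_buf_len - cstate->raw_buf_index;
	int			n = nbytes < avail ? nbytes : avail;

	memcpy(dest, cstate->raw_buf + cstate->raw_buf_index, (size_t) n);
	cstate->raw_buf_index += n;
	return n;					/* number of bytes actually copied */
}

/* Exported thin wrapper, mirroring the idea behind CopyFromStateRead(). */
int
demo_state_read(DemoCopyFromState *cstate, char *dest, int nbytes)
{
	return demo_read_binary_data(cstate, dest, nbytes);
}

/* Hypothetical handler-private state, reached through cstate->opaque. */
typedef struct DemoFormatState
{
	int			rows_read;
} DemoFormatState;

/*
 * Hypothetical "one row" callback for a fixed-width 4-byte row format.
 * Returns 1 if a full row was read, 0 once the input is exhausted.
 */
int
demo_one_row(DemoCopyFromState *cstate, char row[4])
{
	DemoFormatState *fmt = (DemoFormatState *) cstate->opaque;

	if (demo_state_read(cstate, row, 4) != 4)
		return 0;
	fmt->rows_read++;
	return 1;
}
```

In this model the handler never touches the reader's internals: it pulls bytes only through the exported wrapper and keeps its own bookkeeping behind `opaque`, which is the separation the commit message describes.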
On Wed, Nov 13, 2024 at 11:19 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <20241105.174328.1705956947135248653.kou@clear-code.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 05 Nov 2024 17:43:28 +0900 (JST),
>   Sutou Kouhei <kou@clear-code.com> wrote:
>
> >> I've further investigated the performance regression, and found out it
> >> might be relevant that the compiler doesn't inline the
> >> CopyFromTextLikeOneRow() function. It might be worth testing with
> >> pg_attribute_always_inline instead of 'inline' as below:
> >
> > Wow! Good catch!
> >
> > I've rebased on the current master and updated the v20 and
> > v21 patch sets with "pg_attribute_always_inline" not
> > "inline".
> >
> > The v22 patch set is for the v20 patch set.
> > (TO/FROM changes are in one commit.)
> >
> > The v23 patch set is for the v21 patch set.
> > (TO/FROM changes are separated for easy to merge only FROM
> > or TO part.)
>
> I've run benchmark on my machine that has "Intel(R) Core(TM)
> i7-3770 CPU @ 3.40GHz".
>
> Summary:
> * "pg_attribute_always_inline" is effective for the "COPY
>   FROM" part
> * "pg_attribute_always_inline" may not be needed for the
>   "COPY TO" part
>
>
> v20-result.pdf: This is the same result PDF attached in
> https://www.postgresql.org/message-id/20241008.173918.995935870630354246.kou%40clear-code.com
> . This is the base line for "pg_attribute_always_inline"
> change.
>
> v22-result.pdf: This is a new result PDF for the v22 patch
> set.
>
> COPY FROM:
>
> 0001: The v22 patch set is slower than HEAD. This just
> introduces "routine" abstraction. It increases overhead. So
> this is expected.
>
> 0002-0005: The v22 patch set is faster than HEAD for all
> cases. The v20 patch set is slower than HEAD for smaller
> data. This shows that "pg_attribute_always_inline" for the
> "COPY FROM" part is effective on my machine too.
>
>
> COPY TO:
>
> 0001: The v22 patch set is slower than HEAD. This is
> as expected for the same reason as COPY FROM.
>
> 0002-0004: The v22 patch set is slower than HEAD. (The v20
> patch set is faster than HEAD.) This is not expected.
>
> 0005: The v22 patch set is faster than HEAD. This is
> expected. But 0005 just exports some functions. It doesn't
> change existing logic. So it's strange...
>
> This shows "pg_attribute_always_inline" is needless for the
> "COPY TO" part.
>
>
> I also tried the v24 patch set:
> * The "COPY FROM" part is same as the v22 patch set
>   ("pg_attribute_always_inline" is used.)
> * The "COPY TO" part is same as the v20 patch set
>   ("pg_attribute_always_inline" is NOT used.)
>
>
> (I think that the v24 patch set isn't useful for others. So
> I don't share it here. If you're interested in it, I'll
> share it here.)
>
> v24-result.pdf:
>
> COPY FROM: The same trend as the v22 patch set result. It's
> expected because the "COPY FROM" part is the same as the v22
> patch set.
>
> COPY TO: The v24 patch set is faster than the v22 patch set
> but the v24 patch set isn't same trend as the v20 patch
> set. This is not expected because the "COPY TO" part is the
> same as the v20 patch set.
>
>
> Anyway, the 0005 "COPY TO" parts are always faster than
> HEAD. So we can use either "pg_attribute_always_inline" or
> "inline".
>
>
> Summary:
> * "pg_attribute_always_inline" is effective for the "COPY
>   FROM" part
> * "pg_attribute_always_inline" may not be needed for the
>   "COPY TO" part
>
>
> Can we proceed this proposal with these results? Or should
> we wait for more benchmark results?

Thank you for sharing the benchmark test results! I think these
results are good for us to proceed. I'll closely look at COPY TO
results and review v22 patch sets.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Thu, Nov 14, 2024 at 4:04 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Thank you for sharing the benchmark test results! I think these
> results are good for us to proceed. I'll closely look at COPY TO
> results and review v22 patch sets.

I have a question about v22. We use pg_attribute_always_inline for
some functions to avoid function call overhead. Applying it to
CopyToTextLikeOneRow() and CopyFromTextLikeOneRow() is legitimate as
we've discussed. But there are more functions where the patch applied
it:

-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static pg_attribute_always_inline bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)

-static bool
-CopyReadLineText(CopyFromState cstate)
+static pg_attribute_always_inline bool
+CopyReadLineText(CopyFromState cstate, bool is_csv)

+static pg_attribute_always_inline void
+CopyToTextLikeSendEndOfRow(CopyToState cstate)

I think it's out of scope of this patch even if these changes are
legitimate. Is there any reason for these changes?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi,

In <CAD21AoC=DX5QQVb27C6UdpPfY-F=-PGnQ1u6rWo69DV=4EtDdw@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 18 Nov 2024 17:02:41 -0800,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> I have a question about v22. We use pg_attribute_always_inline for
> some functions to avoid function call overheads. Applying it to
> CopyToTextLikeOneRow() and CopyFromTextLikeOneRow() are legitimate as
> we've discussed. But there are more function where the patch applied
> it to:
>
> -bool
> -NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
> +static pg_attribute_always_inline bool
> +NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
>
> -static bool
> -CopyReadLineText(CopyFromState cstate)
> +static pg_attribute_always_inline bool
> +CopyReadLineText(CopyFromState cstate, bool is_csv)
>
> +static pg_attribute_always_inline void
> +CopyToTextLikeSendEndOfRow(CopyToState cstate)
>
> I think it's out of scope of this patch even if these changes are
> legitimate. Is there any reason for these changes?

Yes for NextCopyFromRawFields() and CopyReadLineText().
No for CopyToTextLikeSendEndOfRow().

NextCopyFromRawFields() and CopyReadLineText() have "bool
is_csv". So I think that we should use
pg_attribute_always_inline (or inline) like
CopyToTextLikeOneRow() and CopyFromTextLikeOneRow(). I think
that it's not out of scope of this patch because it's a part
of CopyToTextLikeOneRow() and CopyFromTextLikeOneRow()
optimization.

Note: The optimization is based on "bool is_csv" parameter
and constant "true"/"false" argument function call. If we
can inline this function call, all "if (is_csv)" checks in
the function are removed.

pg_attribute_always_inline (or inline) for
CopyToTextLikeSendEndOfRow() is out of scope of this
patch. You're right.

I think that inlining CopyToTextLikeSendEndOfRow() is better
because it's called per row. But it's not related to the
optimization.


Should I create a new patch set without
pg_attribute_always_inline/inline for
CopyToTextLikeSendEndOfRow()? Or could you remove it when
you push?

Thanks,
-- 
kou
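The constant-argument optimization discussed in this message can be illustrated with a small standalone sketch. It is not code from the patch set: the macro sketch_always_inline is a hypothetical stand-in for pg_attribute_always_inline (the GCC/Clang attribute form is assumed), the function names are made up, and the escaping rules are toy rules rather than PostgreSQL's real text/CSV output logic. The point is only the shape: a workhorse that takes "bool is_csv", plus thin wrappers that pass a literal true or false so the compiler can drop the "if (is_csv)" checks once the call is inlined.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_attribute_always_inline (assumption). */
#if defined(__GNUC__)
#define sketch_always_inline __attribute__((always_inline)) inline
#else
#define sketch_always_inline inline
#endif

/*
 * Workhorse shared by both formats. It returns the number of output
 * bytes a field would need under toy escaping rules (NOT the real
 * PostgreSQL rules). Every branch on is_csv can be resolved at compile
 * time when the caller passes a constant and the call is inlined.
 */
static sketch_always_inline size_t
field_output_len(const char *field, bool is_csv)
{
    size_t      len = 0;

    for (const char *p = field; *p != '\0'; p++)
    {
        if (is_csv)
            len += (*p == '"') ? 2 : 1;     /* toy rule: double embedded quotes */
        else
            len += (*p == '\t' || *p == '\\') ? 2 : 1;  /* toy rule: backslash escape */
    }

    if (is_csv)
        len += 2;               /* toy rule: always wrap the field in quotes */

    return len;
}

/* Thin per-format wrappers; the constant argument is what matters. */
size_t
text_field_output_len(const char *field)
{
    return field_output_len(field, false);
}

size_t
csv_field_output_len(const char *field)
{
    return field_output_len(field, true);
}
```

With plain "static" and no inline hint the compiler may still specialize the two callers, but that is left to its heuristics; forcing the inline makes the specialization explicit, which is the trade-off being debated here.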
On Mon, Nov 18, 2024 at 5:31 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoC=DX5QQVb27C6UdpPfY-F=-PGnQ1u6rWo69DV=4EtDdw@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 18 Nov 2024 17:02:41 -0800,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > I have a question about v22. We use pg_attribute_always_inline for
> > some functions to avoid function call overheads. Applying it to
> > CopyToTextLikeOneRow() and CopyFromTextLikeOneRow() are legitimate as
> > we've discussed. But there are more function where the patch applied
> > it to:
> >
> > -bool
> > -NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
> > +static pg_attribute_always_inline bool
> > +NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
> >
> > -static bool
> > -CopyReadLineText(CopyFromState cstate)
> > +static pg_attribute_always_inline bool
> > +CopyReadLineText(CopyFromState cstate, bool is_csv)
> >
> > +static pg_attribute_always_inline void
> > +CopyToTextLikeSendEndOfRow(CopyToState cstate)
> >
> > I think it's out of scope of this patch even if these changes are
> > legitimate. Is there any reason for these changes?
>
> Yes for NextCopyFromRawFields() and CopyReadLineText().
> No for CopyToTextLikeSendEndOfRow().
>
> NextCopyFromRawFields() and CopyReadLineText() have "bool
> is_csv". So I think that we should use
> pg_attribute_always_inline (or inline) like
> CopyToTextLikeOneRow() and CopyFromTextLikeOneRow(). I think
> that it's not out of scope of this patch because it's a part
> of CopyToTextLikeOneRow() and CopyFromTextLikeOneRow()
> optimization.
>
> Note: The optimization is based on "bool is_csv" parameter
> and constant "true"/"false" argument function call. If we
> can inline this function call, all "if (is_csv)" checks in
> the function are removed.

Understood, thank you for pointing this out.

>
> pg_attribute_always_inline (or inline) for
> CopyToTextLikeSendEndOfRow() is out of scope of this
> patch. You're right.
>
> I think that inlining CopyToTextLikeSendEndOfRow() is better
> because it's called per row. But it's not related to the
> optimization.
>
>
> Should I create a new patch set without
> pg_attribute_always_inline/inline for
> CopyToTextLikeSendEndOfRow()? Or could you remove it when
> you push?

Since I'm reviewing the patch and the patch organization I'll include it.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi,
In <CAD21AoA1s0nzjGU9t3N_uNdg3SZeOxXyH3rQfxYFEN3Y7JrKRQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 20 Nov 2024 14:14:27 -0800,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I've extracted the changes to refactor COPY TO/FROM to use the format
> callback routines from v23 patch set, which seems to be a better patch
> split to me. Also, I've reviewed these changes and made some changes
> on top of them. The attached patches are:
> 
> 0001: make COPY TO use CopyToRoutine.
> 0002: minor changes to 0001 patch. will be fixed up.
> 0003: make COPY FROM use CopyFromRoutine.
> 0004: minor changes to 0003 patch. will be fixed up.
> 
> I've confirmed that v24 has a similar performance improvement to v23.
> Please check these extractions and minor change suggestions.
Thanks. Here are my comments:
0002:
+/* TEXT format */
"text" may be better than "TEXT". We use "text" not "TEXT"
in other places.
+static const CopyToRoutine CopyToRoutineText = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToTextOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+/* BINARY format */
"binary" may be better than "BINARY". We use "binary" not
"BINARY" in other places.
+static const CopyToRoutine CopyToRoutineBinary = {
+    .CopyToStart = CopyToBinaryStart,
+    .CopyToOutFunc = CopyToBinaryOutFunc,
+    .CopyToOneRow = CopyToBinaryOneRow,
+    .CopyToEnd = CopyToBinaryEnd,
+};
+/* Return COPY TO routines for the given option */
option ->
options
+static const CopyToRoutine *
+CopyToGetRoutine(CopyFormatOptions opts)
0003:
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 99981b1579..224fda172e 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -56,4 +56,46 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+/*
+ * API structure for a COPY FROM format implementation.     Note this must be
Should we remove a tab character here?
0004:
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index e77986f9a9..7f1de8a42b 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -106,31 +106,65 @@ typedef struct CopyMultiInsertInfo
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
 
-
 /*
- * CopyFromRoutine implementations for text and CSV.
+ * built-in format-specific routines. One-row callbacks are defined in
built-in ->
Built-in
 /*
- * CopyFromTextLikeInFunc
- *
- * Assign input function data for a relation's attribute in text/CSV format.
+ * COPY FROM routines for built-in formats.
++
"+" ->
" *"
+/* TEXT format */
TEXT -> text?
+/* BINARY format */
BINARY -> binary?
+/* Return COPY FROM routines for the given option */
option ->
options
+static const CopyFromRoutine *
+CopyFromGetRoutine(CopyFormatOptions opts)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 0447c4df7e..5416583e94 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
+static bool CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
+                                   Datum *values, bool *nulls, bool is_csv);
Oh, I didn't know that we don't need inline in a function
declaration.
 
@@ -1237,7 +1219,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
 /*
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
-static pg_attribute_always_inline bool
+static bool
 CopyReadLineText(CopyFromState cstate, bool is_csv)
Is this an intentional change?
CopyReadLineText() has "bool is_csv".
Thanks,
-- 
kou
			
On Thu, Nov 21, 2024 at 2:41 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> I ran `make headerscheck` after these patches and it reported a few
> problems:
>
> /pgsql/source/master/src/tools/pginclude/headerscheck /pgsql/source/master /pgsql/build/master
> In file included from /tmp/headerscheck.xdG40Y/test.c:2:
> /pgsql/source/master/src/include/commands/copyapi.h:76:44: error: unknown type name ‘CopyFromState’; did you mean ‘CopyToState’?
>    76 |         void            (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
>       |                                            ^~~~~~~~~~~~~
>       |                                            CopyToState
> /pgsql/source/master/src/include/commands/copyapi.h:87:43: error: unknown type name ‘CopyFromState’; did you mean ‘CopyToState’?
>    87 |         void            (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
>       |                                           ^~~~~~~~~~~~~
>       |                                           CopyToState
> /pgsql/source/master/src/include/commands/copyapi.h:98:44: error: unknown type name ‘CopyFromState’; did you mean ‘CopyToState’?
>    98 |         bool            (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
>       |                                            ^~~~~~~~~~~~~
>       |                                            CopyToState
> /pgsql/source/master/src/include/commands/copyapi.h:102:41: error: unknown type name ‘CopyFromState’; did you mean ‘CopyToState’?
>   102 |         void            (*CopyFromEnd) (CopyFromState cstate);
>       |                                         ^~~~~~~~~~~~~
>       |                                         CopyToState
> /pgsql/source/master/src/include/commands/copyapi.h:103:1: warning: no semicolon at end of struct or union
>   103 | } CopyFromRoutine;
>       | ^
>
> I think the fix should be the attached.

Thank you for the report and providing the patch! The fix looks good
to me. I'll incorporate this patch in the next version.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
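The attached fix is not reproduced in this thread, so the following is only a generic sketch of why a header fails this kind of check and one common way to make it self-contained. headerscheck compiles each header on its own, so every type a header mentions must be declared by the header itself or by something it includes. All names below are hypothetical (prefixed with Sketch/sketch_), and the real fix may simply have added an #include instead of a forward declaration.

```c
/* ---- what would live in the header (hypothetical sketch_copyapi.h) ---- */

/*
 * Forward-declare the state type as a pointer to an incomplete struct.
 * This is enough for the callback prototypes below to compile even when
 * nothing else has been included first.
 */
typedef struct SketchCopyFromStateData *SketchCopyFromState;

typedef struct SketchCopyFromRoutine
{
    void        (*start) (SketchCopyFromState cstate);
    void        (*end) (SketchCopyFromState cstate);
} SketchCopyFromRoutine;

/* ---- what would live in a .c file that knows the full definition ---- */

struct SketchCopyFromStateData
{
    int         started;        /* number of times start ran */
    int         ended;          /* number of times end ran */
};

static void
sketch_start(SketchCopyFromState cstate)
{
    cstate->started++;
}

static void
sketch_end(SketchCopyFromState cstate)
{
    cstate->ended++;
}

/* Callback table filled in with designated initializers. */
static const SketchCopyFromRoutine sketch_routine = {
    .start = sketch_start,
    .end = sketch_end,
};
```

Without the typedef at the top, the struct definition would produce the same "unknown type name" error as in the report above whenever the header is compiled first.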
Hi,

In <CAD21AoBNfKDbJnu-zONNpG820ZXYC0fuTSLrJ-UdRqU4qp2wog@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Nov 2024 13:01:06 -0800,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:

>> @@ -1237,7 +1219,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
>>  /*
>>   * CopyReadLineText - inner loop of CopyReadLine for text mode
>>   */
>> -static pg_attribute_always_inline bool
>> +static bool
>>  CopyReadLineText(CopyFromState cstate, bool is_csv)
>>
>> Is this an intentional change?
>> CopyReadLineText() has "bool is_csv".
>
> Yes, I'm not sure it's really necessary to make it inline since the
> benchmark results don't show much difference. Probably this is because
> the function has 'is_csv' in some 'if' branches but the compiler
> cannot optimize out the whole 'if' branches as most 'if' branches
> check 'is_csv' and other variables.

I see. If explicit "inline" isn't related to performance,
we don't need explicit "inline".

> I've attached the v25 patches that squashed the minor changes I made
> in v24 and incorporated all comments I got so far. I think these two
> patches are in good shape. Could you rebase remaining patches on top
> of them so that we can see the big picture of this feature?

OK. I'll work on it.

> Regarding exposing the structs such as CopyToStateData, v22-0004 patch
> moves most of all copy-related structs to copyapi.h from copyto.c,
> copyfrom_internal.h, and copy.h, which seems odd to me. I think we can
> expose CopyToStateData (and related structs) in a new file
> copyto_internal.h and keep other structs in the original header files.

Custom COPY format extensions need to use
CopyToStateData/CopyFromStateData. For example,
CopyToStateData::rel is used to retrieve table schema. If we
move CopyToStateData to copyto_internal.h not copyapi.h,
custom COPY format extensions need to include
copyto_internal.h. I feel that it's strange that extensions
need to use internal headers.

What is your real concern? If you don't want to export
CopyToStateData/CopyFromStateData entirely, we can provide
accessors only for some members of them.

FYI: We discussed this in the past. For example:
https://www.postgresql.org/message-id/flat/20240115.152350.1128880926282754664.kou%40clear-code.com#1b523fb95e8fb46702f5568ae19e3649

Thanks,
-- 
kou
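The "accessors only for some members" alternative mentioned above could look roughly like the sketch below. It is an illustration under assumptions, not an API from any version of the patch set: the struct, its members, and the SketchCopyToStateGet*() names are all hypothetical (the first message of this thread only floats names such as CopyToStateGetDest() as a TODO). The public part exposes an opaque pointer type and a few getters; the struct layout stays private to the implementation file.

```c
/* ---- public part: what an extension would be allowed to see ---- */

/* Opaque handle; extensions cannot look inside it. */
typedef struct SketchCopyToStateData *SketchCopyToState;

/* Accessors for the members that extensions are expected to need. */
const char *SketchCopyToStateGetRelName(SketchCopyToState cstate);
int         SketchCopyToStateGetNumAttrs(SketchCopyToState cstate);

/* ---- private part: known only to the core implementation ---- */

struct SketchCopyToStateData
{
    const char *relname;        /* exposed through an accessor */
    int         natts;          /* exposed through an accessor */
    int         internal_state; /* deliberately not exposed */
};

const char *
SketchCopyToStateGetRelName(SketchCopyToState cstate)
{
    return cstate->relname;
}

int
SketchCopyToStateGetNumAttrs(SketchCopyToState cstate)
{
    return cstate->natts;
}
```

The trade-off is the one raised in this exchange: accessors keep the struct layout free to change without breaking extensions, while exporting the whole struct avoids having to decide up front which members extensions will need.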
Hi,
In <20241125.110620.313152541320718947.kou@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 25 Nov 2024 11:06:20 +0900 (JST),
  Sutou Kouhei <kou@clear-code.com> wrote:
>> I've attached the v25 patches that squashed the minor changes I made
>> in v24 and incorporated all comments I got so far. I think these two
>> patches are in good shape. Could you rebase remaining patches on top
>> of them so that we can see the big picture of this feature?
> 
> OK. I'll work on it.
I've attached the v26 patch set:
0001: It's the same as 0001 in the v25 patch set.
0002: It's the same as 0002 in the v25 patch set.
0003: It's the same as 0003 in the v23 patch set.
      This parses the "format" option and adds support for
      a custom TO handler.
0004: It's the same as 0004 in the v23 patch set.
      This exports CopyToStateData. But the following
      enums/structs/functions aren't moved to copyapi.h from
      copy.h:
      * CopyHeaderChoice
      * CopyOnErrorChoice
      * CopyLogVerbosityChoice
      * CopyFormatOptions
      * copy_data_dest_cb()
0005: It's the same as 0005 in the v23 patch set.
      This adds missing APIs to implement a custom TO handler
      as an extension.
0006: It's the same as 0008 in the v23 patch set.
      This adds support for a custom FROM handler.
0007: It's the same as 0009 in the v23 patch set.
      This exports CopyFromStateData.
0008: It's the same as 0010 in the v23 patch set.
      This adds missing APIs to implement a custom FROM handler
      as an extension.
I've also updated https://github.com/kou/pg-copy-arrow for
the current API.
I think that we can merge only 0001/0002 as the first
step, because they don't change the current behavior and
they improve performance. We can merge the other patches
after that.
>> Regarding exposing the structs such as CopyToStateData, v22-0004 patch
>> moves most of all copy-related structs to copyapi.h from copyto.c,
>> copyfrom_internal.h, and copy.h, which seems odd to me. I think we can
>> expose CopyToStateData (and related structs) in a new file
>> copyto_internal.h and keep other structs in the original header files.
> 
> Custom COPY format extensions need to use
> CopyToStateData/CopyFromStateData. For example,
> CopyToStateData::rel is used to retrieve table schema. If we
> move CopyToStateData to copyto_internal.h not copyapi.h,
> custom COPY format extensions need to include
> copyto_internal.h. I feel that it's strange that extensions
> need to use internal headers.
> 
> What is your real concern? If you don't want to export
> CopyToStateData/CopyFromStateData entirely, we can provide
> accessors only for some members of them.
The v26 patch set still exports
CopyToStateData/CopyFromStateData in copyapi.h not
copy{to,from}_internal.h. But I didn't move the following
enums/structs/functions:
* CopyHeaderChoice
* CopyOnErrorChoice
* CopyLogVerbosityChoice
* CopyFormatOptions
* copy_data_dest_cb()
What do you think about this approach?
Thanks,
-- 
kou
From b95060713e5cfccc8b3db5acb34d352f18a8b1e2 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sat, 28 Sep 2024 23:24:49 +0900
Subject: [PATCH v26 1/8] Refactor COPY TO to use format callback functions.
This commit introduces a new CopyToRoutine struct, which is a set of
callback routines to copy tuples in a specific format. It also makes
the existing formats (text, CSV, and binary) utilize these format
callbacks.
This change is a preliminary step towards making the COPY TO command
extensible in terms of output formats.
Additionally, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.
Author: Sutou Kouhei
Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada
Reviewed-by: Junwang Zhao
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
---
 src/backend/commands/copyto.c    | 441 +++++++++++++++++++++----------
 src/include/commands/copyapi.h   |  57 ++++
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 358 insertions(+), 141 deletions(-)
 create mode 100644 src/include/commands/copyapi.h
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f55e6d96751..f81dadcc12b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,7 +19,7 @@
 #include <sys/stat.h>
 
 #include "access/tableam.h"
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -64,6 +64,9 @@ typedef enum CopyDest
  */
 typedef struct CopyToStateData
 {
+    /* format-specific routines */
+    const CopyToRoutine *routine;
+
     /* low-level state data */
     CopyDest    copy_dest;        /* type of copy source/destination */
     FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
@@ -114,6 +117,19 @@ static void CopyAttributeOutText(CopyToState cstate, const char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
                                 bool use_quote);
 
+/* built-in format-specific routines */
+static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
+static void CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
+static void CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
+                                 bool is_csv);
+static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
+static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
+static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToBinaryEnd(CopyToState cstate);
+
 /* Low-level communications functions */
 static void SendCopyBegin(CopyToState cstate);
 static void SendCopyEnd(CopyToState cstate);
@@ -121,9 +137,254 @@ static void CopySendData(CopyToState cstate, const void *databuf, int datasize);
 static void CopySendString(CopyToState cstate, const char *str);
 static void CopySendChar(CopyToState cstate, char c);
 static void CopySendEndOfRow(CopyToState cstate);
+static void CopySendTextLikeEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
+/*
+ * COPY TO routines for built-in formats.
+ *
+ * CSV and text formats share the same TextLike routines except for the
+ * one-row callback.
+ */
+
+/* text format */
+static const CopyToRoutine CopyToRoutineText = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToTextOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+/* CSV format */
+static const CopyToRoutine CopyToRoutineCSV = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToCSVOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+/* binary format */
+static const CopyToRoutine CopyToRoutineBinary = {
+    .CopyToStart = CopyToBinaryStart,
+    .CopyToOutFunc = CopyToBinaryOutFunc,
+    .CopyToOneRow = CopyToBinaryOneRow,
+    .CopyToEnd = CopyToBinaryEnd,
+};
+
+/* Return COPY TO routines for the given options */
+static const CopyToRoutine *
+CopyToGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyToRoutineCSV;
+    else if (opts.binary)
+        return &CopyToRoutineBinary;
+
+    /* default is text */
+    return &CopyToRoutineText;
+}
+
+/* Implementation of the start callback for text and CSV formats */
+static void
+CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    /*
+     * For non-binary copy, we need to convert null_print to file encoding,
+     * because it will be sent directly with CopySendString.
+     */
+    if (cstate->need_transcoding)
+        cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+                                                          cstate->opts.null_print_len,
+                                                          cstate->file_encoding);
+
+    /* if a header has been requested send the line */
+    if (cstate->opts.header_line)
+    {
+        ListCell   *cur;
+        bool        hdr_delim = false;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            char       *colname;
+
+            if (hdr_delim)
+                CopySendChar(cstate, cstate->opts.delim[0]);
+            hdr_delim = true;
+
+            colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, colname, false);
+            else
+                CopyAttributeOutText(cstate, colname);
+        }
+
+        CopySendTextLikeEndOfRow(cstate);
+    }
+}
+
+/*
+ * Implementation of the outfunc callback for text and CSV formats. Assign
+ * the output function data to the given *finfo.
+ */
+static void
+CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the per-row callback for text format */
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, false);
+}
+
+/* Implementation of the per-row callback for CSV format */
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, true);
+}
+
+/*
+ * Workhorse for CopyToTextOneRow() and CopyToCSVOneRow().
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
+ */
+static pg_attribute_always_inline void
+CopyToTextLikeOneRow(CopyToState cstate,
+                     TupleTableSlot *slot,
+                     bool is_csv)
+{
+    bool        need_delim = false;
+    FmgrInfo   *out_functions = cstate->out_functions;
+
+    foreach_int(attnum, cstate->attnumlist)
+    {
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (need_delim)
+            CopySendChar(cstate, cstate->opts.delim[0]);
+        need_delim = true;
+
+        if (isnull)
+        {
+            CopySendString(cstate, cstate->opts.null_print_client);
+        }
+        else
+        {
+            char       *string;
+
+            string = OutputFunctionCall(&out_functions[attnum - 1],
+                                        value);
+
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
+                CopyAttributeOutCSV(cstate, string,
+                                    cstate->opts.force_quote_flags[attnum - 1]);
+            else
+                CopyAttributeOutText(cstate, string);
+        }
+    }
+
+    CopySendTextLikeEndOfRow(cstate);
+}
+
+/* Implementation of the end callback for text and CSV formats */
+static void
+CopyToTextLikeEnd(CopyToState cstate)
+{
+    /* Nothing to do here */
+}
+
+/*
+ * Implementation of the start callback for binary format. Send a header
+ * for a binary copy.
+ */
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    int32        tmp;
+
+    /* Signature */
+    CopySendData(cstate, BinarySignature, 11);
+    /* Flags field */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+    /* No header extension */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+}
+
+/*
+ * Implementation of the outfunc callback for binary format. Assign
+ * the binary output function to the given *finfo.
+ */
+static void
+CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeBinaryOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the per-row callback for binary format */
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    FmgrInfo   *out_functions = cstate->out_functions;
+
+    /* Binary per-tuple header */
+    CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+    foreach_int(attnum, cstate->attnumlist)
+    {
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (isnull)
+        {
+            CopySendInt32(cstate, -1);
+        }
+        else
+        {
+            bytea       *outputbytes;
+
+            outputbytes = SendFunctionCall(&out_functions[attnum - 1],
+                                           value);
+            CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+            CopySendData(cstate, VARDATA(outputbytes),
+                         VARSIZE(outputbytes) - VARHDRSZ);
+        }
+    }
+
+    CopySendEndOfRow(cstate);
+}
+
+/* Implementation of the end callback for binary format */
+static void
+CopyToBinaryEnd(CopyToState cstate)
+{
+    /* Generate trailer for a binary copy */
+    CopySendInt16(cstate, -1);
+    /* Need to flush out the trailer */
+    CopySendEndOfRow(cstate);
+}
 
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
@@ -191,16 +452,6 @@ CopySendEndOfRow(CopyToState cstate)
     switch (cstate->copy_dest)
     {
         case COPY_FILE:
-            if (!cstate->opts.binary)
-            {
-                /* Default line termination depends on platform */
-#ifndef WIN32
-                CopySendChar(cstate, '\n');
-#else
-                CopySendString(cstate, "\r\n");
-#endif
-            }
-
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -235,10 +486,6 @@ CopySendEndOfRow(CopyToState cstate)
             }
             break;
         case COPY_FRONTEND:
-            /* The FE/BE protocol uses \n as newline for all platforms */
-            if (!cstate->opts.binary)
-                CopySendChar(cstate, '\n');
-
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
@@ -254,6 +501,35 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * line termination and does the common end-of-row processing.
+ */
+static inline void
+CopySendTextLikeEndOfRow(CopyToState cstate)
+{
+    switch (cstate->copy_dest)
+    {
+        case COPY_FILE:
+            /* Default line termination depends on platform */
+#ifndef WIN32
+            CopySendChar(cstate, '\n');
+#else
+            CopySendString(cstate, "\r\n");
+#endif
+            break;
+        case COPY_FRONTEND:
+            /* The FE/BE protocol uses \n as newline for all platforms */
+            CopySendChar(cstate, '\n');
+            break;
+        default:
+            break;
+    }
+
+    /* Now take the actions related to the end of a row */
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * These functions do apply some data conversion
  */
@@ -426,6 +702,9 @@ BeginCopyTo(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
 
+    /* Set format routine */
+    cstate->routine = CopyToGetRoutine(cstate->opts);
+
     /* Process the source/target relation or query */
     if (rel)
     {
@@ -771,19 +1050,10 @@ DoCopyTo(CopyToState cstate)
     foreach(cur, cstate->attnumlist)
     {
         int            attnum = lfirst_int(cur);
-        Oid            out_func_oid;
-        bool        isvarlena;
         Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-        if (cstate->opts.binary)
-            getTypeBinaryOutputInfo(attr->atttypid,
-                                    &out_func_oid,
-                                    &isvarlena);
-        else
-            getTypeOutputInfo(attr->atttypid,
-                              &out_func_oid,
-                              &isvarlena);
-        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+        cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
+                                       &cstate->out_functions[attnum - 1]);
     }
 
     /*
@@ -796,56 +1066,7 @@ DoCopyTo(CopyToState cstate)
                                                "COPY TO",
                                                ALLOCSET_DEFAULT_SIZES);
 
-    if (cstate->opts.binary)
-    {
-        /* Generate header for a binary copy */
-        int32        tmp;
-
-        /* Signature */
-        CopySendData(cstate, BinarySignature, 11);
-        /* Flags field */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-        /* No header extension */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-    }
-    else
-    {
-        /*
-         * For non-binary copy, we need to convert null_print to file
-         * encoding, because it will be sent directly with CopySendString.
-         */
-        if (cstate->need_transcoding)
-            cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
-                                                              cstate->opts.null_print_len,
-                                                              cstate->file_encoding);
-
-        /* if a header has been requested send the line */
-        if (cstate->opts.header_line)
-        {
-            bool        hdr_delim = false;
-
-            foreach(cur, cstate->attnumlist)
-            {
-                int            attnum = lfirst_int(cur);
-                char       *colname;
-
-                if (hdr_delim)
-                    CopySendChar(cstate, cstate->opts.delim[0]);
-                hdr_delim = true;
-
-                colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, colname, false);
-                else
-                    CopyAttributeOutText(cstate, colname);
-            }
-
-            CopySendEndOfRow(cstate);
-        }
-    }
+    cstate->routine->CopyToStart(cstate, tupDesc);
 
     if (cstate->rel)
     {
@@ -884,13 +1105,7 @@ DoCopyTo(CopyToState cstate)
         processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
     }
 
-    if (cstate->opts.binary)
-    {
-        /* Generate trailer for a binary copy */
-        CopySendInt16(cstate, -1);
-        /* Need to flush out the trailer */
-        CopySendEndOfRow(cstate);
-    }
+    cstate->routine->CopyToEnd(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -903,74 +1118,18 @@ DoCopyTo(CopyToState cstate)
 /*
  * Emit one row during DoCopyTo().
  */
-static void
+static inline void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-    FmgrInfo   *out_functions = cstate->out_functions;
     MemoryContext oldcontext;
 
     MemoryContextReset(cstate->rowcontext);
     oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-    if (cstate->opts.binary)
-    {
-        /* Binary per-tuple header */
-        CopySendInt16(cstate, list_length(cstate->attnumlist));
-    }
-
     /* Make sure the tuple is fully deconstructed */
     slot_getallattrs(slot);
 
-    if (!cstate->opts.binary)
-    {
-        bool        need_delim = false;
-
-        foreach_int(attnum, cstate->attnumlist)
-        {
-            Datum        value = slot->tts_values[attnum - 1];
-            bool        isnull = slot->tts_isnull[attnum - 1];
-            char       *string;
-
-            if (need_delim)
-                CopySendChar(cstate, cstate->opts.delim[0]);
-            need_delim = true;
-
-            if (isnull)
-                CopySendString(cstate, cstate->opts.null_print_client);
-            else
-            {
-                string = OutputFunctionCall(&out_functions[attnum - 1],
-                                            value);
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, string,
-                                        cstate->opts.force_quote_flags[attnum - 1]);
-                else
-                    CopyAttributeOutText(cstate, string);
-            }
-        }
-    }
-    else
-    {
-        foreach_int(attnum, cstate->attnumlist)
-        {
-            Datum        value = slot->tts_values[attnum - 1];
-            bool        isnull = slot->tts_isnull[attnum - 1];
-            bytea       *outputbytes;
-
-            if (isnull)
-                CopySendInt32(cstate, -1);
-            else
-            {
-                outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-                                               value);
-                CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-                CopySendData(cstate, VARDATA(outputbytes),
-                             VARSIZE(outputbytes) - VARHDRSZ);
-            }
-        }
-    }
-
-    CopySendEndOfRow(cstate);
+    cstate->routine->CopyToOneRow(cstate, slot);
 
     MemoryContextSwitchTo(oldcontext);
 }
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 00000000000..eccc875d0e8
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,57 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *      API for COPY TO handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "commands/copy.h"
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/*
+ * API structure for a COPY TO format implementation. Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyToRoutine
+{
+    /*
+     * Set output function information. This callback is called once at the
+     * beginning of COPY TO.
+     *
+     * 'finfo' can be optionally filled to provide the catalog information of
+     * the output function.
+     *
+     * 'atttypid' is the OID of data type used by the relation's attribute.
+     */
+    void        (*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
+                                  FmgrInfo *finfo);
+
+    /*
+     * Start a COPY TO. This callback is called once at the beginning of COPY
+     * TO.
+     *
+     * 'tupDesc' is the tuple descriptor of the relation from where the data
+     * is read.
+     */
+    void        (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
+
+    /*
+     * Write one row stored in 'slot' to the destination.
+     */
+    void        (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
+
+    /* End a COPY TO. This callback is called once at the end of COPY TO */
+    void        (*CopyToEnd) (CopyToState cstate);
+} CopyToRoutine;
+
+#endif                            /* COPYAPI_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b54428b38cd..8edb41cce2e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -503,6 +503,7 @@ CopyMultiInsertInfo
 CopyOnErrorChoice
 CopySource
 CopyStmt
+CopyToRoutine
 CopyToState
 CopyToStateData
 Cost
-- 
2.45.2
From c7eba0bf7bf4c42933b71d98aa6d519af0ce0121 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 18 Nov 2024 16:32:43 -0800
Subject: [PATCH v26 2/8] Refactor COPY FROM to use format callback functions.
This commit introduces a new CopyFromRoutine struct, which is a set of
callback routines to read tuples in a specific format. It also makes
COPY FROM with the existing formats (text, CSV, and binary) utilize
these format callbacks.
This change is a preliminary step towards making the COPY FROM command
extensible in terms of input formats.
Similar to XXXX, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when reading field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.
Author: Sutou Kouhei
Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada
Reviewed-by: Junwang Zhao
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
---
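Note for reviewers (illustration only, not part of the patch): the
dispatch pattern described in the commit message can be sketched outside
the backend as below. This is a toy model under stated assumptions:
ToyOptions, ToyRoutine and every toy_* function are hypothetical names
that do not exist in PostgreSQL, and the field splitting ignores quoting
and escaping. It shows only two ideas the patch relies on: the routine
table is chosen once from the options, and the text/CSV callbacks are
thin wrappers that pass a constant is_csv to a shared inline workhorse,
so an inlining compiler can drop the untaken branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CopyFormatOptions; only one flag is modeled. */
typedef struct ToyOptions
{
    bool        csv_mode;
} ToyOptions;

/* Hypothetical stand-in for CopyFromRoutine: a single per-row callback. */
typedef struct ToyRoutine
{
    const char *name;
    /* Split 'line' in place, store field pointers, return the field count. */
    int         (*one_row) (char *line, char **fields, int max_fields);
} ToyRoutine;

/*
 * Shared workhorse. 'is_csv' is a constant at both call sites, so after
 * inlining each wrapper contains only its own delimiter. Fields beyond
 * 'max_fields' are ignored in this toy version.
 */
static inline int
toy_text_like_one_row(char *line, char **fields, int max_fields, bool is_csv)
{
    char        delim = is_csv ? ',' : '\t';
    char       *p = line;
    int         n = 0;

    while (n < max_fields)
    {
        char       *end;

        fields[n++] = p;
        end = strchr(p, delim);
        if (end == NULL)
            break;
        *end = '\0';            /* terminate this field in place */
        p = end + 1;
    }
    return n;
}

/* Per-row callback for the toy text format */
static int
toy_text_one_row(char *line, char **fields, int max_fields)
{
    return toy_text_like_one_row(line, fields, max_fields, false);
}

/* Per-row callback for the toy CSV format */
static int
toy_csv_one_row(char *line, char **fields, int max_fields)
{
    return toy_text_like_one_row(line, fields, max_fields, true);
}

static const ToyRoutine toy_routine_text = {
    .name = "text",
    .one_row = toy_text_one_row,
};

static const ToyRoutine toy_routine_csv = {
    .name = "csv",
    .one_row = toy_csv_one_row,
};

/* Choose the routine once, up front; the default is text. */
static const ToyRoutine *
toy_get_routine(ToyOptions opts)
{
    if (opts.csv_mode)
        return &toy_routine_csv;
    return &toy_routine_text;
}

/* Parse one line through the routine selected for 'opts'. */
static int
toy_copy_one_line(ToyOptions opts, char *line, char **fields, int max_fields)
{
    const ToyRoutine *routine = toy_get_routine(opts);

    assert(routine->one_row != NULL);
    return routine->one_row(line, fields, max_fields);
}
```

Calling toy_copy_one_line() with csv_mode set splits a line on commas;
with csv_mode unset it splits on tabs. The real callbacks in this patch
additionally handle encoding conversion, NULL markers, defaults and soft
errors, none of which is modeled here.
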
 contrib/file_fdw/file_fdw.c              |   1 -
 src/backend/commands/copyfrom.c          | 190 +++++++--
 src/backend/commands/copyfromparse.c     | 504 +++++++++++++----------
 src/include/commands/copy.h              |   2 -
 src/include/commands/copyapi.h           |  48 ++-
 src/include/commands/copyfrom_internal.h |  13 +-
 src/tools/pgindent/typedefs.list         |   1 +
 7 files changed, 492 insertions(+), 267 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 9e2896f32ae..bac31315fcb 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -21,7 +21,6 @@
 #include "access/table.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_foreign_table.h"
-#include "commands/copy.h"
 #include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 754cb496169..c84081c3ba3 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -106,6 +106,145 @@ typedef struct CopyMultiInsertInfo
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
 
+/*
+ * Built-in format-specific routines. One-row callbacks are defined in
+ * copyfromparse.c
+ */
+static void CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo,
+                                   Oid *typioparam);
+static void CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc);
+static void CopyFromTextLikeEnd(CopyFromState cstate);
+static void CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+                                 FmgrInfo *finfo, Oid *typioparam);
+static void CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc);
+static void CopyFromBinaryEnd(CopyFromState cstate);
+
+
+/*
+ * COPY FROM routines for built-in formats.
+ *
+ * CSV and text formats share the same TextLike routines except for the
+ * one-row callback.
+ */
+
+/* text format */
+static const CopyFromRoutine CopyFromRoutineText = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromTextOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+/* CSV format */
+static const CopyFromRoutine CopyFromRoutineCSV = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromCSVOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+/* binary format */
+static const CopyFromRoutine CopyFromRoutineBinary = {
+    .CopyFromInFunc = CopyFromBinaryInFunc,
+    .CopyFromStart = CopyFromBinaryStart,
+    .CopyFromOneRow = CopyFromBinaryOneRow,
+    .CopyFromEnd = CopyFromBinaryEnd,
+};
+
+/* Return COPY FROM routines for the given options */
+static const CopyFromRoutine *
+CopyFromGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyFromRoutineCSV;
+    else if (opts.binary)
+        return &CopyFromRoutineBinary;
+
+    /* default is text */
+    return &CopyFromRoutineText;
+}
+
+/* Implementation of the start callback for text and CSV formats */
+static void
+CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    AttrNumber    attr_count;
+
+    /*
+     * If encoding conversion is needed, we need another buffer to hold the
+     * converted input data.  Otherwise, we can just point input_buf to the
+     * same buffer as raw_buf.
+     */
+    if (cstate->need_transcoding)
+    {
+        cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
+        cstate->input_buf_index = cstate->input_buf_len = 0;
+    }
+    else
+        cstate->input_buf = cstate->raw_buf;
+    cstate->input_reached_eof = false;
+
+    initStringInfo(&cstate->line_buf);
+
+    /*
+     * Create workspace for CopyReadAttributes results; used by CSV and text
+     * format.
+     */
+    attr_count = list_length(cstate->attnumlist);
+    cstate->max_fields = attr_count;
+    cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+}
+
+/*
+ * Implementation of the infunc callback for text and CSV formats. Assign
+ * the input function data to the given *finfo.
+ */
+static void
+CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo,
+                       Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the end callback for text and CSV formats */
+static void
+CopyFromTextLikeEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
+/* Implementation of the start callback for binary format */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    /* Read and verify binary header */
+    ReceiveCopyBinaryHeader(cstate);
+}
+
+/*
+ * Implementation of the infunc callback for binary format. Assign
+ * the binary input function to the given *finfo.
+ */
+static void
+CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+                     FmgrInfo *finfo, Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeBinaryInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the end callback for binary format */
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
 /*
  * error context callback for COPY FROM
  *
@@ -1396,7 +1535,6 @@ BeginCopyFrom(ParseState *pstate,
                 num_defaults;
     FmgrInfo   *in_functions;
     Oid           *typioparams;
-    Oid            in_func_oid;
     int           *defmap;
     ExprState **defexprs;
     MemoryContext oldcontext;
@@ -1428,6 +1566,9 @@ BeginCopyFrom(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
+    /* Set the format routine */
+    cstate->routine = CopyFromGetRoutine(cstate->opts);
+
     /* Process the target relation */
     cstate->rel = rel;
 
@@ -1583,25 +1724,6 @@ BeginCopyFrom(ParseState *pstate,
     cstate->raw_buf_index = cstate->raw_buf_len = 0;
     cstate->raw_reached_eof = false;
 
-    if (!cstate->opts.binary)
-    {
-        /*
-         * If encoding conversion is needed, we need another buffer to hold
-         * the converted input data.  Otherwise, we can just point input_buf
-         * to the same buffer as raw_buf.
-         */
-        if (cstate->need_transcoding)
-        {
-            cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
-            cstate->input_buf_index = cstate->input_buf_len = 0;
-        }
-        else
-            cstate->input_buf = cstate->raw_buf;
-        cstate->input_reached_eof = false;
-
-        initStringInfo(&cstate->line_buf);
-    }
-
     initStringInfo(&cstate->attribute_buf);
 
     /* Assign range table and rteperminfos, we'll need them in CopyFrom. */
@@ -1634,13 +1756,9 @@ BeginCopyFrom(ParseState *pstate,
             continue;
 
         /* Fetch the input function and typioparam info */
-        if (cstate->opts.binary)
-            getTypeBinaryInputInfo(att->atttypid,
-                                   &in_func_oid, &typioparams[attnum - 1]);
-        else
-            getTypeInputInfo(att->atttypid,
-                             &in_func_oid, &typioparams[attnum - 1]);
-        fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+        cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+                                        &in_functions[attnum - 1],
+                                        &typioparams[attnum - 1]);
 
         /* Get default info if available */
         defexprs[attnum - 1] = NULL;
@@ -1775,20 +1893,7 @@ BeginCopyFrom(ParseState *pstate,
 
     pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-    if (cstate->opts.binary)
-    {
-        /* Read and verify binary header */
-        ReceiveCopyBinaryHeader(cstate);
-    }
-
-    /* create workspace for CopyReadAttributes results */
-    if (!cstate->opts.binary)
-    {
-        AttrNumber    attr_count = list_length(cstate->attnumlist);
-
-        cstate->max_fields = attr_count;
-        cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-    }
+    cstate->routine->CopyFromStart(cstate, tupDesc);
 
     MemoryContextSwitchTo(oldcontext);
 
@@ -1801,6 +1906,9 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+    /* Invoke the end callback */
+    cstate->routine->CopyFromEnd(cstate);
+
     /* No COPY FROM related resources except memory. */
     if (cstate->is_program)
     {
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d1d43b53d83..fdb506c58be 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -62,7 +62,6 @@
 #include <unistd.h>
 #include <sys/stat.h>
 
-#include "commands/copy.h"
 #include "commands/copyfrom_internal.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
@@ -140,8 +139,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLine(CopyFromState cstate, bool is_csv);
+static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static int    CopyReadAttributesText(CopyFromState cstate);
 static int    CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -740,9 +739,11 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  * in the relation.
  *
  * NOTE: force_not_null option are not applied to the returned fields.
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static pg_attribute_always_inline bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
 {
     int            fldct;
     bool        done;
@@ -759,13 +760,17 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
         tupDesc = RelationGetDescr(cstate->rel);
 
         cstate->cur_lineno++;
-        done = CopyReadLine(cstate);
+        done = CopyReadLine(cstate, is_csv);
 
         if (cstate->opts.header_line == COPY_HEADER_MATCH)
         {
             int            fldnum;
 
-            if (cstate->opts.csv_mode)
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
                 fldct = CopyReadAttributesCSV(cstate);
             else
                 fldct = CopyReadAttributesText(cstate);
@@ -809,7 +814,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     cstate->cur_lineno++;
 
     /* Actually read the line into memory here */
-    done = CopyReadLine(cstate);
+    done = CopyReadLine(cstate, is_csv);
 
     /*
      * EOF at start of line means we're done.  If we see EOF after some
@@ -819,8 +824,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     if (done && cstate->line_buf.len == 0)
         return false;
 
-    /* Parse the line into de-escaped field values */
-    if (cstate->opts.csv_mode)
+    /*
+     * Parse the line into de-escaped field values
+     *
+     * is_csv will be optimized away by compiler, as argument is constant at
+     * caller.
+     */
+    if (is_csv)
         fldct = CopyReadAttributesCSV(cstate);
     else
         fldct = CopyReadAttributesText(cstate);
@@ -830,6 +840,244 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     return true;
 }
 
+/*
+ * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
+ */
+static pg_attribute_always_inline bool
+CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
+                       Datum *values, bool *nulls, bool is_csv)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    ExprState **defexprs = cstate->defexprs;
+    char      **field_strings;
+    ListCell   *cur;
+    int            fldct;
+    int            fieldno;
+    char       *string;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    /* read raw fields in the next line */
+    if (!NextCopyFromRawFields(cstate, &field_strings, &fldct, is_csv))
+        return false;
+
+    /* check for overflowing fields */
+    if (attr_count > 0 && fldct > attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("extra data after last expected column")));
+
+    fieldno = 0;
+
+    /* Loop to read the user attributes on the line. */
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        if (fieldno >= fldct)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("missing data for column \"%s\"",
+                            NameStr(att->attname))));
+        string = field_strings[fieldno++];
+
+        if (cstate->convert_select_flags &&
+            !cstate->convert_select_flags[m])
+        {
+            /* ignore input field, leaving column as NULL */
+            continue;
+        }
+
+        if (is_csv)
+        {
+            if (string == NULL &&
+                cstate->opts.force_notnull_flags[m])
+            {
+                /*
+                 * FORCE_NOT_NULL option is set and column is NULL - convert
+                 * it to the NULL string.
+                 */
+                string = cstate->opts.null_print;
+            }
+            else if (string != NULL && cstate->opts.force_null_flags[m]
+                     && strcmp(string, cstate->opts.null_print) == 0)
+            {
+                /*
+                 * FORCE_NULL option is set and column matches the NULL
+                 * string. It must have been quoted, or otherwise the string
+                 * would already have been set to NULL. Convert it to NULL as
+                 * specified.
+                 */
+                string = NULL;
+            }
+        }
+
+        cstate->cur_attname = NameStr(att->attname);
+        cstate->cur_attval = string;
+
+        if (string != NULL)
+            nulls[m] = false;
+
+        if (cstate->defaults[m])
+        {
+            /*
+             * The caller must supply econtext and have switched into the
+             * per-tuple memory context in it.
+             */
+            Assert(econtext != NULL);
+            Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
+
+            values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
+        }
+
+        /*
+         * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+         */
+        else if (!InputFunctionCallSafe(&in_functions[m],
+                                        string,
+                                        typioparams[m],
+                                        att->atttypmod,
+                                        (Node *) cstate->escontext,
+                                        &values[m]))
+        {
+            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+            cstate->num_errors++;
+
+            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+            {
+                /*
+                 * Since we emit line number and column info in the below
+                 * notice message, we suppress error context information other
+                 * than the relation name.
+                 */
+                Assert(!cstate->relname_only);
+                cstate->relname_only = true;
+
+                if (cstate->cur_attval)
+                {
+                    char       *attval;
+
+                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname,
+                                   attval));
+                    pfree(attval);
+                }
+                else
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname));
+
+                /* reset relname_only */
+                cstate->relname_only = false;
+            }
+
+            return true;
+        }
+
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+    }
+
+    Assert(fieldno == attr_count);
+
+    return true;
+}
+
+/* Implementation of the per-row callback for text format */
+bool
+CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+                   bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false);
+}
+
+/* Implementation of the per-row callback for CSV format */
+bool
+CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+                  bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
+}
+
+/* Implementation of the per-row callback for binary format */
+bool
+CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+                     bool *nulls)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    int16        fld_count;
+    ListCell   *cur;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    cstate->cur_lineno++;
+
+    if (!CopyGetInt16(cstate, &fld_count))
+    {
+        /* EOF detected (end of file, or protocol-level EOF) */
+        return false;
+    }
+
+    if (fld_count == -1)
+    {
+        /*
+         * Received EOF marker.  Wait for the protocol-level EOF, and complain
+         * if it doesn't come immediately.  In COPY FROM STDIN, this ensures
+         * that we correctly handle CopyFail, if client chooses to send that
+         * now.  When copying from file, we could ignore the rest of the file
+         * like in text mode, but we choose to be consistent with the COPY
+         * FROM STDIN case.
+         */
+        char        dummy;
+
+        if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("received copy data after EOF marker")));
+        return false;
+    }
+
+    if (fld_count != attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("row field count is %d, expected %d",
+                        (int) fld_count, attr_count)));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        cstate->cur_attname = NameStr(att->attname);
+        values[m] = CopyReadBinaryAttribute(cstate,
+                                            &in_functions[m],
+                                            typioparams[m],
+                                            att->atttypmod,
+                                            &nulls[m]);
+        cstate->cur_attname = NULL;
+    }
+
+    return true;
+}
+
 /*
  * Read next tuple from file for COPY FROM. Return false if no more tuples.
  *
@@ -847,216 +1095,22 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 {
     TupleDesc    tupDesc;
     AttrNumber    num_phys_attrs,
-                attr_count,
                 num_defaults = cstate->num_defaults;
-    FmgrInfo   *in_functions = cstate->in_functions;
-    Oid           *typioparams = cstate->typioparams;
     int            i;
     int           *defmap = cstate->defmap;
     ExprState **defexprs = cstate->defexprs;
 
     tupDesc = RelationGetDescr(cstate->rel);
     num_phys_attrs = tupDesc->natts;
-    attr_count = list_length(cstate->attnumlist);
 
     /* Initialize all values for row to NULL */
     MemSet(values, 0, num_phys_attrs * sizeof(Datum));
     MemSet(nulls, true, num_phys_attrs * sizeof(bool));
     MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-    if (!cstate->opts.binary)
-    {
-        char      **field_strings;
-        ListCell   *cur;
-        int            fldct;
-        int            fieldno;
-        char       *string;
-
-        /* read raw fields in the next line */
-        if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
-            return false;
-
-        /* check for overflowing fields */
-        if (attr_count > 0 && fldct > attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("extra data after last expected column")));
-
-        fieldno = 0;
-
-        /* Loop to read the user attributes on the line. */
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            if (fieldno >= fldct)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("missing data for column \"%s\"",
-                                NameStr(att->attname))));
-            string = field_strings[fieldno++];
-
-            if (cstate->convert_select_flags &&
-                !cstate->convert_select_flags[m])
-            {
-                /* ignore input field, leaving column as NULL */
-                continue;
-            }
-
-            if (cstate->opts.csv_mode)
-            {
-                if (string == NULL &&
-                    cstate->opts.force_notnull_flags[m])
-                {
-                    /*
-                     * FORCE_NOT_NULL option is set and column is NULL -
-                     * convert it to the NULL string.
-                     */
-                    string = cstate->opts.null_print;
-                }
-                else if (string != NULL && cstate->opts.force_null_flags[m]
-                         && strcmp(string, cstate->opts.null_print) == 0)
-                {
-                    /*
-                     * FORCE_NULL option is set and column matches the NULL
-                     * string. It must have been quoted, or otherwise the
-                     * string would already have been set to NULL. Convert it
-                     * to NULL as specified.
-                     */
-                    string = NULL;
-                }
-            }
-
-            cstate->cur_attname = NameStr(att->attname);
-            cstate->cur_attval = string;
-
-            if (string != NULL)
-                nulls[m] = false;
-
-            if (cstate->defaults[m])
-            {
-                /*
-                 * The caller must supply econtext and have switched into the
-                 * per-tuple memory context in it.
-                 */
-                Assert(econtext != NULL);
-                Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
-
-                values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
-            }
-
-            /*
-             * If ON_ERROR is specified with IGNORE, skip rows with soft
-             * errors
-             */
-            else if (!InputFunctionCallSafe(&in_functions[m],
-                                            string,
-                                            typioparams[m],
-                                            att->atttypmod,
-                                            (Node *) cstate->escontext,
-                                            &values[m]))
-            {
-                Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-                cstate->num_errors++;
-
-                if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-                {
-                    /*
-                     * Since we emit line number and column info in the below
-                     * notice message, we suppress error context information
-                     * other than the relation name.
-                     */
-                    Assert(!cstate->relname_only);
-                    cstate->relname_only = true;
-
-                    if (cstate->cur_attval)
-                    {
-                        char       *attval;
-
-                        attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname,
-                                       attval));
-                        pfree(attval);
-                    }
-                    else
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname));
-
-                    /* reset relname_only */
-                    cstate->relname_only = false;
-                }
-
-                return true;
-            }
-
-            cstate->cur_attname = NULL;
-            cstate->cur_attval = NULL;
-        }
-
-        Assert(fieldno == attr_count);
-    }
-    else
-    {
-        /* binary */
-        int16        fld_count;
-        ListCell   *cur;
-
-        cstate->cur_lineno++;
-
-        if (!CopyGetInt16(cstate, &fld_count))
-        {
-            /* EOF detected (end of file, or protocol-level EOF) */
-            return false;
-        }
-
-        if (fld_count == -1)
-        {
-            /*
-             * Received EOF marker.  Wait for the protocol-level EOF, and
-             * complain if it doesn't come immediately.  In COPY FROM STDIN,
-             * this ensures that we correctly handle CopyFail, if client
-             * chooses to send that now.  When copying from file, we could
-             * ignore the rest of the file like in text mode, but we choose to
-             * be consistent with the COPY FROM STDIN case.
-             */
-            char        dummy;
-
-            if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("received copy data after EOF marker")));
-            return false;
-        }
-
-        if (fld_count != attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("row field count is %d, expected %d",
-                            (int) fld_count, attr_count)));
-
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            cstate->cur_attname = NameStr(att->attname);
-            values[m] = CopyReadBinaryAttribute(cstate,
-                                                &in_functions[m],
-                                                typioparams[m],
-                                                att->atttypmod,
-                                                &nulls[m]);
-            cstate->cur_attname = NULL;
-        }
-    }
+    /* Get one row from source */
+    if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+        return false;
 
     /*
      * Now compute and insert any defaults available for the columns not
@@ -1087,7 +1141,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * in the final value of line_buf.
  */
 static bool
-CopyReadLine(CopyFromState cstate)
+CopyReadLine(CopyFromState cstate, bool is_csv)
 {
     bool        result;
 
@@ -1095,7 +1149,7 @@ CopyReadLine(CopyFromState cstate)
     cstate->line_buf_valid = false;
 
     /* Parse data and transfer into line_buf */
-    result = CopyReadLineText(cstate);
+    result = CopyReadLineText(cstate, is_csv);
 
     if (result)
     {
@@ -1163,7 +1217,7 @@ CopyReadLine(CopyFromState cstate)
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
 static bool
-CopyReadLineText(CopyFromState cstate)
+CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
     char       *copy_input_buf;
     int            input_buf_ptr;
@@ -1178,7 +1232,11 @@ CopyReadLineText(CopyFromState cstate)
     char        quotec = '\0';
     char        escapec = '\0';
 
-    if (cstate->opts.csv_mode)
+    /*
+     * is_csv will be optimized away by compiler, as argument is constant at
+     * caller.
+     */
+    if (is_csv)
     {
         quotec = cstate->opts.quote[0];
         escapec = cstate->opts.escape[0];
@@ -1255,7 +1313,11 @@ CopyReadLineText(CopyFromState cstate)
         prev_raw_ptr = input_buf_ptr;
         c = copy_input_buf[input_buf_ptr++];
 
-        if (cstate->opts.csv_mode)
+        /*
+         * is_csv will be optimized away by compiler, as argument is constant
+         * at caller.
+         */
+        if (is_csv)
         {
             /*
              * If character is '\r', we may need to look ahead below.  Force
@@ -1294,7 +1356,7 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \r */
-        if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\r' && (!is_csv || !in_quote))
         {
             /* Check for \r\n on first line, _and_ handle \r\n. */
             if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1322,10 +1384,10 @@ CopyReadLineText(CopyFromState cstate)
                     if (cstate->eol_type == EOL_CRNL)
                         ereport(ERROR,
                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errmsg("literal carriage return found in data") :
                                  errmsg("unquoted carriage return found in data"),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errhint("Use \"\\r\" to represent carriage return.") :
                                  errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1339,10 +1401,10 @@ CopyReadLineText(CopyFromState cstate)
             else if (cstate->eol_type == EOL_NL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal carriage return found in data") :
                          errmsg("unquoted carriage return found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\r\" to represent carriage return.") :
                          errhint("Use quoted CSV field to represent carriage return.")));
             /* If reach here, we have found the line terminator */
@@ -1350,15 +1412,15 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \n */
-        if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\n' && (!is_csv || !in_quote))
         {
             if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal newline found in data") :
                          errmsg("unquoted newline found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\n\" to represent newline.") :
                          errhint("Use quoted CSV field to represent newline.")));
             cstate->eol_type = EOL_NL;    /* in case not set yet */
@@ -1370,7 +1432,7 @@ CopyReadLineText(CopyFromState cstate)
          * Process backslash, except in CSV mode where backslash is a normal
          * character.
          */
-        if (c == '\\' && !cstate->opts.csv_mode)
+        if (c == '\\' && !is_csv)
         {
             char        c2;
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f5382..f2409013fba 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
 extern void EndCopyFrom(CopyFromState cstate);
 extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
                          Datum *values, bool *nulls);
-extern bool NextCopyFromRawFields(CopyFromState cstate,
-                                  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 extern char *CopyLimitPrintoutLength(const char *str);
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index eccc875d0e8..19aacc8ddd3 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * copyapi.h
- *      API for COPY TO handlers
+ *      API for COPY TO/FROM handlers
  *
  *
  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
@@ -54,4 +54,50 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+/*
+ * API structure for a COPY FROM format implementation.  Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyFromRoutine
+{
+    /*
+     * Set input function information. This callback is called once at the
+     * beginning of COPY FROM.
+     *
+     * 'finfo' can be optionally filled to provide the catalog information of
+     * the input function.
+     *
+     * 'typioparam' can be optionally filled to define the OID of the type to
+     * pass to the input function.  'atttypid' is the OID of data type used by
+     * the relation's attribute.
+     */
+    void        (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
+                                   FmgrInfo *finfo, Oid *typioparam);
+
+    /*
+     * Start a COPY FROM. This callback is called once at the beginning of
+     * COPY FROM.
+     *
+     * 'tupDesc' is the tuple descriptor of the relation where the data needs
+     * to be copied.  This can be used for any initialization steps required
+     * by a format.
+     */
+    void        (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
+
+    /*
+     * Read one row from the source and fill *values and *nulls.
+     *
+     * 'econtext' is used to evaluate default expression for each column that
+     * is either not read from the file or is using the DEFAULT option of COPY
+     * FROM.  It is NULL if no default values are used.
+     *
+     * Returns false if there are no more tuples to read.
+     */
+    bool        (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
+                                   Datum *values, bool *nulls);
+
+    /* End a COPY FROM. This callback is called once at the end of COPY FROM */
+    void        (*CopyFromEnd) (CopyFromState cstate);
+} CopyFromRoutine;
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc783..1ca058c6add 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -14,7 +14,7 @@
 #ifndef COPYFROM_INTERNAL_H
 #define COPYFROM_INTERNAL_H
 
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
@@ -58,6 +58,9 @@ typedef enum CopyInsertMethod
  */
 typedef struct CopyFromStateData
 {
+    /* format routine */
+    const CopyFromRoutine *routine;
+
     /* low-level state data */
     CopySource    copy_src;        /* type of copy source */
     FILE       *copy_file;        /* used if copy_src == COPY_FILE */
@@ -183,4 +186,12 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
+/* One-row callbacks for built-in formats defined in copyfromparse.c */
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+                               Datum *values, bool *nulls);
+extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
+                              Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+                                 Datum *values, bool *nulls);
+
 #endif                            /* COPYFROM_INTERNAL_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8edb41cce2e..e09407c7463 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -492,6 +492,7 @@ ConvertRowtypeExpr
 CookedConstraint
 CopyDest
 CopyFormatOptions
+CopyFromRoutine
 CopyFromState
 CopyFromStateData
 CopyHeaderChoice
-- 
2.45.2
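A note on the `is_csv` parameter that the patch above threads through CopyReadLine() and CopyReadLineText(): the in-code comments rely on the compiler folding the flag away because each caller passes a constant. The stand-alone sketch below illustrates that pattern only. `count_lines()` and its two wrappers are hypothetical names, not PostgreSQL functions, and whether a given compiler actually specializes the inlined body depends on optimization settings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Shared worker (hypothetical stand-in for a line reader).  Counts line
 * terminators in buf.  When is_csv is true, newlines inside double quotes
 * are not counted, mirroring the "!is_csv || !in_quote" tests in the patch.
 */
static inline size_t
count_lines(const char *buf, size_t len, bool is_csv)
{
    size_t      nlines = 0;
    bool        in_quote = false;

    assert(buf != NULL);

    for (size_t i = 0; i < len; i++)
    {
        char        c = buf[i];

        /* This branch can be dropped entirely when is_csv is constant false */
        if (is_csv && c == '"')
            in_quote = !in_quote;

        if (c == '\n' && (!is_csv || !in_quote))
            nlines++;
    }

    return nlines;
}

/* Thin per-format entry points: each passes a compile-time constant. */
size_t
count_lines_text(const char *buf, size_t len)
{
    return count_lines(buf, len, false);
}

size_t
count_lines_csv(const char *buf, size_t len)
{
    return count_lines(buf, len, true);
}
```

The design choice is that the per-character loop is written once, while each format gets its own entry point whose flag value is known at compile time, so the hot loop need not re-read an option field on every character.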
From d146c9f428163be276b7fac6b046dabe3d466374 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 12:19:15 +0900
Subject: [PATCH v26 3/8] Add support for adding custom COPY TO format
This uses the handler approach, as tablesample does. A handler is an
internal function that returns an internal struct. In this case,
a COPY TO handler returns a CopyToRoutine.
This also adds a test module for custom COPY TO handlers.
---
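The format resolution described in this commit message can be modeled outside the server roughly as follows. This is a simplified, stand-alone sketch, not the patch's code: every `Demo*` type, the `handlers` table, and `resolve_copy_format()` are hypothetical. The real patch looks the handler up by name in the function catalog and checks the returned node tag with IsA(); here a static table and a plain tag comparison stand in for both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a node tag, so the caller can verify what a handler returned */
typedef enum { DEMO_T_INVALID = 0, DEMO_T_COPY_TO_ROUTINE } DemoNodeTag;

/* Cut-down model of a COPY TO routine: a tag plus a single callback */
typedef struct DemoCopyToRoutine
{
    DemoNodeTag type;
    int         (*one_row) (int ncols);
} DemoCopyToRoutine;

typedef const void *(*DemoHandler) (bool is_from);

typedef struct
{
    const char *name;
    DemoHandler fn;
} DemoHandlerEntry;

typedef enum { FMT_TEXT, FMT_CSV, FMT_BINARY, FMT_CUSTOM, FMT_UNKNOWN } DemoFormatKind;

static int
demo_one_row(int ncols)
{
    return ncols;
}

static const DemoCopyToRoutine demo_routine = {
    .type = DEMO_T_COPY_TO_ROUTINE,
    .one_row = demo_one_row,
};

static const void *
demo_handler(bool is_from)
{
    (void) is_from;
    return &demo_routine;
}

/* Hypothetical registry; the server resolves the name via its catalogs */
static const DemoHandlerEntry handlers[] = {
    {"demo_format", demo_handler},
};

DemoFormatKind
resolve_copy_format(const char *format, bool is_from,
                    const DemoCopyToRoutine **routine_out)
{
    assert(format != NULL && routine_out != NULL);
    *routine_out = NULL;

    /* Built-in formats never consult a handler */
    if (strcmp(format, "text") == 0)
        return FMT_TEXT;
    if (strcmp(format, "csv") == 0)
        return FMT_CSV;
    if (strcmp(format, "binary") == 0)
        return FMT_BINARY;

    /* As in this patch, custom formats are only looked up for COPY TO */
    if (is_from)
        return FMT_UNKNOWN;

    for (size_t i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++)
    {
        if (strcmp(format, handlers[i].name) == 0)
        {
            const DemoCopyToRoutine *r = handlers[i].fn(is_from);

            /* Reject a handler that returns nothing or the wrong struct */
            if (r == NULL || r->type != DEMO_T_COPY_TO_ROUTINE)
                return FMT_UNKNOWN;
            *routine_out = r;
            return FMT_CUSTOM;
        }
    }

    return FMT_UNKNOWN;
}
```

With this model, resolving "demo_format" for COPY TO yields FMT_CUSTOM and a usable routine, while the same name for COPY FROM yields FMT_UNKNOWN, which corresponds to the "not recognized" error in the regression test output of this patch.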
 src/backend/commands/copy.c                   | 82 ++++++++++++++++---
 src/backend/commands/copyto.c                 |  4 +-
 src/backend/nodes/Makefile                    |  1 +
 src/backend/nodes/gen_node_support.pl         |  2 +
 src/backend/utils/adt/pseudotypes.c           |  1 +
 src/include/catalog/pg_proc.dat               |  6 ++
 src/include/catalog/pg_type.dat               |  6 ++
 src/include/commands/copy.h                   |  1 +
 src/include/commands/copyapi.h                |  2 +
 src/include/nodes/meson.build                 |  1 +
 src/test/modules/Makefile                     |  1 +
 src/test/modules/meson.build                  |  1 +
 src/test/modules/test_copy_format/.gitignore  |  4 +
 src/test/modules/test_copy_format/Makefile    | 23 ++++++
 .../expected/test_copy_format.out             | 17 ++++
 src/test/modules/test_copy_format/meson.build | 33 ++++++++
 .../test_copy_format/sql/test_copy_format.sql |  5 ++
 .../test_copy_format--1.0.sql                 |  8 ++
 .../test_copy_format/test_copy_format.c       | 63 ++++++++++++++
 .../test_copy_format/test_copy_format.control |  4 +
 20 files changed, 251 insertions(+), 14 deletions(-)
 mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2d98ecf3f4e..d4906b44751 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,6 +32,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -476,6 +477,73 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
     return COPY_LOG_VERBOSITY_DEFAULT;    /* keep compiler quiet */
 }
 
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false). If no COPY format handler is found, this function
+ * reports an error.
+ */
+static void
+ProcessCopyOptionFormat(ParseState *pstate,
+                        CopyFormatOptions *opts_out,
+                        bool is_from,
+                        DefElem *defel)
+{
+    char       *format;
+    Oid            funcargtypes[1];
+    Oid            handlerOid = InvalidOid;
+    Datum        datum;
+    Node       *routine;
+
+    format = defGetString(defel);
+
+    /* built-in formats */
+    if (strcmp(format, "text") == 0)
+         /* default format */ return;
+    else if (strcmp(format, "csv") == 0)
+    {
+        opts_out->csv_mode = true;
+        return;
+    }
+    else if (strcmp(format, "binary") == 0)
+    {
+        opts_out->binary = true;
+        return;
+    }
+
+    /* custom format */
+    if (!is_from)
+    {
+        funcargtypes[0] = INTERNALOID;
+        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                    funcargtypes, true);
+    }
+    if (!OidIsValid(handlerOid))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY format \"%s\" not recognized", format),
+                 parser_errposition(pstate, defel->location)));
+
+    datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
+    routine = (Node *) DatumGetPointer(datum);
+    if (routine == NULL || !IsA(routine, CopyToRoutine))
+        ereport(
+                ERROR,
+                (errcode(
+                         ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY handler function "
+                        "%s(%u) did not return a "
+                        "CopyToRoutine struct",
+                        format, handlerOid),
+                 parser_errposition(
+                                    pstate, defel->location)));
+
+    opts_out->routine = routine;
+}
+
 /*
  * Process the statement option list for COPY.
  *
@@ -519,22 +587,10 @@ ProcessCopyOptions(ParseState *pstate,
 
         if (strcmp(defel->defname, "format") == 0)
         {
-            char       *fmt = defGetString(defel);
-
             if (format_specified)
                 errorConflictingDefElem(defel, pstate);
             format_specified = true;
-            if (strcmp(fmt, "text") == 0)
-                 /* default format */ ;
-            else if (strcmp(fmt, "csv") == 0)
-                opts_out->csv_mode = true;
-            else if (strcmp(fmt, "binary") == 0)
-                opts_out->binary = true;
-            else
-                ereport(ERROR,
-                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                         errmsg("COPY format \"%s\" not recognized", fmt),
-                         parser_errposition(pstate, defel->location)));
+            ProcessCopyOptionFormat(pstate, opts_out, is_from, defel);
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f81dadcc12b..ce3dd252c32 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -176,7 +176,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(CopyFormatOptions opts)
 {
-    if (opts.csv_mode)
+    if (opts.routine)
+        return (const CopyToRoutine *) opts.routine;
+    else if (opts.csv_mode)
         return &CopyToRoutineCSV;
     else if (opts.binary)
         return &CopyToRoutineBinary;
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 66bbad8e6e0..173ee11811c 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -49,6 +49,7 @@ node_headers = \
     access/sdir.h \
     access/tableam.h \
     access/tsmapi.h \
+    commands/copyapi.h \
     commands/event_trigger.h \
     commands/trigger.h \
     executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
old mode 100644
new mode 100755
index 81df3bdf95f..428ab4f0d93
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -61,6 +61,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -85,6 +86,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index e189e9b79d2..25f24ab95d2 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd382..959d0301c20 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7771,6 +7771,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index ceff66ccde1..793dd671935 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -633,6 +633,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a copy to method function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   descr => 'pseudo-type for the result of a table AM handler function',
   typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2409013fba..6b740d5b917 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
+    Node       *routine;        /* CopyToRoutine (can be NULL) */
 } CopyFormatOptions;
 
 /* These are private in commands/copy[from|to].c */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 19aacc8ddd3..36057b92417 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -24,6 +24,8 @@
  */
 typedef struct CopyToRoutine
 {
+    NodeTag        type;
+
     /*
      * Set output function information. This callback is called once at the
      * beginning of COPY TO.
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index b665e55b657..103df1a7873 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -11,6 +11,7 @@ node_support_input_i = [
   'access/sdir.h',
   'access/tableam.h',
   'access/tsmapi.h',
+  'commands/copyapi.h',
   'commands/event_trigger.h',
   'commands/trigger.h',
   'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index c0d3cf0e14b..33e3a49a4fb 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
           spgist_name_ops \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index c829b619530..75b6ab1b6a9 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -14,6 +14,7 @@ subdir('spgist_name_ops')
 subdir('ssl_passphrase_callback')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 00000000000..8497f91624d
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 00000000000..606c78f6878
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,17 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (format 'test_copy_format');
+ERROR:  COPY format "test_copy_format" not recognized
+LINE 1: COPY public.test FROM stdin WITH (format 'test_copy_format')...
+                                          ^
+COPY public.test TO stdout WITH (format 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: atttypid=21
+NOTICE:  CopyToOutFunc: atttypid=23
+NOTICE:  CopyToOutFunc: atttypid=20
+NOTICE:  CopyToStart: natts=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 00000000000..4cefe7b709a
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+  'test_copy_format.control',
+  'test_copy_format--1.0.sql',
+)
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'test_copy_format',
+    ],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 00000000000..9406b3be3d4
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,5 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (format 'test_copy_format');
+COPY public.test TO stdout WITH (format 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 00000000000..d24ea03ce99
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,8 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+CREATE FUNCTION test_copy_format(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..e064f40473b
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,63 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+
+PG_MODULE_MAGIC;
+
+static void
+CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts)));
+}
+
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", slot->tts_nvalid)));
+}
+
+static void
+CopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .type = T_CopyToRoutine,
+    .CopyToOutFunc = CopyToOutFunc,
+    .CopyToStart = CopyToStart,
+    .CopyToOneRow = CopyToOneRow,
+    .CopyToEnd = CopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    ereport(NOTICE,
+            (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 00000000000..f05a6362358
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.45.2
From f7b968816fdff0c3ae3f113ed2404a9f5aac72e9 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 13:58:33 +0900
Subject: [PATCH v26 4/8] Export CopyToStateData
This is for custom COPY TO format handlers implemented as extensions.
This only moves code. Nothing changes except the CopyDest enum
values: the CopyDest and CopySource enums both define values such as
COPY_FILE, so they conflict with each other once both are visible. The
CopyDest values therefore use a COPY_DEST_ prefix instead of COPY_.
For example, COPY_FILE in CopyDest is renamed to COPY_DEST_FILE.
Note that this isn't enough to implement custom COPY TO format
handlers as extensions. We'll do the following in a subsequent commit:
1. Add an opaque space for custom COPY TO format handlers
2. Export CopySendEndOfRow() to flush the buffer
---
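The rename described in this commit message follows from C having a single namespace for enumerators: two enums visible in the same translation unit cannot both define COPY_FILE. The stand-alone sketch below shows the layout after the rename. COPY_FILE and COPY_DEST_FILE come from the patch; the remaining enumerator names are assumed by analogy with the stated prefix rule, and `copy_dest_name()` is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Source side of COPY FROM (enumerators other than COPY_FILE are assumed) */
typedef enum CopySource
{
    COPY_FILE,
    COPY_FRONTEND,
    COPY_CALLBACK,
} CopySource;

/*
 * Destination side of COPY TO.  Reusing COPY_FILE etc. here would be a
 * redeclaration error once both enums are visible in one file, hence the
 * COPY_DEST_ prefix.
 */
typedef enum CopyDest
{
    COPY_DEST_FILE,
    COPY_DEST_FRONTEND,
    COPY_DEST_CALLBACK,
} CopyDest;

/* Hypothetical helper: code that sees both enums stays unambiguous */
const char *
copy_dest_name(CopyDest dest)
{
    switch (dest)
    {
        case COPY_DEST_FILE:
            return "file";
        case COPY_DEST_FRONTEND:
            return "frontend";
        case COPY_DEST_CALLBACK:
            return "callback";
    }

    assert(!"unreachable: unknown CopyDest value");
    return "unknown";
}
```

An extension that includes both the COPY TO and COPY FROM headers can then refer to either enum without a name clash.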
 src/backend/commands/copyto.c  | 77 ++++------------------------------
 src/include/commands/copy.h    |  2 +-
 src/include/commands/copyapi.h | 62 +++++++++++++++++++++++++++
 3 files changed, 71 insertions(+), 70 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ce3dd252c32..96b5e144a1d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -36,67 +36,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                    /* to file (or a piped program) */
-    COPY_FRONTEND,                /* to frontend */
-    COPY_CALLBACK,                /* to callback function */
-} CopyDest;
-
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-    /* format-specific routines */
-    const CopyToRoutine *routine;
-
-    /* low-level state data */
-    CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
-
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
-    MemoryContext rowcontext;    /* per-row evaluation context */
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -406,7 +345,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -453,7 +392,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -487,11 +426,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -512,7 +451,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -520,7 +459,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -904,12 +843,12 @@ BeginCopyTo(ParseState *pstate,
     /* See Multibyte encoding comment above */
     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 6b740d5b917..98aa5707102 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -90,7 +90,7 @@ typedef struct CopyFormatOptions
     Node       *routine;        /* CopyToRoutine (can be NULL) */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
+/* This is private in commands/copyfrom.c */
 typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 36057b92417..1cb2815deab 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -15,6 +15,7 @@
 #define COPYAPI_H
 
 #include "commands/copy.h"
+#include "executor/execdesc.h"
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
 
@@ -56,6 +57,67 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+    COPY_DEST_FILE,                /* to file (or a piped program) */
+    COPY_DEST_FRONTEND,            /* to frontend */
+    COPY_DEST_CALLBACK,            /* to callback function */
+} CopyDest;
+
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+    /* format-specific routines */
+    const CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;        /* type of copy destination */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
+    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
+
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+    MemoryContext rowcontext;    /* per-row evaluation context */
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
 /*
  * API structure for a COPY FROM format implementation.     Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
-- 
2.45.2
From b42052e4372871d10449ba2d70b738c6970d5d42 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:01:18 +0900
Subject: [PATCH v26 5/8] Add support for implementing custom COPY TO format as
 extension
* Add CopyToStateData::opaque that can be used to keep data for custom
  COPY TO format implementation
* Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf
  as CopyToStateFlush()
---
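To illustrate how the two additions are meant to work together, here is
a minimal standalone sketch. It is not the real API: MockCopyToState,
mock_flush() and the my_*() callbacks are hypothetical stand-ins for
CopyToStateData, CopyToStateFlush() and a custom format's callbacks, and
nothing here includes a PostgreSQL header. The idea is that a format
keeps its private state behind the opaque pointer and flushes the shared
buffer after each row.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Mock of the relevant parts of the COPY TO state (NOT the real struct). */
typedef struct MockCopyToState
{
    char        buf[256];       /* stands in for fe_msgbuf */
    size_t      len;            /* bytes currently buffered */
    size_t      flushed_bytes;  /* total bytes "sent" so far */
    void       *opaque;         /* private space for the format */
} MockCopyToState;

/* Stand-in for CopyToStateFlush(): "send" the buffer, then reset it. */
void
mock_flush(MockCopyToState *cstate)
{
    cstate->flushed_bytes += cstate->len;
    cstate->len = 0;
}

/* Private per-COPY state of the hypothetical custom format. */
typedef struct MyFormatState
{
    unsigned long nrows;
} MyFormatState;

/* Start callback: allocate private state and hang it off opaque. */
void
my_start(MockCopyToState *cstate)
{
    MyFormatState *st = calloc(1, sizeof(MyFormatState));

    assert(st != NULL);
    cstate->opaque = st;
}

/* Per-row callback: append the row to the buffer, count it, flush. */
void
my_one_row(MockCopyToState *cstate, const char *row)
{
    MyFormatState *st = cstate->opaque;
    size_t      n = strlen(row);

    assert(n <= sizeof(cstate->buf) - cstate->len);
    memcpy(cstate->buf + cstate->len, row, n);
    cstate->len += n;
    st->nrows++;
    mock_flush(cstate);
}

/* End callback: release the private state. */
void
my_end(MockCopyToState *cstate)
{
    free(cstate->opaque);
    cstate->opaque = NULL;
}
```

In the real backend the private state would presumably be allocated with
palloc() in the per-copy memory context rather than with calloc(); the
sketch uses the C library only so that it compiles on its own.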
 src/backend/commands/copyto.c  | 12 ++++++++++++
 src/include/commands/copyapi.h |  5 +++++
 2 files changed, 17 insertions(+)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 96b5e144a1d..cb9bfa0053f 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -442,6 +442,18 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * Exported wrapper around CopySendEndOfRow() for extensions.
+ *
+ * CopySendEndOfRow() itself stays static so that the compiler remains free
+ * to optimize (for example, inline) the calls to it within this file.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
  * the line termination and do common appropriate things for the end of row.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 1cb2815deab..030a82aca7f 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -116,8 +116,13 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;    /* lookup info for output functions */
     MemoryContext rowcontext;    /* per-row evaluation context */
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyToStateData;
 
+extern void CopyToStateFlush(CopyToState cstate);
+
 /*
  * API structure for a COPY FROM format implementation.     Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
-- 
2.45.2
From 38efc9937a0cff782683080bc8b9fbe62290ff6c Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:11:55 +0900
Subject: [PATCH v26 6/8] Add support for adding custom COPY FROM format
This uses the same handler function for COPY TO and COPY FROM but
returns a different routine for each: CopyToRoutine for COPY TO and
CopyFromRoutine for COPY FROM. PostgreSQL calls the handler with an
"is_from" argument, which is true for COPY FROM and false for COPY TO:
    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine
This also adds custom COPY FROM handler coverage to the test module.
---
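The dispatch and validation described above can be sketched in
standalone C. This is a mock under stated assumptions: the Mock* types
and both functions are hypothetical, and the tag check only imitates
what IsA() does on a real Node. It shows one handler returning a
different tagged struct depending on is_from, and the caller rejecting a
result whose tag does not match the direction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock node tags; the real ones are generated for the Node system. */
typedef enum MockNodeTag
{
    T_MockCopyToRoutine = 1,
    T_MockCopyFromRoutine,
} MockNodeTag;

/* Each routine struct starts with its tag, like a Node. */
typedef struct MockCopyToRoutine
{
    MockNodeTag type;
} MockCopyToRoutine;

typedef struct MockCopyFromRoutine
{
    MockNodeTag type;
} MockCopyFromRoutine;

static const MockCopyToRoutine to_routine = {.type = T_MockCopyToRoutine};
static const MockCopyFromRoutine from_routine = {.type = T_MockCopyFromRoutine};

/* Hypothetical handler: one entry point, routine chosen by direction. */
const void *
mock_copy_handler(bool is_from)
{
    if (is_from)
        return &from_routine;
    return &to_routine;
}

/*
 * Caller-side check: the returned struct must be non-NULL and carry the
 * tag that matches the direction. A pointer to a struct may be converted
 * to a pointer to its first member, so reading the tag this way is valid.
 */
bool
mock_routine_is_valid(const void *routine, bool is_from)
{
    MockNodeTag expected = is_from ? T_MockCopyFromRoutine : T_MockCopyToRoutine;

    if (routine == NULL)
        return false;
    return *(const MockNodeTag *) routine == expected;
}
```

A handler that ignores is_from and always returns the COPY TO routine
fails the check for COPY FROM, which is the case the new error message
in ProcessCopyOptionFormat() reports.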
 src/backend/commands/copy.c                   | 52 ++++++++++++-------
 src/backend/commands/copyfrom.c               |  4 +-
 src/include/catalog/pg_type.dat               |  2 +-
 src/include/commands/copy.h                   |  3 +-
 src/include/commands/copyapi.h                |  2 +
 .../expected/test_copy_format.out             | 10 ++--
 .../test_copy_format/sql/test_copy_format.sql |  1 +
 .../test_copy_format/test_copy_format.c       | 39 +++++++++++++-
 8 files changed, 87 insertions(+), 26 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index d4906b44751..5be649c9c89 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -483,8 +483,8 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
  * This function checks whether the option value is a built-in format such as
  * "text" and "csv" or not. If the option value isn't a built-in format, this
  * function finds a COPY format handler that returns a CopyToRoutine (for
- * is_from == false). If no COPY format handler is found, this function
- * reports an error.
+ * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY
+ * format handler is found, this function reports an error.
  */
 static void
 ProcessCopyOptionFormat(ParseState *pstate,
@@ -515,12 +515,9 @@ ProcessCopyOptionFormat(ParseState *pstate,
     }
 
     /* custom format */
-    if (!is_from)
-    {
-        funcargtypes[0] = INTERNALOID;
-        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
-                                    funcargtypes, true);
-    }
+    funcargtypes[0] = INTERNALOID;
+    handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                funcargtypes, true);
     if (!OidIsValid(handlerOid))
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -529,17 +526,34 @@ ProcessCopyOptionFormat(ParseState *pstate,
 
     datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
     routine = (Node *) DatumGetPointer(datum);
-    if (routine == NULL || !IsA(routine, CopyToRoutine))
-        ereport(
-                ERROR,
-                (errcode(
-                         ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("COPY handler function "
-                        "%s(%u) did not return a "
-                        "CopyToRoutine struct",
-                        format, handlerOid),
-                 parser_errposition(
-                                    pstate, defel->location)));
+    if (is_from)
+    {
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%s(%u) did not return a "
+                            "CopyFromRoutine struct",
+                            format, handlerOid),
+                     parser_errposition(
+                                        pstate, defel->location)));
+    }
+    else
+    {
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%s(%u) did not return a "
+                            "CopyToRoutine struct",
+                            format, handlerOid),
+                     parser_errposition(
+                                        pstate, defel->location)));
+    }
 
     opts_out->routine = routine;
 }
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index c84081c3ba3..a4cdab75879 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -155,7 +155,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(CopyFormatOptions opts)
 {
-    if (opts.csv_mode)
+    if (opts.routine)
+        return (const CopyFromRoutine *) opts.routine;
+    else if (opts.csv_mode)
         return &CopyFromRoutineCSV;
     else if (opts.binary)
         return &CopyFromRoutineBinary;
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 793dd671935..37ebfa0908f 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -634,7 +634,7 @@
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
 { oid => '8752',
-  descr => 'pseudo-type for the result of a copy to method function',
+  descr => 'pseudo-type for the result of a copy to/from method function',
   typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
   typcategory => 'P', typinput => 'copy_handler_in',
   typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 98aa5707102..e07988a0c74 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,7 +87,8 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
-    Node       *routine;        /* CopyToRoutine (can be NULL) */
+    Node       *routine;        /* CopyToRoutine or CopyFromRoutine (can be
+                                 * NULL) */
 } CopyFormatOptions;
 
 /* This is private in commands/copyfrom.c */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 030a82aca7f..fa3d8d87760 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -129,6 +129,8 @@ extern void CopyToStateFlush(CopyToState cstate);
  */
 typedef struct CopyFromRoutine
 {
+    NodeTag        type;
+
     /*
      * Set input function information. This callback is called once at the
      * beginning of COPY FROM.
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index 606c78f6878..4ed7c0b12db 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -2,9 +2,13 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (format 'test_copy_format');
-ERROR:  COPY format "test_copy_format" not recognized
-LINE 1: COPY public.test FROM stdin WITH (format 'test_copy_format')...
-                                          ^
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
 COPY public.test TO stdout WITH (format 'test_copy_format');
 NOTICE:  test_copy_format: is_from=false
 NOTICE:  CopyToOutFunc: atttypid=21
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 9406b3be3d4..e805f7cb011 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -2,4 +2,5 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (format 'test_copy_format');
+\.
 COPY public.test TO stdout WITH (format 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index e064f40473b..f6b105659ab 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -18,6 +18,40 @@
 
 PG_MODULE_MAGIC;
 
+static void
+CopyFromInFunc(CopyFromState cstate, Oid atttypid,
+               FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts)));
+}
+
+static bool
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+CopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .type = T_CopyFromRoutine,
+    .CopyFromInFunc = CopyFromInFunc,
+    .CopyFromStart = CopyFromStart,
+    .CopyFromOneRow = CopyFromOneRow,
+    .CopyFromEnd = CopyFromEnd,
+};
+
 static void
 CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
 {
@@ -59,5 +93,8 @@ test_copy_format(PG_FUNCTION_ARGS)
     ereport(NOTICE,
             (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
 
-    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
 }
-- 
2.45.2
From 07638006a825fbd3c141902ad87736bd7ca00e7f Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:19:34 +0900
Subject: [PATCH v26 7/8] Export CopyFromStateData
This is for custom COPY FROM format handlers implemented as extensions.
This only moves code. Nothing changes except the CopySource enum
values, whose COPY_ prefix becomes COPY_SOURCE_, matching the CopyDest
prefix change. For example, COPY_FILE in CopySource is renamed to
COPY_SOURCE_FILE.
Note that this alone isn't enough to implement custom COPY FROM format
handlers as extensions. A subsequent commit will do the following:
1. Add an opaque space for custom COPY FROM format handlers
2. Export CopyReadBinaryData() to read the next data
---
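As a rough illustration of what the exported struct makes visible, the
standalone sketch below mimics the raw_buf bookkeeping (raw_buf_index,
raw_buf_len and the RAW_BUF_BYTES() macro). MockCopyFromState and
mock_read_raw() are hypothetical and do not come from this patch; a real
extension would still need the reader function mentioned above to be
exported, so treat this only as a picture of how the index and length
fields relate.

```c
#include <assert.h>
#include <string.h>

/* Mock of the raw-buffer fields of the COPY FROM state (NOT the real struct). */
typedef struct MockCopyFromState
{
    const char *raw_buf;        /* raw input data */
    int         raw_buf_index;  /* next byte to process */
    int         raw_buf_len;    /* total # of bytes stored */
} MockCopyFromState;

/* Same shape as the macro in the patch: unconsumed bytes in raw_buf. */
#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)

/*
 * Hypothetical reader: copy up to maxread unconsumed bytes into dest,
 * advance the index, and return how many bytes were copied.
 */
int
mock_read_raw(MockCopyFromState *cstate, char *dest, int maxread)
{
    int         avail = RAW_BUF_BYTES(cstate);
    int         n = avail < maxread ? avail : maxread;

    assert(maxread >= 0);
    memcpy(dest, cstate->raw_buf + cstate->raw_buf_index, (size_t) n);
    cstate->raw_buf_index += n;
    return n;
}
```

A short read (fewer bytes returned than requested) is how this mock
signals that the buffer is exhausted; the real code refills the buffer
from the data source instead.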
 src/backend/commands/copyfrom.c          |   4 +-
 src/backend/commands/copyfromparse.c     |  10 +-
 src/include/commands/copy.h              |   1 -
 src/include/commands/copyapi.h           | 166 +++++++++++++++++++++++
 src/include/commands/copyfrom_internal.h | 166 -----------------------
 5 files changed, 173 insertions(+), 174 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index a4cdab75879..e1fef1b95a5 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1704,7 +1704,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;    /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;    /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1832,7 +1832,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index fdb506c58be..1c68b0d2952 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -170,7 +170,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows it can send. */
     pq_flush();
@@ -238,7 +238,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
     switch (cstate->copy_src)
     {
-        case COPY_FILE:
+        case COPY_SOURCE_FILE:
             bytesread = fread(databuf, 1, maxread, cstate->copy_file);
             if (ferror(cstate->copy_file))
                 ereport(ERROR,
@@ -247,7 +247,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
             if (bytesread == 0)
                 cstate->raw_reached_eof = true;
             break;
-        case COPY_FRONTEND:
+        case COPY_SOURCE_FRONTEND:
             while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
             {
                 int            avail;
@@ -330,7 +330,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
                 bytesread += avail;
             }
             break;
-        case COPY_CALLBACK:
+        case COPY_SOURCE_CALLBACK:
             bytesread = cstate->data_source_cb(databuf, minread, maxread);
             break;
     }
@@ -1158,7 +1158,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
          * after \. up to the protocol end of copy data.  (XXX maybe better
          * not to treat \. as special?)
          */
-        if (cstate->copy_src == COPY_FRONTEND)
+        if (cstate->copy_src == COPY_SOURCE_FRONTEND)
         {
             int            inbytes;
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index e07988a0c74..50af4b99258 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -91,7 +91,6 @@ typedef struct CopyFormatOptions
                                  * NULL) */
 } CopyFormatOptions;
 
-/* This is private in commands/copyfrom.c */
 typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index fa3d8d87760..335584f8877 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -15,6 +15,7 @@
 #define COPYAPI_H
 
 #include "commands/copy.h"
+#include "commands/trigger.h"
 #include "executor/execdesc.h"
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
@@ -171,4 +172,169 @@ typedef struct CopyFromRoutine
     void        (*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
+/*
+ * Represents the different source cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopySource
+{
+    COPY_SOURCE_FILE,            /* from file (or a piped program) */
+    COPY_SOURCE_FRONTEND,        /* from frontend */
+    COPY_SOURCE_CALLBACK,        /* from callback function */
+} CopySource;
+
+/*
+ *    Represents the end-of-line terminator type of the input
+ */
+typedef enum EolType
+{
+    EOL_UNKNOWN,
+    EOL_NL,
+    EOL_CR,
+    EOL_CRNL,
+} EolType;
+
+/*
+ * Represents the insert method to be used during COPY FROM.
+ */
+typedef enum CopyInsertMethod
+{
+    CIM_SINGLE,                    /* use table_tuple_insert or ExecForeignInsert */
+    CIM_MULTI,                    /* always use table_multi_insert or
+                                 * ExecForeignBatchInsert */
+    CIM_MULTI_CONDITIONAL,        /* use table_multi_insert or
+                                 * ExecForeignBatchInsert only if valid */
+} CopyInsertMethod;
+
+/*
+ * This struct contains all the state variables used throughout a COPY FROM
+ * operation.
+ */
+typedef struct CopyFromStateData
+{
+    /* format routine */
+    const CopyFromRoutine *routine;
+
+    /* low-level state data */
+    CopySource    copy_src;        /* type of copy source */
+    FILE       *copy_file;        /* used if copy_src == COPY_SOURCE_FILE */
+    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_SOURCE_FRONTEND */
+
+    EolType        eol_type;        /* EOL type of input */
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    Oid            conversion_proc;    /* encoding conversion function */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDIN */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_source_cb data_source_cb; /* function for reading data */
+
+    CopyFormatOptions opts;
+    bool       *convert_select_flags;    /* per-column CSV/TEXT CS flags */
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /* these are just for error messages, see CopyFromErrorCallback */
+    const char *cur_relname;    /* table name for error messages */
+    uint64        cur_lineno;        /* line number for error messages */
+    const char *cur_attname;    /* current att for error messages */
+    const char *cur_attval;        /* current att value for error messages */
+    bool        relname_only;    /* don't output line number, att, etc. */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    AttrNumber    num_defaults;    /* count of att that are missing and have
+                                 * default value */
+    FmgrInfo   *in_functions;    /* array of input functions for each attrs */
+    Oid           *typioparams;    /* array of element types for in_functions */
+    ErrorSaveContext *escontext;    /* soft error trapper during in_functions
+                                     * execution */
+    uint64        num_errors;        /* total number of rows which contained soft
+                                 * errors */
+    int           *defmap;            /* array of default att numbers related to
+                                 * missing att */
+    ExprState **defexprs;        /* array of default att expressions for all
+                                 * att */
+    bool       *defaults;        /* if DEFAULT marker was found for
+                                 * corresponding att */
+    bool        volatile_defexprs;    /* is any of defexprs volatile? */
+    List       *range_table;    /* single element list of RangeTblEntry */
+    List       *rteperminfos;    /* single element list of RTEPermissionInfo */
+    ExprState  *qualexpr;
+
+    TransitionCaptureState *transition_capture;
+
+    /*
+     * These variables are used to reduce overhead in COPY FROM.
+     *
+     * attribute_buf holds the separated, de-escaped text for each field of
+     * the current line.  The CopyReadAttributes functions return arrays of
+     * pointers into this buffer.  We avoid palloc/pfree overhead by re-using
+     * the buffer on each cycle.
+     *
+     * In binary COPY FROM, attribute_buf holds the binary data for the
+     * current field, but the usage is otherwise similar.
+     */
+    StringInfoData attribute_buf;
+
+    /* field raw data pointers found by COPY FROM */
+
+    int            max_fields;
+    char      **raw_fields;
+
+    /*
+     * Similarly, line_buf holds the whole input line being processed. The
+     * input cycle is first to read the whole line into line_buf, and then
+     * extract the individual attribute fields into attribute_buf.  line_buf
+     * is preserved unmodified so that we can display it in error messages if
+     * appropriate.  (In binary mode, line_buf is not used.)
+     */
+    StringInfoData line_buf;
+    bool        line_buf_valid; /* contains the row being processed? */
+
+    /*
+     * input_buf holds input data, already converted to database encoding.
+     *
+     * In text mode, CopyReadLine parses this data sufficiently to locate line
+     * boundaries, then transfers the data to line_buf. We guarantee that
+     * there is a \0 at input_buf[input_buf_len] at all times.  (In binary
+     * mode, input_buf is not used.)
+     *
+     * If encoding conversion is not required, input_buf is not a separate
+     * buffer but points directly to raw_buf.  In that case, input_buf_len
+     * tracks the number of bytes that have been verified as valid in the
+     * database encoding, and raw_buf_len is the total number of bytes stored
+     * in the buffer.
+     */
+#define INPUT_BUF_SIZE 65536    /* we palloc INPUT_BUF_SIZE+1 bytes */
+    char       *input_buf;
+    int            input_buf_index;    /* next byte to process */
+    int            input_buf_len;    /* total # of bytes stored */
+    bool        input_reached_eof;    /* true if we reached EOF */
+    bool        input_reached_error;    /* true if a conversion error happened */
+    /* Shorthand for number of unconsumed bytes available in input_buf */
+#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index)
+
+    /*
+     * raw_buf holds raw input data read from the data source (file or client
+     * connection), not yet converted to the database encoding.  Like with
+     * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len].
+     */
+#define RAW_BUF_SIZE 65536        /* we palloc RAW_BUF_SIZE+1 bytes */
+    char       *raw_buf;
+    int            raw_buf_index;    /* next byte to process */
+    int            raw_buf_len;    /* total # of bytes stored */
+    bool        raw_reached_eof;    /* true if we reached EOF */
+
+    /* Shorthand for number of unconsumed bytes available in raw_buf */
+#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
+
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyFromStateData;
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 1ca058c6add..23760eb0e02 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -15,174 +15,8 @@
 #define COPYFROM_INTERNAL_H
 
 #include "commands/copyapi.h"
-#include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
-/*
- * Represents the different source cases we need to worry about at
- * the bottom level
- */
-typedef enum CopySource
-{
-    COPY_FILE,                    /* from file (or a piped program) */
-    COPY_FRONTEND,                /* from frontend */
-    COPY_CALLBACK,                /* from callback function */
-} CopySource;
-
-/*
- *    Represents the end-of-line terminator type of the input
- */
-typedef enum EolType
-{
-    EOL_UNKNOWN,
-    EOL_NL,
-    EOL_CR,
-    EOL_CRNL,
-} EolType;
-
-/*
- * Represents the insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-    CIM_SINGLE,                    /* use table_tuple_insert or ExecForeignInsert */
-    CIM_MULTI,                    /* always use table_multi_insert or
-                                 * ExecForeignBatchInsert */
-    CIM_MULTI_CONDITIONAL,        /* use table_multi_insert or
-                                 * ExecForeignBatchInsert only if valid */
-} CopyInsertMethod;
-
-/*
- * This struct contains all the state variables used throughout a COPY FROM
- * operation.
- */
-typedef struct CopyFromStateData
-{
-    /* format routine */
-    const CopyFromRoutine *routine;
-
-    /* low-level state data */
-    CopySource    copy_src;        /* type of copy source */
-    FILE       *copy_file;        /* used if copy_src == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_FRONTEND */
-
-    EolType        eol_type;        /* EOL type of input */
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    Oid            conversion_proc;    /* encoding conversion function */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDIN */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_source_cb data_source_cb; /* function for reading data */
-
-    CopyFormatOptions opts;
-    bool       *convert_select_flags;    /* per-column CSV/TEXT CS flags */
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /* these are just for error messages, see CopyFromErrorCallback */
-    const char *cur_relname;    /* table name for error messages */
-    uint64        cur_lineno;        /* line number for error messages */
-    const char *cur_attname;    /* current att for error messages */
-    const char *cur_attval;        /* current att value for error messages */
-    bool        relname_only;    /* don't output line number, att, etc. */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    AttrNumber    num_defaults;    /* count of att that are missing and have
-                                 * default value */
-    FmgrInfo   *in_functions;    /* array of input functions for each attrs */
-    Oid           *typioparams;    /* array of element types for in_functions */
-    ErrorSaveContext *escontext;    /* soft error trapper during in_functions
-                                     * execution */
-    uint64        num_errors;        /* total number of rows which contained soft
-                                 * errors */
-    int           *defmap;            /* array of default att numbers related to
-                                 * missing att */
-    ExprState **defexprs;        /* array of default att expressions for all
-                                 * att */
-    bool       *defaults;        /* if DEFAULT marker was found for
-                                 * corresponding att */
-    bool        volatile_defexprs;    /* is any of defexprs volatile? */
-    List       *range_table;    /* single element list of RangeTblEntry */
-    List       *rteperminfos;    /* single element list of RTEPermissionInfo */
-    ExprState  *qualexpr;
-
-    TransitionCaptureState *transition_capture;
-
-    /*
-     * These variables are used to reduce overhead in COPY FROM.
-     *
-     * attribute_buf holds the separated, de-escaped text for each field of
-     * the current line.  The CopyReadAttributes functions return arrays of
-     * pointers into this buffer.  We avoid palloc/pfree overhead by re-using
-     * the buffer on each cycle.
-     *
-     * In binary COPY FROM, attribute_buf holds the binary data for the
-     * current field, but the usage is otherwise similar.
-     */
-    StringInfoData attribute_buf;
-
-    /* field raw data pointers found by COPY FROM */
-
-    int            max_fields;
-    char      **raw_fields;
-
-    /*
-     * Similarly, line_buf holds the whole input line being processed. The
-     * input cycle is first to read the whole line into line_buf, and then
-     * extract the individual attribute fields into attribute_buf.  line_buf
-     * is preserved unmodified so that we can display it in error messages if
-     * appropriate.  (In binary mode, line_buf is not used.)
-     */
-    StringInfoData line_buf;
-    bool        line_buf_valid; /* contains the row being processed? */
-
-    /*
-     * input_buf holds input data, already converted to database encoding.
-     *
-     * In text mode, CopyReadLine parses this data sufficiently to locate line
-     * boundaries, then transfers the data to line_buf. We guarantee that
-     * there is a \0 at input_buf[input_buf_len] at all times.  (In binary
-     * mode, input_buf is not used.)
-     *
-     * If encoding conversion is not required, input_buf is not a separate
-     * buffer but points directly to raw_buf.  In that case, input_buf_len
-     * tracks the number of bytes that have been verified as valid in the
-     * database encoding, and raw_buf_len is the total number of bytes stored
-     * in the buffer.
-     */
-#define INPUT_BUF_SIZE 65536    /* we palloc INPUT_BUF_SIZE+1 bytes */
-    char       *input_buf;
-    int            input_buf_index;    /* next byte to process */
-    int            input_buf_len;    /* total # of bytes stored */
-    bool        input_reached_eof;    /* true if we reached EOF */
-    bool        input_reached_error;    /* true if a conversion error happened */
-    /* Shorthand for number of unconsumed bytes available in input_buf */
-#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index)
-
-    /*
-     * raw_buf holds raw input data read from the data source (file or client
-     * connection), not yet converted to the database encoding.  Like with
-     * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len].
-     */
-#define RAW_BUF_SIZE 65536        /* we palloc RAW_BUF_SIZE+1 bytes */
-    char       *raw_buf;
-    int            raw_buf_index;    /* next byte to process */
-    int            raw_buf_len;    /* total # of bytes stored */
-    bool        raw_reached_eof;    /* true if we reached EOF */
-
-    /* Shorthand for number of unconsumed bytes available in raw_buf */
-#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
-
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyFromStateData;
-
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
-- 
2.45.2
From 9a50895abd2db7b6d6d90f0a98c9370a809cb328 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:21:39 +0900
Subject: [PATCH v26 8/8] Add support for implementing custom COPY FROM format
 as extension
* Add CopyFromStateData::opaque that can be used to keep data for
  custom COPY From format implementation
* Export CopyReadBinaryData() to read the next data as
  CopyFromStateRead()
---
 src/backend/commands/copyfromparse.c | 12 ++++++++++++
 src/include/commands/copyapi.h       |  5 +++++
 2 files changed, 17 insertions(+)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 1c68b0d2952..0a7e7255b7d 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -729,6 +729,18 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
     return copied_bytes;
 }
 
+/*
+ * Export CopyReadBinaryData() for extensions. We want to keep
+ * CopyReadBinaryData() as a static function for
+ * optimization. CopyReadBinaryData() calls in this file may be optimized by
+ * a compiler.
+ */
+int
+CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes)
+{
+    return CopyReadBinaryData(cstate, dest, nbytes);
+}
+
 /*
  * Read raw fields in the next line for COPY FROM in text or csv mode.
  * Return false if no more lines.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 335584f8877..caba308533d 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -335,6 +335,11 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyFromStateData;
 
+extern int    CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes);
+
 #endif                            /* COPYAPI_H */
-- 
2.45.2
			
On Sun, Nov 24, 2024 at 6:06 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoBNfKDbJnu-zONNpG820ZXYC0fuTSLrJ-UdRqU4qp2wog@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Nov 2024 13:01:06 -0800,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >> @@ -1237,7 +1219,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
> >>  /*
> >>   * CopyReadLineText - inner loop of CopyReadLine for text mode
> >>   */
> >> -static pg_attribute_always_inline bool
> >> +static bool
> >>  CopyReadLineText(CopyFromState cstate, bool is_csv)
> >>
> >> Is this an intentional change?
> >> CopyReadLineText() has "bool in_csv".
> >
> > Yes, I'm not sure it's really necessary to make it inline since the
> > benchmark results don't show much difference. Probably this is because
> > the function has 'is_csv' in some 'if' branches but the compiler
> > cannot optimize out the whole 'if' branches as most 'if' branches
> > check 'is_csv' and other variables.
>
> I see. If explicit "inline" isn't related to performance, we
> don't need explicit "inline".
>
> > I've attached the v25 patches that squashed the minor changes I made
> > in v24 and incorporated all comments I got so far. I think these two
> > patches are in good shape. Could you rebase remaining patches on top
> > of them so that we can see the big picture of this feature?
>
> OK. I'll work on it.
>
> > Regarding exposing the structs such as CopyToStateData, v22-0004 patch
> > moves most of all copy-related structs to copyapi.h from copyto.c,
> > copyfrom_internal.h, and copy.h, which seems odd to me. I think we can
> > expose CopyToStateData (and related structs) in a new file
> > copyto_internal.h and keep other structs in the original header files.
>
> Custom COPY format extensions need to use
> CopyToStateData/CopyFromStateData. For example,
> CopyToStateData::rel is used to retrieve table schema. If we
> move CopyToStateData to copyto_internal.h not copyapi.h,
> custom COPY format extensions need to include
> copyto_internal.h. I feel that it's strange that extensions
> need to use internal headers.
>
> What is your real concern? If you don't want to export
> CopyToStateData/CopyFromStateData entirely, we can provide
> accessors only for some members of them.

I'm not against exposing CopyToStateData and CopyFromStateData. My
concern is that if we move all copy-related structs to copyapi.h,
other copy-related files would need to include copyapi.h even if the
file is not related to copy format APIs. IMO copyapi.h should have
only copy-format-API-related variables structs such as CopyFromRoutine
and CopyToRoutine and functions that custom COPY format extension can
utilize to access data source and destination, such as CopyGetData().

When it comes to CopyToStateData and CopyFromStateData, I feel that
they have mixed fields of common fields (e.g., rel, num_errors, and
transition_capture) and format-specific fields (e.g., input_buf,
line_buf, and eol_type). While it makes sense to me that custom copy
format extensions can access the common fields, it seems odd to me
that they can access text-and-csv-format-specific fields such as
input_buf. We might want to sort out these fields but it could be a
huge task.

Also, I realized that CopyFromTextLikeOneRow() does input function
calls and handle soft errors based on ON_ERROR and LOG_VERBOSITY
options. I think these should be done in the core side.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
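The "accessors only for some members" option mentioned above can be pictured with a small stand-alone sketch. Nothing here is from the patch set: the Demo* types and both accessor names are hypothetical stand-ins, and real code would use Relation and uint64 from the PostgreSQL headers. The point is only that an extension would call the accessors and never touch the text/CSV-specific members directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PostgreSQL's Relation. */
typedef struct DemoRelation
{
	const char *relname;
} DemoRelation;

/*
 * In this sketch the full definition would stay in an internal header;
 * extensions would only see the typedef and the accessors below.
 */
typedef struct DemoCopyFromStateData
{
	/* common fields */
	DemoRelation *rel;
	uint64_t	num_errors;

	/* text/CSV-specific fields, not reachable through the accessors */
	char	   *input_buf;
	int			input_buf_len;
} DemoCopyFromStateData;

typedef DemoCopyFromStateData *DemoCopyFromState;

/* Hypothetical accessor: relation being copied into. */
DemoRelation *
DemoCopyFromStateGetRel(DemoCopyFromState cstate)
{
	assert(cstate != NULL);
	return cstate->rel;
}

/* Hypothetical accessor: number of rows skipped because of soft errors. */
uint64_t
DemoCopyFromStateGetNumErrors(DemoCopyFromState cstate)
{
	assert(cstate != NULL);
	return cstate->num_errors;
}
```

With this shape, fields such as input_buf could be moved or removed later without breaking extensions, at the cost of one function call per access.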
Hi,
In <CAD21AoBW5dEv=Gd2iF_BYNZGEsF=3KTG7fpq=vP5qwpC1CAOeA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 25 Nov 2024 23:10:50 -0800,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Custom COPY format extensions need to use
>> CopyToStateData/CopyFromStateData. For example,
>> CopyToStateData::rel is used to retrieve table schema. If we
>> move CopyToStateData to copyto_internal.h not copyapi.h,
>> custom COPY format extensions need to include
>> copyto_internal.h. I feel that it's strange that extensions
>> need to use internal headers.
>>
>> What is your real concern? If you don't want to export
>> CopyToStateData/CopyFromStateData entirely, we can provide
>> accessors only for some members of them.
> 
> I'm not against exposing CopyToStateData and CopyFromStateData. My
> concern is that if we move all copy-related structs to copyapi.h,
> other copy-related files would need to include copyapi.h even if the
> file is not related to copy format APIs. IMO copyapi.h should have
> only copy-format-API-related variables structs such as CopyFromRoutine
> and CopyToRoutine and functions that custom COPY format extension can
> utilize to access data source and destination, such as CopyGetData().
> 
> When it comes to CopyToStateData and CopyFromStateData, I feel that
> they have mixed fields of common fields (e.g., rel, num_errors, and
> transition_capture) and format-specific fields (e.g., input_buf,
> line_buf, and eol_type). While it makes sense to me that custom copy
> format extensions can access the common fields, it seems odd to me
> that they can access text-and-csv-format-specific fields such as
> input_buf. We might want to sort out these fields but it could be a
> huge task.
I understand your concern.
How about using Copy{To,From}StateData::opaque to store
text-and-csv-format-specific data? I feel that this
refactoring doesn't block the 0001/0002 patches. Do you
think that this is a blocker of the 0001/0002 patches?
I think that this may block the 0004/0007 patches that
export Copy{To,From}StateData. But we can work on it after
we merge the 0004/0007 patches. Which is preferred?
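As a rough illustration of the opaque idea (a minimal sketch, not code from the patch set): the common state keeps only a void *opaque member, and a text-like format allocates its own private struct in its start callback. All Demo* names are hypothetical, and calloc()/free() stand in for palloc0() in the per-copy memory context that real code would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the common part of CopyFromStateData. */
typedef struct DemoCopyFromStateData
{
	const char *relname;		/* common field */
	unsigned long num_errors;	/* common field */
	void	   *opaque;			/* private space for the format */
} DemoCopyFromStateData;

/* Hypothetical private state that only text/CSV would need. */
typedef struct DemoTextLikeState
{
	char	   *line_buf;
	size_t		line_buf_len;
	int			eol_type;
} DemoTextLikeState;

/* Start callback: allocate the format-private state. Returns 0 on success. */
int
DemoTextLikeStart(DemoCopyFromStateData *cstate)
{
	DemoTextLikeState *ts;

	assert(cstate->opaque == NULL);
	ts = calloc(1, sizeof(*ts));	/* real code: palloc0() */
	if (ts == NULL)
		return -1;
	cstate->opaque = ts;
	return 0;
}

/* Typed view of the private state, for use inside the format's callbacks. */
DemoTextLikeState *
DemoTextLikeGet(DemoCopyFromStateData *cstate)
{
	return (DemoTextLikeState *) cstate->opaque;
}

/* End callback: release the format-private state. */
void
DemoTextLikeEnd(DemoCopyFromStateData *cstate)
{
	free(cstate->opaque);
	cstate->opaque = NULL;
}
```

The trade-off is an extra pointer dereference in the per-row path, which would need benchmarking for the built-in text and CSV formats.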
> Also, I realized that CopyFromTextLikeOneRow() does input function
> calls and handle soft errors based on ON_ERROR and LOG_VERBOSITY
> options. I think these should be done in the core side.
How about extracting the following part in NextCopyFrom() as
a function and providing it for extensions?
----
                Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
                cstate->num_errors++;
                if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
                {
                    /*
                     * Since we emit line number and column info in the below
                     * notice message, we suppress error context information
                     * other than the relation name.
                     */
                    Assert(!cstate->relname_only);
                    cstate->relname_only = true;
                    if (cstate->cur_attval)
                    {
                        char       *attval;
                        attval = CopyLimitPrintoutLength(cstate->cur_attval);
                        ereport(NOTICE,
                                errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
                                       (unsigned long long) cstate->cur_lineno,
                                       cstate->cur_attname,
                                       attval));
                        pfree(attval);
                    }
                    else
                        ereport(NOTICE,
                                errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
                                       (unsigned long long) cstate->cur_lineno,
                                       cstate->cur_attname));
                    /* reset relname_only */
                    cstate->relname_only = false;
                }
----
See the attached v27 patch set for this idea.
0001-0008 are almost the same as the v26 patch set.
("format" -> "FORMAT" in COPY test changes are included.)
0009 exports the above code as
CopyFromSkipErrorRow(). Extensions should call it when they
use errsave() for a soft error in CopyFromOneRow callback.
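To make the intended call pattern concrete, here is a stand-alone sketch with stand-in types. It is not the 0009 patch: DemoCopyFromSkipErrorRow() only mimics the bookkeeping shown above (count the error, optionally print a notice), strtol() stands in for a soft-error-aware input function, and the boolean return convention of DemoIntOneRow() is invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the error-related part of CopyFromStateData. */
typedef struct DemoCopyFromStateData
{
	unsigned long long cur_lineno;
	const char *cur_attname;
	const char *cur_attval;
	unsigned long long num_errors;
	bool		verbose;	/* stands in for LOG_VERBOSITY verbose */
} DemoCopyFromStateData;

/* Stand-in for the proposed helper: count the row and maybe report it. */
void
DemoCopyFromSkipErrorRow(DemoCopyFromStateData *cstate)
{
	cstate->num_errors++;
	if (!cstate->verbose)
		return;
	if (cstate->cur_attval)
		fprintf(stderr, "NOTICE: skipping row at line %llu for column \"%s\": \"%s\"\n",
				cstate->cur_lineno, cstate->cur_attname, cstate->cur_attval);
	else
		fprintf(stderr, "NOTICE: skipping row at line %llu for column \"%s\": null input\n",
				cstate->cur_lineno, cstate->cur_attname);
}

/*
 * Hypothetical one-row callback of a custom format with a single integer
 * column named "a".  Returns true if *value was filled, false if the row
 * was skipped because of a soft error.  Overflow is not checked here.
 */
bool
DemoIntOneRow(DemoCopyFromStateData *cstate, const char *field, int *value)
{
	char	   *end;
	long		parsed;

	cstate->cur_lineno++;
	cstate->cur_attname = "a";
	cstate->cur_attval = field;

	parsed = strtol(field, &end, 10);
	if (end == field || *end != '\0')
	{
		/* soft error: let the shared helper do the bookkeeping */
		DemoCopyFromSkipErrorRow(cstate);
		return false;
	}

	*value = (int) parsed;
	return true;
}
```

The benefit of the shared helper is that every format reports skipped rows in the same way, instead of each extension copying the NOTICE logic.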
Thanks,
-- 
kou
From b95060713e5cfccc8b3db5acb34d352f18a8b1e2 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sat, 28 Sep 2024 23:24:49 +0900
Subject: [PATCH v27 1/9] Refactor COPY TO to use format callback functions.
This commit introduces a new CopyToRoutine struct, which is a set of
callback routines to copy tuples in a specific format. It also makes
the existing formats (text, CSV, and binary) utilize these format
callbacks.
This change is a preliminary step towards making the COPY TO command
extensible in terms of output formats.
Additionally, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.
Author: Sutou Kouhei
Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada
Reviewed-by: Junwang Zhao
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
---
 src/backend/commands/copyto.c    | 441 +++++++++++++++++++++----------
 src/include/commands/copyapi.h   |  57 ++++
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 358 insertions(+), 141 deletions(-)
 create mode 100644 src/include/commands/copyapi.h
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f55e6d96751..f81dadcc12b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,7 +19,7 @@
 #include <sys/stat.h>
 
 #include "access/tableam.h"
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -64,6 +64,9 @@ typedef enum CopyDest
  */
 typedef struct CopyToStateData
 {
+    /* format-specific routines */
+    const CopyToRoutine *routine;
+
     /* low-level state data */
     CopyDest    copy_dest;        /* type of copy source/destination */
     FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
@@ -114,6 +117,19 @@ static void CopyAttributeOutText(CopyToState cstate, const char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
                                 bool use_quote);
 
+/* built-in format-specific routines */
+static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
+static void CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
+static void CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
+                                 bool is_csv);
+static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
+static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
+static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToBinaryEnd(CopyToState cstate);
+
 /* Low-level communications functions */
 static void SendCopyBegin(CopyToState cstate);
 static void SendCopyEnd(CopyToState cstate);
@@ -121,9 +137,254 @@ static void CopySendData(CopyToState cstate, const void *databuf, int datasize);
 static void CopySendString(CopyToState cstate, const char *str);
 static void CopySendChar(CopyToState cstate, char c);
 static void CopySendEndOfRow(CopyToState cstate);
+static void CopySendTextLikeEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
+/*
+ * COPY TO routines for built-in formats.
+ *
+ * CSV and text formats share the same TextLike routines except for the
+ * one-row callback.
+ */
+
+/* text format */
+static const CopyToRoutine CopyToRoutineText = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToTextOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+/* CSV format */
+static const CopyToRoutine CopyToRoutineCSV = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToCSVOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+/* binary format */
+static const CopyToRoutine CopyToRoutineBinary = {
+    .CopyToStart = CopyToBinaryStart,
+    .CopyToOutFunc = CopyToBinaryOutFunc,
+    .CopyToOneRow = CopyToBinaryOneRow,
+    .CopyToEnd = CopyToBinaryEnd,
+};
+
+/* Return COPY TO routines for the given options */
+static const CopyToRoutine *
+CopyToGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyToRoutineCSV;
+    else if (opts.binary)
+        return &CopyToRoutineBinary;
+
+    /* default is text */
+    return &CopyToRoutineText;
+}
+
+/* Implementation of the start callback for text and CSV formats */
+static void
+CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    /*
+     * For non-binary copy, we need to convert null_print to file encoding,
+     * because it will be sent directly with CopySendString.
+     */
+    if (cstate->need_transcoding)
+        cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+                                                          cstate->opts.null_print_len,
+                                                          cstate->file_encoding);
+
+    /* if a header has been requested send the line */
+    if (cstate->opts.header_line)
+    {
+        ListCell   *cur;
+        bool        hdr_delim = false;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            char       *colname;
+
+            if (hdr_delim)
+                CopySendChar(cstate, cstate->opts.delim[0]);
+            hdr_delim = true;
+
+            colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, colname, false);
+            else
+                CopyAttributeOutText(cstate, colname);
+        }
+
+        CopySendTextLikeEndOfRow(cstate);
+    }
+}
+
+/*
+ * Implementation of the outfunc callback for text and CSV formats. Assign
+ * the output function data to the given *finfo.
+ */
+static void
+CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the per-row callback for text format */
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, false);
+}
+
+/* Implementation of the per-row callback for CSV format */
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, true);
+}
+
+/*
+ * Workhorse for CopyToTextOneRow() and CopyToCSVOneRow().
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
+ */
+static pg_attribute_always_inline void
+CopyToTextLikeOneRow(CopyToState cstate,
+                     TupleTableSlot *slot,
+                     bool is_csv)
+{
+    bool        need_delim = false;
+    FmgrInfo   *out_functions = cstate->out_functions;
+
+    foreach_int(attnum, cstate->attnumlist)
+    {
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (need_delim)
+            CopySendChar(cstate, cstate->opts.delim[0]);
+        need_delim = true;
+
+        if (isnull)
+        {
+            CopySendString(cstate, cstate->opts.null_print_client);
+        }
+        else
+        {
+            char       *string;
+
+            string = OutputFunctionCall(&out_functions[attnum - 1],
+                                        value);
+
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
+                CopyAttributeOutCSV(cstate, string,
+                                    cstate->opts.force_quote_flags[attnum - 1]);
+            else
+                CopyAttributeOutText(cstate, string);
+        }
+    }
+
+    CopySendTextLikeEndOfRow(cstate);
+}
+
+/* Implementation of the end callback for text and CSV formats */
+static void
+CopyToTextLikeEnd(CopyToState cstate)
+{
+    /* Nothing to do here */
+}
+
+/*
+ * Implementation of the start callback for binary format. Send a header
+ * for a binary copy.
+ */
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    int32        tmp;
+
+    /* Signature */
+    CopySendData(cstate, BinarySignature, 11);
+    /* Flags field */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+    /* No header extension */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+}
+
+/*
+ * Implementation of the outfunc callback for binary format. Assign
+ * the binary output function to the given *finfo.
+ */
+static void
+CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeBinaryOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the per-row callback for binary format */
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    FmgrInfo   *out_functions = cstate->out_functions;
+
+    /* Binary per-tuple header */
+    CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+    foreach_int(attnum, cstate->attnumlist)
+    {
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (isnull)
+        {
+            CopySendInt32(cstate, -1);
+        }
+        else
+        {
+            bytea       *outputbytes;
+
+            outputbytes = SendFunctionCall(&out_functions[attnum - 1],
+                                           value);
+            CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+            CopySendData(cstate, VARDATA(outputbytes),
+                         VARSIZE(outputbytes) - VARHDRSZ);
+        }
+    }
+
+    CopySendEndOfRow(cstate);
+}
+
+/* Implementation of the end callback for binary format */
+static void
+CopyToBinaryEnd(CopyToState cstate)
+{
+    /* Generate trailer for a binary copy */
+    CopySendInt16(cstate, -1);
+    /* Need to flush out the trailer */
+    CopySendEndOfRow(cstate);
+}
 
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
@@ -191,16 +452,6 @@ CopySendEndOfRow(CopyToState cstate)
     switch (cstate->copy_dest)
     {
         case COPY_FILE:
-            if (!cstate->opts.binary)
-            {
-                /* Default line termination depends on platform */
-#ifndef WIN32
-                CopySendChar(cstate, '\n');
-#else
-                CopySendString(cstate, "\r\n");
-#endif
-            }
-
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -235,10 +486,6 @@ CopySendEndOfRow(CopyToState cstate)
             }
             break;
         case COPY_FRONTEND:
-            /* The FE/BE protocol uses \n as newline for all platforms */
-            if (!cstate->opts.binary)
-                CopySendChar(cstate, '\n');
-
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
@@ -254,6 +501,35 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * the line termination and do common appropriate things for the end of row.
+ */
+static inline void
+CopySendTextLikeEndOfRow(CopyToState cstate)
+{
+    switch (cstate->copy_dest)
+    {
+        case COPY_FILE:
+            /* Default line termination depends on platform */
+#ifndef WIN32
+            CopySendChar(cstate, '\n');
+#else
+            CopySendString(cstate, "\r\n");
+#endif
+            break;
+        case COPY_FRONTEND:
+            /* The FE/BE protocol uses \n as newline for all platforms */
+            CopySendChar(cstate, '\n');
+            break;
+        default:
+            break;
+    }
+
+    /* Now take the actions related to the end of a row */
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * These functions do apply some data conversion
  */
@@ -426,6 +702,9 @@ BeginCopyTo(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
 
+    /* Set format routine */
+    cstate->routine = CopyToGetRoutine(cstate->opts);
+
     /* Process the source/target relation or query */
     if (rel)
     {
@@ -771,19 +1050,10 @@ DoCopyTo(CopyToState cstate)
     foreach(cur, cstate->attnumlist)
     {
         int            attnum = lfirst_int(cur);
-        Oid            out_func_oid;
-        bool        isvarlena;
         Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-        if (cstate->opts.binary)
-            getTypeBinaryOutputInfo(attr->atttypid,
-                                    &out_func_oid,
-                                    &isvarlena);
-        else
-            getTypeOutputInfo(attr->atttypid,
-                              &out_func_oid,
-                              &isvarlena);
-        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+        cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
+                                       &cstate->out_functions[attnum - 1]);
     }
 
     /*
@@ -796,56 +1066,7 @@ DoCopyTo(CopyToState cstate)
                                                "COPY TO",
                                                ALLOCSET_DEFAULT_SIZES);
 
-    if (cstate->opts.binary)
-    {
-        /* Generate header for a binary copy */
-        int32        tmp;
-
-        /* Signature */
-        CopySendData(cstate, BinarySignature, 11);
-        /* Flags field */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-        /* No header extension */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-    }
-    else
-    {
-        /*
-         * For non-binary copy, we need to convert null_print to file
-         * encoding, because it will be sent directly with CopySendString.
-         */
-        if (cstate->need_transcoding)
-            cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
-                                                              cstate->opts.null_print_len,
-                                                              cstate->file_encoding);
-
-        /* if a header has been requested send the line */
-        if (cstate->opts.header_line)
-        {
-            bool        hdr_delim = false;
-
-            foreach(cur, cstate->attnumlist)
-            {
-                int            attnum = lfirst_int(cur);
-                char       *colname;
-
-                if (hdr_delim)
-                    CopySendChar(cstate, cstate->opts.delim[0]);
-                hdr_delim = true;
-
-                colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, colname, false);
-                else
-                    CopyAttributeOutText(cstate, colname);
-            }
-
-            CopySendEndOfRow(cstate);
-        }
-    }
+    cstate->routine->CopyToStart(cstate, tupDesc);
 
     if (cstate->rel)
     {
@@ -884,13 +1105,7 @@ DoCopyTo(CopyToState cstate)
         processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
     }
 
-    if (cstate->opts.binary)
-    {
-        /* Generate trailer for a binary copy */
-        CopySendInt16(cstate, -1);
-        /* Need to flush out the trailer */
-        CopySendEndOfRow(cstate);
-    }
+    cstate->routine->CopyToEnd(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -903,74 +1118,18 @@ DoCopyTo(CopyToState cstate)
 /*
  * Emit one row during DoCopyTo().
  */
-static void
+static inline void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-    FmgrInfo   *out_functions = cstate->out_functions;
     MemoryContext oldcontext;
 
     MemoryContextReset(cstate->rowcontext);
     oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-    if (cstate->opts.binary)
-    {
-        /* Binary per-tuple header */
-        CopySendInt16(cstate, list_length(cstate->attnumlist));
-    }
-
     /* Make sure the tuple is fully deconstructed */
     slot_getallattrs(slot);
 
-    if (!cstate->opts.binary)
-    {
-        bool        need_delim = false;
-
-        foreach_int(attnum, cstate->attnumlist)
-        {
-            Datum        value = slot->tts_values[attnum - 1];
-            bool        isnull = slot->tts_isnull[attnum - 1];
-            char       *string;
-
-            if (need_delim)
-                CopySendChar(cstate, cstate->opts.delim[0]);
-            need_delim = true;
-
-            if (isnull)
-                CopySendString(cstate, cstate->opts.null_print_client);
-            else
-            {
-                string = OutputFunctionCall(&out_functions[attnum - 1],
-                                            value);
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, string,
-                                        cstate->opts.force_quote_flags[attnum - 1]);
-                else
-                    CopyAttributeOutText(cstate, string);
-            }
-        }
-    }
-    else
-    {
-        foreach_int(attnum, cstate->attnumlist)
-        {
-            Datum        value = slot->tts_values[attnum - 1];
-            bool        isnull = slot->tts_isnull[attnum - 1];
-            bytea       *outputbytes;
-
-            if (isnull)
-                CopySendInt32(cstate, -1);
-            else
-            {
-                outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-                                               value);
-                CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-                CopySendData(cstate, VARDATA(outputbytes),
-                             VARSIZE(outputbytes) - VARHDRSZ);
-            }
-        }
-    }
-
-    CopySendEndOfRow(cstate);
+    cstate->routine->CopyToOneRow(cstate, slot);
 
     MemoryContextSwitchTo(oldcontext);
 }
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 00000000000..eccc875d0e8
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,57 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *      API for COPY TO handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "commands/copy.h"
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/*
+ * API structure for a COPY TO format implementation. Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyToRoutine
+{
+    /*
+     * Set output function information. This callback is called once at the
+     * beginning of COPY TO.
+     *
+     * 'finfo' can be optionally filled to provide the catalog information of
+     * the output function.
+     *
+     * 'atttypid' is the OID of data type used by the relation's attribute.
+     */
+    void        (*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
+                                  FmgrInfo *finfo);
+
+    /*
+     * Start a COPY TO. This callback is called once at the beginning of COPY
+     * TO.
+     *
+     * 'tupDesc' is the tuple descriptor of the relation from where the data
+     * is read.
+     */
+    void        (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
+
+    /*
+     * Write one row stored in 'slot' to the destination.
+     */
+    void        (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
+
+    /* End a COPY TO. This callback is called once at the end of COPY TO. */
+    void        (*CopyToEnd) (CopyToState cstate);
+} CopyToRoutine;
+
+#endif                            /* COPYAPI_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b54428b38cd..8edb41cce2e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -503,6 +503,7 @@ CopyMultiInsertInfo
 CopyOnErrorChoice
 CopySource
 CopyStmt
+CopyToRoutine
 CopyToState
 CopyToStateData
 Cost
-- 
2.45.2
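For readers skimming the thread, here is a minimal standalone sketch of the callback-table pattern that the patch above applies to COPY TO. It is not PostgreSQL code: ToyCopyState, ToyCopyToRoutine and every toy_* function are hypothetical names invented for illustration, and the "formats" only differ in their delimiter. It just shows the shape: one static const struct of callbacks per format, chosen once, then called without per-row format checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CopyToState: only an output buffer. */
typedef struct ToyCopyState
{
    char        buf[256];
    size_t      len;
} ToyCopyState;

/* Modeled loosely on CopyToRoutine: one callback per phase. */
typedef struct ToyCopyToRoutine
{
    void        (*start) (ToyCopyState *st);
    void        (*one_row) (ToyCopyState *st, const char *const *fields,
                            int nfields);
    void        (*end) (ToyCopyState *st);
} ToyCopyToRoutine;

/* Append a string to the output buffer. */
static void
toy_send(ToyCopyState *st, const char *s)
{
    size_t      n = strlen(s);

    assert(st->len + n < sizeof(st->buf));
    memcpy(st->buf + st->len, s, n + 1);
    st->len += n;
}

/* Start/end callbacks have nothing to do in this toy model. */
static void
toy_noop(ToyCopyState *st)
{
    (void) st;
}

/* Shared row writer; the delimiter is the only format difference here. */
static void
toy_row(ToyCopyState *st, const char *const *fields, int nfields,
        const char *delim)
{
    for (int i = 0; i < nfields; i++)
    {
        if (i > 0)
            toy_send(st, delim);
        toy_send(st, fields[i]);
    }
    toy_send(st, "\n");
}

static void
toy_text_row(ToyCopyState *st, const char *const *fields, int nfields)
{
    toy_row(st, fields, nfields, "\t");
}

static void
toy_csv_row(ToyCopyState *st, const char *const *fields, int nfields)
{
    toy_row(st, fields, nfields, ",");
}

/* One static const callback table per format. */
static const ToyCopyToRoutine ToyRoutineText = {
    .start = toy_noop,
    .one_row = toy_text_row,
    .end = toy_noop,
};

static const ToyCopyToRoutine ToyRoutineCSV = {
    .start = toy_noop,
    .one_row = toy_csv_row,
    .end = toy_noop,
};

/* Choose the table once, up front. */
static const ToyCopyToRoutine *
toy_get_routine(bool csv_mode)
{
    return csv_mode ? &ToyRoutineCSV : &ToyRoutineText;
}

/* Drive a two-row "COPY TO" through the selected callbacks. */
const char *
toy_copy_to(ToyCopyState *st, bool csv_mode)
{
    static const char *const row1[] = {"1", "a"};
    static const char *const row2[] = {"2", "b"};
    const ToyCopyToRoutine *routine = toy_get_routine(csv_mode);

    st->len = 0;
    st->buf[0] = '\0';
    routine->start(st);
    routine->one_row(st, row1, 2);
    routine->one_row(st, row2, 2);
    routine->end(st);
    return st->buf;
}
```

The driver never looks at csv_mode after the routine is chosen, which is the property the refactoring is after.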
From c7eba0bf7bf4c42933b71d98aa6d519af0ce0121 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 18 Nov 2024 16:32:43 -0800
Subject: [PATCH v27 2/9] Refactor COPY FROM to use format callback functions.
This commit introduces a new CopyFromRoutine struct, which is a set of
callback routines to read tuples in a specific format. It also makes
COPY FROM with the existing formats (text, CSV, and binary) utilize
these format callbacks.
This change is a preliminary step towards making the COPY FROM command
extensible in terms of input formats.
Similar to XXXX, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when reading field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.
Author: Sutou Kouhei
Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada
Reviewed-by: Junwang Zhao
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
---
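A note on the is_csv argument used in this patch: the idea, as I understand it, is that a shared always-inline workhorse takes a boolean that each thin wrapper passes as a compile-time constant, so the compiler can drop the unused branch in each copy. Below is a minimal standalone sketch of that technique. It is not PostgreSQL code: toy_always_inline and the toy_* functions are hypothetical, and the field counting is far simpler than the real CSV parser (no escape handling).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical equivalent of pg_attribute_always_inline. */
#if defined(__GNUC__) || defined(__clang__)
#define toy_always_inline __attribute__((always_inline)) inline
#else
#define toy_always_inline inline
#endif

/*
 * Shared workhorse: count the fields on one line.  In CSV mode a delimiter
 * inside double quotes does not split the field; in text mode quotes are
 * ordinary characters.
 */
static toy_always_inline int
toy_count_fields(const char *line, bool is_csv)
{
    char        delim = is_csv ? ',' : '\t';
    bool        in_quotes = false;
    int         nfields = 1;

    for (const char *p = line; *p != '\0'; p++)
    {
        /* is_csv is constant in each caller, so this test can fold away */
        if (is_csv && *p == '"')
            in_quotes = !in_quotes;
        else if (*p == delim && !in_quotes)
            nfields++;
    }
    return nfields;
}

/* Text wrapper: passes a constant false. */
int
toy_text_count_fields(const char *line)
{
    return toy_count_fields(line, false);
}

/* CSV wrapper: passes a constant true. */
int
toy_csv_count_fields(const char *line)
{
    return toy_count_fields(line, true);
}
```

Whether the branch is actually removed depends on the compiler and optimization level, so any speedup should be measured rather than assumed.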
 contrib/file_fdw/file_fdw.c              |   1 -
 src/backend/commands/copyfrom.c          | 190 +++++++--
 src/backend/commands/copyfromparse.c     | 504 +++++++++++++----------
 src/include/commands/copy.h              |   2 -
 src/include/commands/copyapi.h           |  48 ++-
 src/include/commands/copyfrom_internal.h |  13 +-
 src/tools/pgindent/typedefs.list         |   1 +
 7 files changed, 492 insertions(+), 267 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 9e2896f32ae..bac31315fcb 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -21,7 +21,6 @@
 #include "access/table.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_foreign_table.h"
-#include "commands/copy.h"
 #include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 754cb496169..c84081c3ba3 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -106,6 +106,145 @@ typedef struct CopyMultiInsertInfo
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
 
+/*
+ * Built-in format-specific routines. One-row callbacks are defined in
+ * copyfromparse.c
+ */
+static void CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo,
+                                   Oid *typioparam);
+static void CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc);
+static void CopyFromTextLikeEnd(CopyFromState cstate);
+static void CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+                                 FmgrInfo *finfo, Oid *typioparam);
+static void CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc);
+static void CopyFromBinaryEnd(CopyFromState cstate);
+
+
+/*
+ * COPY FROM routines for built-in formats.
+ *
+ * CSV and text formats share the same TextLike routines except for the
+ * one-row callback.
+ */
+
+/* text format */
+static const CopyFromRoutine CopyFromRoutineText = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromTextOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+/* CSV format */
+static const CopyFromRoutine CopyFromRoutineCSV = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromCSVOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+/* binary format */
+static const CopyFromRoutine CopyFromRoutineBinary = {
+    .CopyFromInFunc = CopyFromBinaryInFunc,
+    .CopyFromStart = CopyFromBinaryStart,
+    .CopyFromOneRow = CopyFromBinaryOneRow,
+    .CopyFromEnd = CopyFromBinaryEnd,
+};
+
+/* Return COPY FROM routines for the given options */
+static const CopyFromRoutine *
+CopyFromGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyFromRoutineCSV;
+    else if (opts.binary)
+        return &CopyFromRoutineBinary;
+
+    /* default is text */
+    return &CopyFromRoutineText;
+}
+
+/* Implementation of the start callback for text and CSV formats */
+static void
+CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    AttrNumber    attr_count;
+
+    /*
+     * If encoding conversion is needed, we need another buffer to hold the
+     * converted input data.  Otherwise, we can just point input_buf to the
+     * same buffer as raw_buf.
+     */
+    if (cstate->need_transcoding)
+    {
+        cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
+        cstate->input_buf_index = cstate->input_buf_len = 0;
+    }
+    else
+        cstate->input_buf = cstate->raw_buf;
+    cstate->input_reached_eof = false;
+
+    initStringInfo(&cstate->line_buf);
+
+    /*
+     * Create workspace for CopyReadAttributes results; used by CSV and text
+     * format.
+     */
+    attr_count = list_length(cstate->attnumlist);
+    cstate->max_fields = attr_count;
+    cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+}
+
+/*
+ * Implementation of the infunc callback for text and CSV formats. Assign
+ * the input function data to the given *finfo.
+ */
+static void
+CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo,
+                       Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the end callback for text and CSV formats */
+static void
+CopyFromTextLikeEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
+/* Implementation of the start callback for binary format */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    /* Read and verify binary header */
+    ReceiveCopyBinaryHeader(cstate);
+}
+
+/*
+ * Implementation of the infunc callback for binary format. Assign
+ * the binary input function to the given *finfo.
+ */
+static void
+CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+                     FmgrInfo *finfo, Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeBinaryInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the end callback for binary format */
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
 /*
  * error context callback for COPY FROM
  *
@@ -1396,7 +1535,6 @@ BeginCopyFrom(ParseState *pstate,
                 num_defaults;
     FmgrInfo   *in_functions;
     Oid           *typioparams;
-    Oid            in_func_oid;
     int           *defmap;
     ExprState **defexprs;
     MemoryContext oldcontext;
@@ -1428,6 +1566,9 @@ BeginCopyFrom(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
+    /* Set the format routine */
+    cstate->routine = CopyFromGetRoutine(cstate->opts);
+
     /* Process the target relation */
     cstate->rel = rel;
 
@@ -1583,25 +1724,6 @@ BeginCopyFrom(ParseState *pstate,
     cstate->raw_buf_index = cstate->raw_buf_len = 0;
     cstate->raw_reached_eof = false;
 
-    if (!cstate->opts.binary)
-    {
-        /*
-         * If encoding conversion is needed, we need another buffer to hold
-         * the converted input data.  Otherwise, we can just point input_buf
-         * to the same buffer as raw_buf.
-         */
-        if (cstate->need_transcoding)
-        {
-            cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
-            cstate->input_buf_index = cstate->input_buf_len = 0;
-        }
-        else
-            cstate->input_buf = cstate->raw_buf;
-        cstate->input_reached_eof = false;
-
-        initStringInfo(&cstate->line_buf);
-    }
-
     initStringInfo(&cstate->attribute_buf);
 
     /* Assign range table and rteperminfos, we'll need them in CopyFrom. */
@@ -1634,13 +1756,9 @@ BeginCopyFrom(ParseState *pstate,
             continue;
 
         /* Fetch the input function and typioparam info */
-        if (cstate->opts.binary)
-            getTypeBinaryInputInfo(att->atttypid,
-                                   &in_func_oid, &typioparams[attnum - 1]);
-        else
-            getTypeInputInfo(att->atttypid,
-                             &in_func_oid, &typioparams[attnum - 1]);
-        fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+        cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+                                        &in_functions[attnum - 1],
+                                        &typioparams[attnum - 1]);
 
         /* Get default info if available */
         defexprs[attnum - 1] = NULL;
@@ -1775,20 +1893,7 @@ BeginCopyFrom(ParseState *pstate,
 
     pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-    if (cstate->opts.binary)
-    {
-        /* Read and verify binary header */
-        ReceiveCopyBinaryHeader(cstate);
-    }
-
-    /* create workspace for CopyReadAttributes results */
-    if (!cstate->opts.binary)
-    {
-        AttrNumber    attr_count = list_length(cstate->attnumlist);
-
-        cstate->max_fields = attr_count;
-        cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-    }
+    cstate->routine->CopyFromStart(cstate, tupDesc);
 
     MemoryContextSwitchTo(oldcontext);
 
@@ -1801,6 +1906,9 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+    /* Invoke the end callback */
+    cstate->routine->CopyFromEnd(cstate);
+
     /* No COPY FROM related resources except memory. */
     if (cstate->is_program)
     {
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d1d43b53d83..fdb506c58be 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -62,7 +62,6 @@
 #include <unistd.h>
 #include <sys/stat.h>
 
-#include "commands/copy.h"
 #include "commands/copyfrom_internal.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
@@ -140,8 +139,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLine(CopyFromState cstate, bool is_csv);
+static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static int    CopyReadAttributesText(CopyFromState cstate);
 static int    CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -740,9 +739,11 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  * in the relation.
  *
  * NOTE: force_not_null option are not applied to the returned fields.
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static pg_attribute_always_inline bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
 {
     int            fldct;
     bool        done;
@@ -759,13 +760,17 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
         tupDesc = RelationGetDescr(cstate->rel);
 
         cstate->cur_lineno++;
-        done = CopyReadLine(cstate);
+        done = CopyReadLine(cstate, is_csv);
 
         if (cstate->opts.header_line == COPY_HEADER_MATCH)
         {
             int            fldnum;
 
-            if (cstate->opts.csv_mode)
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
                 fldct = CopyReadAttributesCSV(cstate);
             else
                 fldct = CopyReadAttributesText(cstate);
@@ -809,7 +814,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     cstate->cur_lineno++;
 
     /* Actually read the line into memory here */
-    done = CopyReadLine(cstate);
+    done = CopyReadLine(cstate, is_csv);
 
     /*
      * EOF at start of line means we're done.  If we see EOF after some
@@ -819,8 +824,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     if (done && cstate->line_buf.len == 0)
         return false;
 
-    /* Parse the line into de-escaped field values */
-    if (cstate->opts.csv_mode)
+    /*
+     * Parse the line into de-escaped field values
+     *
+     * is_csv will be optimized away by compiler, as argument is constant at
+     * caller.
+     */
+    if (is_csv)
         fldct = CopyReadAttributesCSV(cstate);
     else
         fldct = CopyReadAttributesText(cstate);
@@ -830,6 +840,244 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     return true;
 }
 
+/*
+ * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
+ */
+static pg_attribute_always_inline bool
+CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
+                       Datum *values, bool *nulls, bool is_csv)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    ExprState **defexprs = cstate->defexprs;
+    char      **field_strings;
+    ListCell   *cur;
+    int            fldct;
+    int            fieldno;
+    char       *string;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    /* read raw fields in the next line */
+    if (!NextCopyFromRawFields(cstate, &field_strings, &fldct, is_csv))
+        return false;
+
+    /* check for overflowing fields */
+    if (attr_count > 0 && fldct > attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("extra data after last expected column")));
+
+    fieldno = 0;
+
+    /* Loop to read the user attributes on the line. */
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        if (fieldno >= fldct)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("missing data for column \"%s\"",
+                            NameStr(att->attname))));
+        string = field_strings[fieldno++];
+
+        if (cstate->convert_select_flags &&
+            !cstate->convert_select_flags[m])
+        {
+            /* ignore input field, leaving column as NULL */
+            continue;
+        }
+
+        if (is_csv)
+        {
+            if (string == NULL &&
+                cstate->opts.force_notnull_flags[m])
+            {
+                /*
+                 * FORCE_NOT_NULL option is set and column is NULL - convert
+                 * it to the NULL string.
+                 */
+                string = cstate->opts.null_print;
+            }
+            else if (string != NULL && cstate->opts.force_null_flags[m]
+                     && strcmp(string, cstate->opts.null_print) == 0)
+            {
+                /*
+                 * FORCE_NULL option is set and column matches the NULL
+                 * string. It must have been quoted, or otherwise the string
+                 * would already have been set to NULL. Convert it to NULL as
+                 * specified.
+                 */
+                string = NULL;
+            }
+        }
+
+        cstate->cur_attname = NameStr(att->attname);
+        cstate->cur_attval = string;
+
+        if (string != NULL)
+            nulls[m] = false;
+
+        if (cstate->defaults[m])
+        {
+            /*
+             * The caller must supply econtext and have switched into the
+             * per-tuple memory context in it.
+             */
+            Assert(econtext != NULL);
+            Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
+
+            values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
+        }
+
+        /*
+         * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+         */
+        else if (!InputFunctionCallSafe(&in_functions[m],
+                                        string,
+                                        typioparams[m],
+                                        att->atttypmod,
+                                        (Node *) cstate->escontext,
+                                        &values[m]))
+        {
+            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+            cstate->num_errors++;
+
+            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+            {
+                /*
+                 * Since we emit line number and column info in the below
+                 * notice message, we suppress error context information other
+                 * than the relation name.
+                 */
+                Assert(!cstate->relname_only);
+                cstate->relname_only = true;
+
+                if (cstate->cur_attval)
+                {
+                    char       *attval;
+
+                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname,
+                                   attval));
+                    pfree(attval);
+                }
+                else
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname));
+
+                /* reset relname_only */
+                cstate->relname_only = false;
+            }
+
+            return true;
+        }
+
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+    }
+
+    Assert(fieldno == attr_count);
+
+    return true;
+}
+
+/* Implementation of the per-row callback for text format */
+bool
+CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+                   bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false);
+}
+
+/* Implementation of the per-row callback for CSV format */
+bool
+CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+                  bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
+}
+
+/* Implementation of the per-row callback for binary format */
+bool
+CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+                     bool *nulls)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    int16        fld_count;
+    ListCell   *cur;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    cstate->cur_lineno++;
+
+    if (!CopyGetInt16(cstate, &fld_count))
+    {
+        /* EOF detected (end of file, or protocol-level EOF) */
+        return false;
+    }
+
+    if (fld_count == -1)
+    {
+        /*
+         * Received EOF marker.  Wait for the protocol-level EOF, and complain
+         * if it doesn't come immediately.  In COPY FROM STDIN, this ensures
+         * that we correctly handle CopyFail, if client chooses to send that
+         * now.  When copying from file, we could ignore the rest of the file
+         * like in text mode, but we choose to be consistent with the COPY
+         * FROM STDIN case.
+         */
+        char        dummy;
+
+        if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("received copy data after EOF marker")));
+        return false;
+    }
+
+    if (fld_count != attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("row field count is %d, expected %d",
+                        (int) fld_count, attr_count)));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        cstate->cur_attname = NameStr(att->attname);
+        values[m] = CopyReadBinaryAttribute(cstate,
+                                            &in_functions[m],
+                                            typioparams[m],
+                                            att->atttypmod,
+                                            &nulls[m]);
+        cstate->cur_attname = NULL;
+    }
+
+    return true;
+}
+
 /*
  * Read next tuple from file for COPY FROM. Return false if no more tuples.
  *
@@ -847,216 +1095,22 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 {
     TupleDesc    tupDesc;
     AttrNumber    num_phys_attrs,
-                attr_count,
                 num_defaults = cstate->num_defaults;
-    FmgrInfo   *in_functions = cstate->in_functions;
-    Oid           *typioparams = cstate->typioparams;
     int            i;
     int           *defmap = cstate->defmap;
     ExprState **defexprs = cstate->defexprs;
 
     tupDesc = RelationGetDescr(cstate->rel);
     num_phys_attrs = tupDesc->natts;
-    attr_count = list_length(cstate->attnumlist);
 
     /* Initialize all values for row to NULL */
     MemSet(values, 0, num_phys_attrs * sizeof(Datum));
     MemSet(nulls, true, num_phys_attrs * sizeof(bool));
     MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-    if (!cstate->opts.binary)
-    {
-        char      **field_strings;
-        ListCell   *cur;
-        int            fldct;
-        int            fieldno;
-        char       *string;
-
-        /* read raw fields in the next line */
-        if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
-            return false;
-
-        /* check for overflowing fields */
-        if (attr_count > 0 && fldct > attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("extra data after last expected column")));
-
-        fieldno = 0;
-
-        /* Loop to read the user attributes on the line. */
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            if (fieldno >= fldct)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("missing data for column \"%s\"",
-                                NameStr(att->attname))));
-            string = field_strings[fieldno++];
-
-            if (cstate->convert_select_flags &&
-                !cstate->convert_select_flags[m])
-            {
-                /* ignore input field, leaving column as NULL */
-                continue;
-            }
-
-            if (cstate->opts.csv_mode)
-            {
-                if (string == NULL &&
-                    cstate->opts.force_notnull_flags[m])
-                {
-                    /*
-                     * FORCE_NOT_NULL option is set and column is NULL -
-                     * convert it to the NULL string.
-                     */
-                    string = cstate->opts.null_print;
-                }
-                else if (string != NULL && cstate->opts.force_null_flags[m]
-                         && strcmp(string, cstate->opts.null_print) == 0)
-                {
-                    /*
-                     * FORCE_NULL option is set and column matches the NULL
-                     * string. It must have been quoted, or otherwise the
-                     * string would already have been set to NULL. Convert it
-                     * to NULL as specified.
-                     */
-                    string = NULL;
-                }
-            }
-
-            cstate->cur_attname = NameStr(att->attname);
-            cstate->cur_attval = string;
-
-            if (string != NULL)
-                nulls[m] = false;
-
-            if (cstate->defaults[m])
-            {
-                /*
-                 * The caller must supply econtext and have switched into the
-                 * per-tuple memory context in it.
-                 */
-                Assert(econtext != NULL);
-                Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
-
-                values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
-            }
-
-            /*
-             * If ON_ERROR is specified with IGNORE, skip rows with soft
-             * errors
-             */
-            else if (!InputFunctionCallSafe(&in_functions[m],
-                                            string,
-                                            typioparams[m],
-                                            att->atttypmod,
-                                            (Node *) cstate->escontext,
-                                            &values[m]))
-            {
-                Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-                cstate->num_errors++;
-
-                if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-                {
-                    /*
-                     * Since we emit line number and column info in the below
-                     * notice message, we suppress error context information
-                     * other than the relation name.
-                     */
-                    Assert(!cstate->relname_only);
-                    cstate->relname_only = true;
-
-                    if (cstate->cur_attval)
-                    {
-                        char       *attval;
-
-                        attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname,
-                                       attval));
-                        pfree(attval);
-                    }
-                    else
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname));
-
-                    /* reset relname_only */
-                    cstate->relname_only = false;
-                }
-
-                return true;
-            }
-
-            cstate->cur_attname = NULL;
-            cstate->cur_attval = NULL;
-        }
-
-        Assert(fieldno == attr_count);
-    }
-    else
-    {
-        /* binary */
-        int16        fld_count;
-        ListCell   *cur;
-
-        cstate->cur_lineno++;
-
-        if (!CopyGetInt16(cstate, &fld_count))
-        {
-            /* EOF detected (end of file, or protocol-level EOF) */
-            return false;
-        }
-
-        if (fld_count == -1)
-        {
-            /*
-             * Received EOF marker.  Wait for the protocol-level EOF, and
-             * complain if it doesn't come immediately.  In COPY FROM STDIN,
-             * this ensures that we correctly handle CopyFail, if client
-             * chooses to send that now.  When copying from file, we could
-             * ignore the rest of the file like in text mode, but we choose to
-             * be consistent with the COPY FROM STDIN case.
-             */
-            char        dummy;
-
-            if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("received copy data after EOF marker")));
-            return false;
-        }
-
-        if (fld_count != attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("row field count is %d, expected %d",
-                            (int) fld_count, attr_count)));
-
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            cstate->cur_attname = NameStr(att->attname);
-            values[m] = CopyReadBinaryAttribute(cstate,
-                                                &in_functions[m],
-                                                typioparams[m],
-                                                att->atttypmod,
-                                                &nulls[m]);
-            cstate->cur_attname = NULL;
-        }
-    }
+    /* Get one row from source */
+    if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+        return false;
 
     /*
      * Now compute and insert any defaults available for the columns not
@@ -1087,7 +1141,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * in the final value of line_buf.
  */
 static bool
-CopyReadLine(CopyFromState cstate)
+CopyReadLine(CopyFromState cstate, bool is_csv)
 {
     bool        result;
 
@@ -1095,7 +1149,7 @@ CopyReadLine(CopyFromState cstate)
     cstate->line_buf_valid = false;
 
     /* Parse data and transfer into line_buf */
-    result = CopyReadLineText(cstate);
+    result = CopyReadLineText(cstate, is_csv);
 
     if (result)
     {
@@ -1163,7 +1217,7 @@ CopyReadLine(CopyFromState cstate)
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
 static bool
-CopyReadLineText(CopyFromState cstate)
+CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
     char       *copy_input_buf;
     int            input_buf_ptr;
@@ -1178,7 +1232,11 @@ CopyReadLineText(CopyFromState cstate)
     char        quotec = '\0';
     char        escapec = '\0';
 
-    if (cstate->opts.csv_mode)
+    /*
+     * is_csv will be optimized away by compiler, as argument is constant at
+     * caller.
+     */
+    if (is_csv)
     {
         quotec = cstate->opts.quote[0];
         escapec = cstate->opts.escape[0];
@@ -1255,7 +1313,11 @@ CopyReadLineText(CopyFromState cstate)
         prev_raw_ptr = input_buf_ptr;
         c = copy_input_buf[input_buf_ptr++];
 
-        if (cstate->opts.csv_mode)
+        /*
+         * is_csv will be optimized away by compiler, as argument is constant
+         * at caller.
+         */
+        if (is_csv)
         {
             /*
              * If character is '\r', we may need to look ahead below.  Force
@@ -1294,7 +1356,7 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \r */
-        if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\r' && (!is_csv || !in_quote))
         {
             /* Check for \r\n on first line, _and_ handle \r\n. */
             if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1322,10 +1384,10 @@ CopyReadLineText(CopyFromState cstate)
                     if (cstate->eol_type == EOL_CRNL)
                         ereport(ERROR,
                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errmsg("literal carriage return found in data") :
                                  errmsg("unquoted carriage return found in data"),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errhint("Use \"\\r\" to represent carriage return.") :
                                  errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1339,10 +1401,10 @@ CopyReadLineText(CopyFromState cstate)
             else if (cstate->eol_type == EOL_NL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal carriage return found in data") :
                          errmsg("unquoted carriage return found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\r\" to represent carriage return.") :
                          errhint("Use quoted CSV field to represent carriage return.")));
             /* If reach here, we have found the line terminator */
@@ -1350,15 +1412,15 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \n */
-        if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\n' && (!is_csv || !in_quote))
         {
             if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal newline found in data") :
                          errmsg("unquoted newline found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\n\" to represent newline.") :
                          errhint("Use quoted CSV field to represent newline.")));
             cstate->eol_type = EOL_NL;    /* in case not set yet */
@@ -1370,7 +1432,7 @@ CopyReadLineText(CopyFromState cstate)
          * Process backslash, except in CSV mode where backslash is a normal
          * character.
          */
-        if (c == '\\' && !cstate->opts.csv_mode)
+        if (c == '\\' && !is_csv)
         {
             char        c2;
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4002a7f5382..f2409013fba 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
 extern void EndCopyFrom(CopyFromState cstate);
 extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
                          Datum *values, bool *nulls);
-extern bool NextCopyFromRawFields(CopyFromState cstate,
-                                  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 extern char *CopyLimitPrintoutLength(const char *str);
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index eccc875d0e8..19aacc8ddd3 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * copyapi.h
- *      API for COPY TO handlers
+ *      API for COPY TO/FROM handlers
  *
  *
  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
@@ -54,4 +54,50 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+/*
+ * API structure for a COPY FROM format implementation.     Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyFromRoutine
+{
+    /*
+     * Set input function information. This callback is called once at the
+     * beginning of COPY FROM.
+     *
+     * 'finfo' can be optionally filled to provide the catalog information of
+     * the input function.
+     *
+     * 'typioparam' can be optionally filled to define the OID of the type to
+     * pass to the input function. 'atttypid' is the OID of data type used by
+     * the relation's attribute.
+     */
+    void        (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
+                                   FmgrInfo *finfo, Oid *typioparam);
+
+    /*
+     * Start a COPY FROM. This callback is called once at the beginning of
+     * COPY FROM.
+     *
+     * 'tupDesc' is the tuple descriptor of the relation where the data needs
+     * to be copied.  This can be used for any initialization steps required
+     * by a format.
+     */
+    void        (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
+
+    /*
+     * Read one row from the source and fill *values and *nulls.
+     *
+     * 'econtext' is used to evaluate default expression for each column that
+     * is either not read from the file or is using the DEFAULT option of COPY
+     * FROM.  It is NULL if no default values are used.
+     *
+     * Returns false if there are no more tuples to read.
+     */
+    bool        (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
+                                   Datum *values, bool *nulls);
+
+    /* End a COPY FROM. This callback is called once at the end of COPY FROM */
+    void        (*CopyFromEnd) (CopyFromState cstate);
+} CopyFromRoutine;
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc783..1ca058c6add 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -14,7 +14,7 @@
 #ifndef COPYFROM_INTERNAL_H
 #define COPYFROM_INTERNAL_H
 
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
@@ -58,6 +58,9 @@ typedef enum CopyInsertMethod
  */
 typedef struct CopyFromStateData
 {
+    /* format routine */
+    const CopyFromRoutine *routine;
+
     /* low-level state data */
     CopySource    copy_src;        /* type of copy source */
     FILE       *copy_file;        /* used if copy_src == COPY_FILE */
@@ -183,4 +186,12 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
+/* One-row callbacks for built-in formats defined in copyfromparse.c */
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+                               Datum *values, bool *nulls);
+extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
+                              Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+                                 Datum *values, bool *nulls);
+
 #endif                            /* COPYFROM_INTERNAL_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8edb41cce2e..e09407c7463 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -492,6 +492,7 @@ ConvertRowtypeExpr
 CookedConstraint
 CopyDest
 CopyFormatOptions
+CopyFromRoutine
 CopyFromState
 CopyFromStateData
 CopyHeaderChoice
-- 
2.45.2
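
Note on the patch above: it replaces the binary/text branches in NextCopyFrom() with a single call through cstate->routine->CopyFromOneRow(), and passes is_csv down as an argument that is constant at each caller. The following is a minimal standalone sketch of that pattern, not the PostgreSQL code. Every Mock* and mock_* name is hypothetical, and the "parser" only extracts the first field of each line to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CopyFromState; not the real PostgreSQL struct. */
typedef struct MockCopyState MockCopyState;

/* Reduced analogue of CopyFromRoutine: only the one-row callback. */
typedef struct MockCopyFromRoutine
{
    bool        (*one_row) (MockCopyState *state, char *out, size_t outlen);
} MockCopyFromRoutine;

struct MockCopyState
{
    const MockCopyFromRoutine *routine; /* format routine, chosen once */
    const char *input;                  /* remaining unread input */
};

/*
 * Shared body for text-like formats.  Each caller passes a literal for
 * is_csv, so after inlining the compiler can drop the untaken branch.
 */
static inline bool
mock_text_like_one_row(MockCopyState *state, char *out, size_t outlen,
                       bool is_csv)
{
    char        delim = is_csv ? ',' : '\t';
    size_t      n = 0;

    assert(outlen > 0);         /* room for the terminating NUL */

    if (state->input == NULL || *state->input == '\0')
        return false;           /* no more rows */

    /* Copy the first field of the current line, truncating if needed. */
    while (*state->input != '\0' && *state->input != '\n' &&
           *state->input != delim)
    {
        if (n + 1 < outlen)
            out[n++] = *state->input;
        state->input++;
    }
    out[n] = '\0';

    /* Skip the rest of the line, including its newline. */
    while (*state->input != '\0' && *state->input != '\n')
        state->input++;
    if (*state->input == '\n')
        state->input++;
    return true;
}

/* Per-format entry points: thin wrappers with a constant is_csv. */
static bool
mock_text_one_row(MockCopyState *state, char *out, size_t outlen)
{
    return mock_text_like_one_row(state, out, outlen, false);
}

static bool
mock_csv_one_row(MockCopyState *state, char *out, size_t outlen)
{
    return mock_text_like_one_row(state, out, outlen, true);
}

static const MockCopyFromRoutine mock_text_routine = {mock_text_one_row};
static const MockCopyFromRoutine mock_csv_routine = {mock_csv_one_row};

/* Analogue of the new NextCopyFrom() step: one indirect call per row. */
static bool
mock_next_copy_from(MockCopyState *state, char *out, size_t outlen)
{
    return state->routine->one_row(state, out, outlen);
}
```

The per-row cost here is one indirect call, and the format checks that used to run inside the character loop are decided at compile time in each wrapper. Whether a given compiler actually removes the branch depends on it inlining the shared helper.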
From 6cb25f8b8f76ff40a667485b09f5886a63f6d9bd Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 12:19:15 +0900
Subject: [PATCH v27 3/9] Add support for adding custom COPY TO format
This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a COPY TO handler returns a CopyToRoutine.
This also adds a test module for a custom COPY TO handler.
---
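
Note on the format lookup order in this patch: ProcessCopyOptionFormat() checks the built-in names first, then looks for a handler function only when is_from is false, and otherwise reports 'COPY format "..." not recognized'. The following is a minimal standalone sketch of that decision order, not the PostgreSQL code. The Mock* and mock_* names and the static handler table are hypothetical; the table merely stands in for the catalog lookup done with LookupFuncName().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Possible outcomes of resolving a FORMAT option value. */
typedef enum MockFormatKind
{
    MOCK_FORMAT_TEXT,
    MOCK_FORMAT_CSV,
    MOCK_FORMAT_BINARY,
    MOCK_FORMAT_CUSTOM,
    MOCK_FORMAT_UNKNOWN         /* caller would raise "not recognized" */
} MockFormatKind;

/* Hypothetical registry entry: handler name plus an opaque routine. */
typedef struct MockHandler
{
    const char *name;
    const void *routine;
} MockHandler;

/* Opaque object standing in for a CopyToRoutine returned by a handler. */
static const int mock_routine_token = 0;

static const MockHandler mock_handlers[] = {
    {"test_copy_format", &mock_routine_token},
};

static MockFormatKind
mock_resolve_format(const char *format, bool is_from,
                    const void **routine_out)
{
    size_t      i;

    *routine_out = NULL;

    /* Built-in formats always win over handler functions. */
    if (strcmp(format, "text") == 0)
        return MOCK_FORMAT_TEXT;
    if (strcmp(format, "csv") == 0)
        return MOCK_FORMAT_CSV;
    if (strcmp(format, "binary") == 0)
        return MOCK_FORMAT_BINARY;

    /* Custom formats: at this point in the series, COPY TO only. */
    if (!is_from)
    {
        for (i = 0; i < sizeof(mock_handlers) / sizeof(mock_handlers[0]); i++)
        {
            if (strcmp(format, mock_handlers[i].name) == 0)
            {
                *routine_out = mock_handlers[i].routine;
                return MOCK_FORMAT_CUSTOM;
            }
        }
    }

    return MOCK_FORMAT_UNKNOWN;
}
```

This matches the expected regression output in the test module: the same format name is rejected for COPY FROM and accepted for COPY TO.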
 src/backend/commands/copy.c                   | 82 ++++++++++++++++---
 src/backend/commands/copyto.c                 |  4 +-
 src/backend/nodes/Makefile                    |  1 +
 src/backend/nodes/gen_node_support.pl         |  2 +
 src/backend/utils/adt/pseudotypes.c           |  1 +
 src/include/catalog/pg_proc.dat               |  6 ++
 src/include/catalog/pg_type.dat               |  6 ++
 src/include/commands/copy.h                   |  1 +
 src/include/commands/copyapi.h                |  2 +
 src/include/nodes/meson.build                 |  1 +
 src/test/modules/Makefile                     |  1 +
 src/test/modules/meson.build                  |  1 +
 src/test/modules/test_copy_format/.gitignore  |  4 +
 src/test/modules/test_copy_format/Makefile    | 23 ++++++
 .../expected/test_copy_format.out             | 17 ++++
 src/test/modules/test_copy_format/meson.build | 33 ++++++++
 .../test_copy_format/sql/test_copy_format.sql |  5 ++
 .../test_copy_format--1.0.sql                 |  8 ++
 .../test_copy_format/test_copy_format.c       | 63 ++++++++++++++
 .../test_copy_format/test_copy_format.control |  4 +
 20 files changed, 251 insertions(+), 14 deletions(-)
 mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2d98ecf3f4e..d4906b44751 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,6 +32,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -476,6 +477,73 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
     return COPY_LOG_VERBOSITY_DEFAULT;    /* keep compiler quiet */
 }
 
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false). If no COPY format handler is found, this function
+ * reports an error.
+ */
+static void
+ProcessCopyOptionFormat(ParseState *pstate,
+                        CopyFormatOptions *opts_out,
+                        bool is_from,
+                        DefElem *defel)
+{
+    char       *format;
+    Oid            funcargtypes[1];
+    Oid            handlerOid = InvalidOid;
+    Datum        datum;
+    Node       *routine;
+
+    format = defGetString(defel);
+
+    /* built-in formats */
+    if (strcmp(format, "text") == 0)
+         /* default format */ return;
+    else if (strcmp(format, "csv") == 0)
+    {
+        opts_out->csv_mode = true;
+        return;
+    }
+    else if (strcmp(format, "binary") == 0)
+    {
+        opts_out->binary = true;
+        return;
+    }
+
+    /* custom format */
+    if (!is_from)
+    {
+        funcargtypes[0] = INTERNALOID;
+        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                    funcargtypes, true);
+    }
+    if (!OidIsValid(handlerOid))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY format \"%s\" not recognized", format),
+                 parser_errposition(pstate, defel->location)));
+
+    datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
+    routine = (Node *) DatumGetPointer(datum);
+    if (routine == NULL || !IsA(routine, CopyToRoutine))
+        ereport(
+                ERROR,
+                (errcode(
+                         ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY handler function "
+                        "%s(%u) did not return a "
+                        "CopyToRoutine struct",
+                        format, handlerOid),
+                 parser_errposition(
+                                    pstate, defel->location)));
+
+    opts_out->routine = routine;
+}
+
 /*
  * Process the statement option list for COPY.
  *
@@ -519,22 +587,10 @@ ProcessCopyOptions(ParseState *pstate,
 
         if (strcmp(defel->defname, "format") == 0)
         {
-            char       *fmt = defGetString(defel);
-
             if (format_specified)
                 errorConflictingDefElem(defel, pstate);
             format_specified = true;
-            if (strcmp(fmt, "text") == 0)
-                 /* default format */ ;
-            else if (strcmp(fmt, "csv") == 0)
-                opts_out->csv_mode = true;
-            else if (strcmp(fmt, "binary") == 0)
-                opts_out->binary = true;
-            else
-                ereport(ERROR,
-                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                         errmsg("COPY format \"%s\" not recognized", fmt),
-                         parser_errposition(pstate, defel->location)));
+            ProcessCopyOptionFormat(pstate, opts_out, is_from, defel);
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f81dadcc12b..ce3dd252c32 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -176,7 +176,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(CopyFormatOptions opts)
 {
-    if (opts.csv_mode)
+    if (opts.routine)
+        return (const CopyToRoutine *) opts.routine;
+    else if (opts.csv_mode)
         return &CopyToRoutineCSV;
     else if (opts.binary)
         return &CopyToRoutineBinary;
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 66bbad8e6e0..173ee11811c 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -49,6 +49,7 @@ node_headers = \
     access/sdir.h \
     access/tableam.h \
     access/tsmapi.h \
+    commands/copyapi.h \
     commands/event_trigger.h \
     commands/trigger.h \
     executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
old mode 100644
new mode 100755
index 81df3bdf95f..428ab4f0d93
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -61,6 +61,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -85,6 +86,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index e189e9b79d2..25f24ab95d2 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd382..959d0301c20 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7771,6 +7771,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index ceff66ccde1..793dd671935 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -633,6 +633,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a copy to method function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   descr => 'pseudo-type for the result of a table AM handler function',
   typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index f2409013fba..6b740d5b917 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
+    Node       *routine;        /* CopyToRoutine (can be NULL) */
 } CopyFormatOptions;
 
 /* These are private in commands/copy[from|to].c */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 19aacc8ddd3..36057b92417 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -24,6 +24,8 @@
  */
 typedef struct CopyToRoutine
 {
+    NodeTag        type;
+
     /*
      * Set output function information. This callback is called once at the
      * beginning of COPY TO.
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index b665e55b657..103df1a7873 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -11,6 +11,7 @@ node_support_input_i = [
   'access/sdir.h',
   'access/tableam.h',
   'access/tsmapi.h',
+  'commands/copyapi.h',
   'commands/event_trigger.h',
   'commands/trigger.h',
   'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index c0d3cf0e14b..33e3a49a4fb 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
           spgist_name_ops \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index c829b619530..75b6ab1b6a9 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -14,6 +14,7 @@ subdir('spgist_name_ops')
 subdir('ssl_passphrase_callback')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 00000000000..8497f91624d
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 00000000000..adfe7d1572a
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,17 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+ERROR:  COPY format "test_copy_format" not recognized
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')...
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: atttypid=21
+NOTICE:  CopyToOutFunc: atttypid=23
+NOTICE:  CopyToOutFunc: atttypid=20
+NOTICE:  CopyToStart: natts=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 00000000000..4cefe7b709a
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+  'test_copy_format.control',
+  'test_copy_format--1.0.sql',
+)
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'test_copy_format',
+    ],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 00000000000..810b3d8cedc
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,5 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 00000000000..d24ea03ce99
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,8 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+CREATE FUNCTION test_copy_format(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..e064f40473b
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,63 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+
+PG_MODULE_MAGIC;
+
+static void
+CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts)));
+}
+
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", slot->tts_nvalid)));
+}
+
+static void
+CopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .type = T_CopyToRoutine,
+    .CopyToOutFunc = CopyToOutFunc,
+    .CopyToStart = CopyToStart,
+    .CopyToOneRow = CopyToOneRow,
+    .CopyToEnd = CopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    ereport(NOTICE,
+            (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 00000000000..f05a6362358
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.45.2
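The expected output of the test module above shows the order in which the callbacks fire: CopyToOutFunc once per column, then CopyToStart, then CopyToOneRow once per row, then CopyToEnd. Here is a minimal standalone sketch of that ordering. It does not use the PostgreSQL headers; MockCopyToRoutine, mock_copy_to() and the trace buffer are hypothetical names made up for illustration, and the real driver lives in copyto.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TRACE_SIZE 256

/* Hypothetical stand-in for CopyToRoutine; not the PostgreSQL struct. */
typedef struct MockCopyToRoutine
{
    void        (*out_func) (char *trace, int atttypid);
    void        (*start) (char *trace, int natts);
    void        (*one_row) (char *trace, int nvalid);
    void        (*end) (char *trace);
} MockCopyToRoutine;

/* Append "event;" to the trace, truncating if the buffer is full. */
static void
trace_append(char *trace, const char *event)
{
    size_t      used = strlen(trace);

    assert(used < TRACE_SIZE);
    snprintf(trace + used, TRACE_SIZE - used, "%s;", event);
}

static void
mock_out_func(char *trace, int atttypid)
{
    char        event[32];

    snprintf(event, sizeof(event), "out:%d", atttypid);
    trace_append(trace, event);
}

static void
mock_start(char *trace, int natts)
{
    char        event[32];

    snprintf(event, sizeof(event), "start:%d", natts);
    trace_append(trace, event);
}

static void
mock_one_row(char *trace, int nvalid)
{
    char        event[32];

    snprintf(event, sizeof(event), "row:%d", nvalid);
    trace_append(trace, event);
}

static void
mock_end(char *trace)
{
    trace_append(trace, "end");
}

const MockCopyToRoutine mock_routine = {
    .out_func = mock_out_func,
    .start = mock_start,
    .one_row = mock_one_row,
    .end = mock_end,
};

/* Drive the callbacks in the order the test output above shows. */
void
mock_copy_to(const MockCopyToRoutine *routine, const int *atttypids,
             int natts, int nrows, char *trace)
{
    trace[0] = '\0';
    for (int i = 0; i < natts; i++)
        routine->out_func(trace, atttypids[i]);
    routine->start(trace, natts);
    for (int r = 0; r < nrows; r++)
        routine->one_row(trace, natts);
    routine->end(trace);
}
```

Calling mock_copy_to() with the three type OIDs from the test table (21, 23, 20) and three rows records the same sequence as the NOTICE lines.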
From d0d842e37c7ec59bf7c77df3e6518ce80ec59575 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 13:58:33 +0900
Subject: [PATCH v27 4/9] Export CopyToStateData
This is for custom COPY TO format handlers implemented as extensions.
This just moves code. Nothing changes except the CopyDest enum
values. CopyDest and CopySource enum values such as COPY_FILE conflict
with each other, so CopyDest enum values use a COPY_DEST_ prefix
instead of the COPY_ prefix. For example, COPY_FILE in CopyDest is
renamed to COPY_DEST_FILE.
Note that this isn't enough to implement custom COPY TO format
handlers as extensions. We'll do the following in a subsequent commit:
1. Add an opaque space for custom COPY TO format handlers
2. Export CopySendEndOfRow() to flush the buffer
---
 src/backend/commands/copyto.c  | 77 ++++------------------------------
 src/include/commands/copy.h    |  2 +-
 src/include/commands/copyapi.h | 62 +++++++++++++++++++++++++++
 3 files changed, 71 insertions(+), 70 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ce3dd252c32..96b5e144a1d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -36,67 +36,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                    /* to file (or a piped program) */
-    COPY_FRONTEND,                /* to frontend */
-    COPY_CALLBACK,                /* to callback function */
-} CopyDest;
-
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-    /* format-specific routines */
-    const CopyToRoutine *routine;
-
-    /* low-level state data */
-    CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
-
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
-    MemoryContext rowcontext;    /* per-row evaluation context */
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -406,7 +345,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -453,7 +392,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -487,11 +426,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -512,7 +451,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -520,7 +459,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -904,12 +843,12 @@ BeginCopyTo(ParseState *pstate,
     /* See Multibyte encoding comment above */
     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 6b740d5b917..98aa5707102 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -90,7 +90,7 @@ typedef struct CopyFormatOptions
     Node       *routine;        /* CopyToRoutine (can be NULL) */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
+/* This is private in commands/copyfrom.c */
 typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 36057b92417..1cb2815deab 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -15,6 +15,7 @@
 #define COPYAPI_H
 
 #include "commands/copy.h"
+#include "executor/execdesc.h"
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
 
@@ -56,6 +57,67 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+    COPY_DEST_FILE,                /* to file (or a piped program) */
+    COPY_DEST_FRONTEND,            /* to frontend */
+    COPY_DEST_CALLBACK,            /* to callback function */
+} CopyDest;
+
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+    /* format-specific routines */
+    const CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;        /* type of copy source/destination */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
+    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
+
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+    MemoryContext rowcontext;    /* per-row evaluation context */
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
 /*
  * API structure for a COPY FROM format implementation.     Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
-- 
2.45.2
From ac79ec655da666e4f03ac865d15821f4ac883cc4 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:01:18 +0900
Subject: [PATCH v27 5/9] Add support for implementing custom COPY TO format as
 extension
* Add CopyToStateData::opaque that can be used to keep data for custom
  COPY TO format implementation
* Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf
  as CopyToStateFlush()
---
 src/backend/commands/copyto.c  | 12 ++++++++++++
 src/include/commands/copyapi.h |  5 +++++
 2 files changed, 17 insertions(+)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 96b5e144a1d..cb9bfa0053f 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -442,6 +442,18 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * Export CopySendEndOfRow() for extensions. We want to keep
+ * CopySendEndOfRow() as a static function for
+ * optimization. CopySendEndOfRow() calls in this file may be optimized by a
+ * compiler.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
  * the line termination and do common appropriate things for the end of row.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 1cb2815deab..030a82aca7f 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -116,8 +116,13 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;    /* lookup info for output functions */
     MemoryContext rowcontext;    /* per-row evaluation context */
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyToStateData;
 
+extern void CopyToStateFlush(CopyToState cstate);
+
 /*
  * API structure for a COPY FROM format implementation.     Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
-- 
2.45.2
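To make the intent of the two additions in this patch concrete, here is a standalone sketch of how a format implementation might keep private data behind an opaque pointer and flush a per-row buffer. It is only a model: MockCopyToState, MockFormatState and mock_flush() are hypothetical stand-ins for CopyToStateData, the extension's own state and CopyToStateFlush(), and it uses malloc/free where real code would use a memory context.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CopyToStateData; only the relevant parts. */
typedef struct MockCopyToState
{
    char        msgbuf[128];    /* plays the role of fe_msgbuf */
    size_t      msglen;
    char        sink[512];      /* where flushed bytes end up */
    size_t      sinklen;
    void       *opaque;         /* private space for the format */
} MockCopyToState;

/* Format-private data kept behind the opaque pointer. */
typedef struct MockFormatState
{
    int         rows_written;
} MockFormatState;

/* Plays the role of CopyToStateFlush(): emit the buffer, then reset it. */
void
mock_flush(MockCopyToState *cstate)
{
    assert(cstate->sinklen + cstate->msglen <= sizeof(cstate->sink));
    memcpy(cstate->sink + cstate->sinklen, cstate->msgbuf, cstate->msglen);
    cstate->sinklen += cstate->msglen;
    cstate->msglen = 0;
}

/* Start callback: allocate the private state. */
void
mock_start(MockCopyToState *cstate)
{
    MockFormatState *fmt = calloc(1, sizeof(*fmt));

    assert(fmt != NULL);
    cstate->opaque = fmt;
}

/* Per-row callback: format "rowno:value\n" into the buffer and flush. */
void
mock_one_row(MockCopyToState *cstate, int value)
{
    MockFormatState *fmt = cstate->opaque;
    int         n = snprintf(cstate->msgbuf, sizeof(cstate->msgbuf),
                             "%d:%d\n", fmt->rows_written, value);

    assert(n > 0 && (size_t) n < sizeof(cstate->msgbuf));
    cstate->msglen = (size_t) n;
    fmt->rows_written++;
    mock_flush(cstate);
}

/* End callback: release the private state and report the row count. */
int
mock_end(MockCopyToState *cstate)
{
    MockFormatState *fmt = cstate->opaque;
    int         rows = fmt->rows_written;

    free(fmt);
    cstate->opaque = NULL;
    return rows;
}
```

The point of the opaque pointer is that the core code never needs to know the layout of MockFormatState; only the format's own callbacks cast it back.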
From 859faaa4d19e3c659afec43267eae6b7788bf964 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:11:55 +0900
Subject: [PATCH v27 6/9] Add support for adding custom COPY FROM format
This uses the same handler for COPY TO and COPY FROM but returns a
different routine for each: CopyToRoutine for COPY TO and
CopyFromRoutine for COPY FROM. PostgreSQL calls a COPY TO/FROM handler
with an "is_from" argument. It's true for COPY FROM and false for COPY
TO:
    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine
This also adds a test module for a custom COPY FROM handler.
---
 src/backend/commands/copy.c                   | 52 ++++++++++++-------
 src/backend/commands/copyfrom.c               |  4 +-
 src/include/catalog/pg_type.dat               |  2 +-
 src/include/commands/copy.h                   |  3 +-
 src/include/commands/copyapi.h                |  2 +
 .../expected/test_copy_format.out             | 10 ++--
 .../test_copy_format/sql/test_copy_format.sql |  1 +
 .../test_copy_format/test_copy_format.c       | 39 +++++++++++++-
 8 files changed, 87 insertions(+), 26 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index d4906b44751..5be649c9c89 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -483,8 +483,8 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
  * This function checks whether the option value is a built-in format such as
  * "text" and "csv" or not. If the option value isn't a built-in format, this
  * function finds a COPY format handler that returns a CopyToRoutine (for
- * is_from == false). If no COPY format handler is found, this function
- * reports an error.
+ * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY
+ * format handler is found, this function reports an error.
  */
 static void
 ProcessCopyOptionFormat(ParseState *pstate,
@@ -515,12 +515,9 @@ ProcessCopyOptionFormat(ParseState *pstate,
     }
 
     /* custom format */
-    if (!is_from)
-    {
-        funcargtypes[0] = INTERNALOID;
-        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
-                                    funcargtypes, true);
-    }
+    funcargtypes[0] = INTERNALOID;
+    handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                funcargtypes, true);
     if (!OidIsValid(handlerOid))
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -529,17 +526,34 @@ ProcessCopyOptionFormat(ParseState *pstate,
 
     datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
     routine = (Node *) DatumGetPointer(datum);
-    if (routine == NULL || !IsA(routine, CopyToRoutine))
-        ereport(
-                ERROR,
-                (errcode(
-                         ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("COPY handler function "
-                        "%s(%u) did not return a "
-                        "CopyToRoutine struct",
-                        format, handlerOid),
-                 parser_errposition(
-                                    pstate, defel->location)));
+    if (is_from)
+    {
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%s(%u) did not return a "
+                            "CopyFromRoutine struct",
+                            format, handlerOid),
+                     parser_errposition(
+                                        pstate, defel->location)));
+    }
+    else
+    {
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%s(%u) did not return a "
+                            "CopyToRoutine struct",
+                            format, handlerOid),
+                     parser_errposition(
+                                        pstate, defel->location)));
+    }
 
     opts_out->routine = routine;
 }
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index c84081c3ba3..a4cdab75879 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -155,7 +155,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(CopyFormatOptions opts)
 {
-    if (opts.csv_mode)
+    if (opts.routine)
+        return (const CopyFromRoutine *) opts.routine;
+    else if (opts.csv_mode)
         return &CopyFromRoutineCSV;
     else if (opts.binary)
         return &CopyFromRoutineBinary;
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 793dd671935..37ebfa0908f 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -634,7 +634,7 @@
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
 { oid => '8752',
-  descr => 'pseudo-type for the result of a copy to method function',
+  descr => 'pseudo-type for the result of a copy to/from method function',
   typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
   typcategory => 'P', typinput => 'copy_handler_in',
   typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 98aa5707102..e07988a0c74 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,7 +87,8 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
-    Node       *routine;        /* CopyToRoutine (can be NULL) */
+    Node       *routine;        /* CopyToRoutine or CopyFromRoutine (can be
+                                 * NULL) */
 } CopyFormatOptions;
 
 /* This is private in commands/copyfrom.c */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 030a82aca7f..fa3d8d87760 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -129,6 +129,8 @@ extern void CopyToStateFlush(CopyToState cstate);
  */
 typedef struct CopyFromRoutine
 {
+    NodeTag        type;
+
     /*
      * Set input function information. This callback is called once at the
      * beginning of COPY FROM.
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index adfe7d1572a..016893e7026 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -2,9 +2,13 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
-ERROR:  COPY format "test_copy_format" not recognized
-LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')...
-                                          ^
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=false
 NOTICE:  CopyToOutFunc: atttypid=21
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 810b3d8cedc..0dfdfa00080 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -2,4 +2,5 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+\.
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index e064f40473b..f6b105659ab 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -18,6 +18,40 @@
 
 PG_MODULE_MAGIC;
 
+static void
+CopyFromInFunc(CopyFromState cstate, Oid atttypid,
+               FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts)));
+}
+
+static bool
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+CopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .type = T_CopyFromRoutine,
+    .CopyFromInFunc = CopyFromInFunc,
+    .CopyFromStart = CopyFromStart,
+    .CopyFromOneRow = CopyFromOneRow,
+    .CopyFromEnd = CopyFromEnd,
+};
+
 static void
 CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
 {
@@ -59,5 +93,8 @@ test_copy_format(PG_FUNCTION_ARGS)
     ereport(NOTICE,
             (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
 
-    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
 }
-- 
2.45.2
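The dispatch in this patch boils down to: call one handler with is_from, then check that the returned struct carries the node tag matching the direction. Here is a standalone sketch of that check, assuming nothing from the PostgreSQL headers. MockNodeTag, mock_handler() and mock_routine_matches() are hypothetical names; the real code uses NodeTag, IsA() and ereport(ERROR) instead of a boolean result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tags standing in for T_CopyToRoutine / T_CopyFromRoutine. */
typedef enum MockNodeTag
{
    T_MockCopyToRoutine = 1,
    T_MockCopyFromRoutine,
} MockNodeTag;

/* Both routine structs start with the tag, like Node-derived structs. */
typedef struct MockCopyToRoutine
{
    MockNodeTag type;
} MockCopyToRoutine;

typedef struct MockCopyFromRoutine
{
    MockNodeTag type;
} MockCopyFromRoutine;

static const MockCopyToRoutine to_routine = {.type = T_MockCopyToRoutine};
static const MockCopyFromRoutine from_routine = {.type = T_MockCopyFromRoutine};

typedef const void *(*MockHandler) (bool is_from);

/* Well-behaved handler: one entry point, routine chosen by is_from. */
const void *
mock_handler(bool is_from)
{
    if (is_from)
        return &from_routine;
    return &to_routine;
}

/* Handler that ignores is_from, as a COPY TO-only format would. */
const void *
mock_to_only_handler(bool is_from)
{
    (void) is_from;
    return &to_routine;
}

/*
 * Return true when the handler's result carries the tag expected for the
 * requested direction. The tag is read through a pointer to the first
 * member, which is valid for any struct that begins with MockNodeTag.
 */
bool
mock_routine_matches(MockHandler handler, bool is_from)
{
    const void *routine;
    MockNodeTag expected;

    assert(handler != NULL);
    routine = handler(is_from);
    if (routine == NULL)
        return false;
    expected = is_from ? T_MockCopyFromRoutine : T_MockCopyToRoutine;
    return *(const MockNodeTag *) routine == expected;
}
```

With this shape, a format that only implements COPY TO is rejected for COPY FROM by the tag check rather than by a failed function lookup.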
From 81cee5244c5f7bdd52745cea09c98991d47207f5 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:19:34 +0900
Subject: [PATCH v27 7/9] Export CopyFromStateData
This is for custom COPY FROM format handlers implemented as extensions.
This just moves code. Nothing changes except the CopySource enum
values: their COPY_ prefix becomes a COPY_SOURCE_ prefix, like the
CopyDest enum value prefix change. For example, COPY_FILE in
CopySource is renamed to COPY_SOURCE_FILE.
Note that this isn't enough to implement custom COPY FROM format
handlers as extensions. We'll do the following in a subsequent commit:
1. Add an opaque space for custom COPY FROM format handlers
2. Export CopyReadBinaryData() to read the next data
---
 src/backend/commands/copyfrom.c          |   4 +-
 src/backend/commands/copyfromparse.c     |  10 +-
 src/include/commands/copy.h              |   1 -
 src/include/commands/copyapi.h           | 166 +++++++++++++++++++++++
 src/include/commands/copyfrom_internal.h | 166 -----------------------
 5 files changed, 173 insertions(+), 174 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index a4cdab75879..e1fef1b95a5 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1704,7 +1704,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;    /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;    /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1832,7 +1832,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index fdb506c58be..1c68b0d2952 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -170,7 +170,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows it can send. */
     pq_flush();
@@ -238,7 +238,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
     switch (cstate->copy_src)
     {
-        case COPY_FILE:
+        case COPY_SOURCE_FILE:
             bytesread = fread(databuf, 1, maxread, cstate->copy_file);
             if (ferror(cstate->copy_file))
                 ereport(ERROR,
@@ -247,7 +247,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
             if (bytesread == 0)
                 cstate->raw_reached_eof = true;
             break;
-        case COPY_FRONTEND:
+        case COPY_SOURCE_FRONTEND:
             while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
             {
                 int            avail;
@@ -330,7 +330,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
                 bytesread += avail;
             }
             break;
-        case COPY_CALLBACK:
+        case COPY_SOURCE_CALLBACK:
             bytesread = cstate->data_source_cb(databuf, minread, maxread);
             break;
     }
@@ -1158,7 +1158,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
          * after \. up to the protocol end of copy data.  (XXX maybe better
          * not to treat \. as special?)
          */
-        if (cstate->copy_src == COPY_FRONTEND)
+        if (cstate->copy_src == COPY_SOURCE_FRONTEND)
         {
             int            inbytes;
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index e07988a0c74..50af4b99258 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -91,7 +91,6 @@ typedef struct CopyFormatOptions
                                  * NULL) */
 } CopyFormatOptions;
 
-/* This is private in commands/copyfrom.c */
 typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index fa3d8d87760..335584f8877 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -15,6 +15,7 @@
 #define COPYAPI_H
 
 #include "commands/copy.h"
+#include "commands/trigger.h"
 #include "executor/execdesc.h"
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
@@ -171,4 +172,169 @@ typedef struct CopyFromRoutine
     void        (*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
+/*
+ * Represents the different source cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopySource
+{
+    COPY_SOURCE_FILE,            /* from file (or a piped program) */
+    COPY_SOURCE_FRONTEND,        /* from frontend */
+    COPY_SOURCE_CALLBACK,        /* from callback function */
+} CopySource;
+
+/*
+ *    Represents the end-of-line terminator type of the input
+ */
+typedef enum EolType
+{
+    EOL_UNKNOWN,
+    EOL_NL,
+    EOL_CR,
+    EOL_CRNL,
+} EolType;
+
+/*
+ * Represents the insert method to be used during COPY FROM.
+ */
+typedef enum CopyInsertMethod
+{
+    CIM_SINGLE,                    /* use table_tuple_insert or ExecForeignInsert */
+    CIM_MULTI,                    /* always use table_multi_insert or
+                                 * ExecForeignBatchInsert */
+    CIM_MULTI_CONDITIONAL,        /* use table_multi_insert or
+                                 * ExecForeignBatchInsert only if valid */
+} CopyInsertMethod;
+
+/*
+ * This struct contains all the state variables used throughout a COPY FROM
+ * operation.
+ */
+typedef struct CopyFromStateData
+{
+    /* format routine */
+    const CopyFromRoutine *routine;
+
+    /* low-level state data */
+    CopySource    copy_src;        /* type of copy source */
+    FILE       *copy_file;        /* used if copy_src == COPY_SOURCE_FILE */
+    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_SOURCE_FRONTEND */
+
+    EolType        eol_type;        /* EOL type of input */
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    Oid            conversion_proc;    /* encoding conversion function */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDIN */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_source_cb data_source_cb; /* function for reading data */
+
+    CopyFormatOptions opts;
+    bool       *convert_select_flags;    /* per-column CSV/TEXT CS flags */
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /* these are just for error messages, see CopyFromErrorCallback */
+    const char *cur_relname;    /* table name for error messages */
+    uint64        cur_lineno;        /* line number for error messages */
+    const char *cur_attname;    /* current att for error messages */
+    const char *cur_attval;        /* current att value for error messages */
+    bool        relname_only;    /* don't output line number, att, etc. */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    AttrNumber    num_defaults;    /* count of att that are missing and have
+                                 * default value */
+    FmgrInfo   *in_functions;    /* array of input functions for each attrs */
+    Oid           *typioparams;    /* array of element types for in_functions */
+    ErrorSaveContext *escontext;    /* soft error trapper during in_functions
+                                     * execution */
+    uint64        num_errors;        /* total number of rows which contained soft
+                                 * errors */
+    int           *defmap;            /* array of default att numbers related to
+                                 * missing att */
+    ExprState **defexprs;        /* array of default att expressions for all
+                                 * att */
+    bool       *defaults;        /* if DEFAULT marker was found for
+                                 * corresponding att */
+    bool        volatile_defexprs;    /* is any of defexprs volatile? */
+    List       *range_table;    /* single element list of RangeTblEntry */
+    List       *rteperminfos;    /* single element list of RTEPermissionInfo */
+    ExprState  *qualexpr;
+
+    TransitionCaptureState *transition_capture;
+
+    /*
+     * These variables are used to reduce overhead in COPY FROM.
+     *
+     * attribute_buf holds the separated, de-escaped text for each field of
+     * the current line.  The CopyReadAttributes functions return arrays of
+     * pointers into this buffer.  We avoid palloc/pfree overhead by re-using
+     * the buffer on each cycle.
+     *
+     * In binary COPY FROM, attribute_buf holds the binary data for the
+     * current field, but the usage is otherwise similar.
+     */
+    StringInfoData attribute_buf;
+
+    /* field raw data pointers found by COPY FROM */
+
+    int            max_fields;
+    char      **raw_fields;
+
+    /*
+     * Similarly, line_buf holds the whole input line being processed. The
+     * input cycle is first to read the whole line into line_buf, and then
+     * extract the individual attribute fields into attribute_buf.  line_buf
+     * is preserved unmodified so that we can display it in error messages if
+     * appropriate.  (In binary mode, line_buf is not used.)
+     */
+    StringInfoData line_buf;
+    bool        line_buf_valid; /* contains the row being processed? */
+
+    /*
+     * input_buf holds input data, already converted to database encoding.
+     *
+     * In text mode, CopyReadLine parses this data sufficiently to locate line
+     * boundaries, then transfers the data to line_buf. We guarantee that
+     * there is a \0 at input_buf[input_buf_len] at all times.  (In binary
+     * mode, input_buf is not used.)
+     *
+     * If encoding conversion is not required, input_buf is not a separate
+     * buffer but points directly to raw_buf.  In that case, input_buf_len
+     * tracks the number of bytes that have been verified as valid in the
+     * database encoding, and raw_buf_len is the total number of bytes stored
+     * in the buffer.
+     */
+#define INPUT_BUF_SIZE 65536    /* we palloc INPUT_BUF_SIZE+1 bytes */
+    char       *input_buf;
+    int            input_buf_index;    /* next byte to process */
+    int            input_buf_len;    /* total # of bytes stored */
+    bool        input_reached_eof;    /* true if we reached EOF */
+    bool        input_reached_error;    /* true if a conversion error happened */
+    /* Shorthand for number of unconsumed bytes available in input_buf */
+#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index)
+
+    /*
+     * raw_buf holds raw input data read from the data source (file or client
+     * connection), not yet converted to the database encoding.  Like with
+     * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len].
+     */
+#define RAW_BUF_SIZE 65536        /* we palloc RAW_BUF_SIZE+1 bytes */
+    char       *raw_buf;
+    int            raw_buf_index;    /* next byte to process */
+    int            raw_buf_len;    /* total # of bytes stored */
+    bool        raw_reached_eof;    /* true if we reached EOF */
+
+    /* Shorthand for number of unconsumed bytes available in raw_buf */
+#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
+
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyFromStateData;
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 1ca058c6add..23760eb0e02 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -15,174 +15,8 @@
 #define COPYFROM_INTERNAL_H
 
 #include "commands/copyapi.h"
-#include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
-/*
- * Represents the different source cases we need to worry about at
- * the bottom level
- */
-typedef enum CopySource
-{
-    COPY_FILE,                    /* from file (or a piped program) */
-    COPY_FRONTEND,                /* from frontend */
-    COPY_CALLBACK,                /* from callback function */
-} CopySource;
-
-/*
- *    Represents the end-of-line terminator type of the input
- */
-typedef enum EolType
-{
-    EOL_UNKNOWN,
-    EOL_NL,
-    EOL_CR,
-    EOL_CRNL,
-} EolType;
-
-/*
- * Represents the insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
-    CIM_SINGLE,                    /* use table_tuple_insert or ExecForeignInsert */
-    CIM_MULTI,                    /* always use table_multi_insert or
-                                 * ExecForeignBatchInsert */
-    CIM_MULTI_CONDITIONAL,        /* use table_multi_insert or
-                                 * ExecForeignBatchInsert only if valid */
-} CopyInsertMethod;
-
-/*
- * This struct contains all the state variables used throughout a COPY FROM
- * operation.
- */
-typedef struct CopyFromStateData
-{
-    /* format routine */
-    const CopyFromRoutine *routine;
-
-    /* low-level state data */
-    CopySource    copy_src;        /* type of copy source */
-    FILE       *copy_file;        /* used if copy_src == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_FRONTEND */
-
-    EolType        eol_type;        /* EOL type of input */
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    Oid            conversion_proc;    /* encoding conversion function */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDIN */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_source_cb data_source_cb; /* function for reading data */
-
-    CopyFormatOptions opts;
-    bool       *convert_select_flags;    /* per-column CSV/TEXT CS flags */
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /* these are just for error messages, see CopyFromErrorCallback */
-    const char *cur_relname;    /* table name for error messages */
-    uint64        cur_lineno;        /* line number for error messages */
-    const char *cur_attname;    /* current att for error messages */
-    const char *cur_attval;        /* current att value for error messages */
-    bool        relname_only;    /* don't output line number, att, etc. */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    AttrNumber    num_defaults;    /* count of att that are missing and have
-                                 * default value */
-    FmgrInfo   *in_functions;    /* array of input functions for each attrs */
-    Oid           *typioparams;    /* array of element types for in_functions */
-    ErrorSaveContext *escontext;    /* soft error trapper during in_functions
-                                     * execution */
-    uint64        num_errors;        /* total number of rows which contained soft
-                                 * errors */
-    int           *defmap;            /* array of default att numbers related to
-                                 * missing att */
-    ExprState **defexprs;        /* array of default att expressions for all
-                                 * att */
-    bool       *defaults;        /* if DEFAULT marker was found for
-                                 * corresponding att */
-    bool        volatile_defexprs;    /* is any of defexprs volatile? */
-    List       *range_table;    /* single element list of RangeTblEntry */
-    List       *rteperminfos;    /* single element list of RTEPermissionInfo */
-    ExprState  *qualexpr;
-
-    TransitionCaptureState *transition_capture;
-
-    /*
-     * These variables are used to reduce overhead in COPY FROM.
-     *
-     * attribute_buf holds the separated, de-escaped text for each field of
-     * the current line.  The CopyReadAttributes functions return arrays of
-     * pointers into this buffer.  We avoid palloc/pfree overhead by re-using
-     * the buffer on each cycle.
-     *
-     * In binary COPY FROM, attribute_buf holds the binary data for the
-     * current field, but the usage is otherwise similar.
-     */
-    StringInfoData attribute_buf;
-
-    /* field raw data pointers found by COPY FROM */
-
-    int            max_fields;
-    char      **raw_fields;
-
-    /*
-     * Similarly, line_buf holds the whole input line being processed. The
-     * input cycle is first to read the whole line into line_buf, and then
-     * extract the individual attribute fields into attribute_buf.  line_buf
-     * is preserved unmodified so that we can display it in error messages if
-     * appropriate.  (In binary mode, line_buf is not used.)
-     */
-    StringInfoData line_buf;
-    bool        line_buf_valid; /* contains the row being processed? */
-
-    /*
-     * input_buf holds input data, already converted to database encoding.
-     *
-     * In text mode, CopyReadLine parses this data sufficiently to locate line
-     * boundaries, then transfers the data to line_buf. We guarantee that
-     * there is a \0 at input_buf[input_buf_len] at all times.  (In binary
-     * mode, input_buf is not used.)
-     *
-     * If encoding conversion is not required, input_buf is not a separate
-     * buffer but points directly to raw_buf.  In that case, input_buf_len
-     * tracks the number of bytes that have been verified as valid in the
-     * database encoding, and raw_buf_len is the total number of bytes stored
-     * in the buffer.
-     */
-#define INPUT_BUF_SIZE 65536    /* we palloc INPUT_BUF_SIZE+1 bytes */
-    char       *input_buf;
-    int            input_buf_index;    /* next byte to process */
-    int            input_buf_len;    /* total # of bytes stored */
-    bool        input_reached_eof;    /* true if we reached EOF */
-    bool        input_reached_error;    /* true if a conversion error happened */
-    /* Shorthand for number of unconsumed bytes available in input_buf */
-#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index)
-
-    /*
-     * raw_buf holds raw input data read from the data source (file or client
-     * connection), not yet converted to the database encoding.  Like with
-     * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len].
-     */
-#define RAW_BUF_SIZE 65536        /* we palloc RAW_BUF_SIZE+1 bytes */
-    char       *raw_buf;
-    int            raw_buf_index;    /* next byte to process */
-    int            raw_buf_len;    /* total # of bytes stored */
-    bool        raw_reached_eof;    /* true if we reached EOF */
-
-    /* Shorthand for number of unconsumed bytes available in raw_buf */
-#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
-
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyFromStateData;
-
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
-- 
2.45.2
From 1500fa121c6d3028db4673ab595d2ba79c55cbf7 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:21:39 +0900
Subject: [PATCH v27 8/9] Add support for implementing custom COPY FROM format
 as extension
* Add CopyFromStateData::opaque, which a custom COPY FROM format
  implementation can use to keep its private data
* Export CopyReadBinaryData() as CopyFromStateRead() so that
  extensions can read the next chunk of input data
---
 src/backend/commands/copyfromparse.c | 12 ++++++++++++
 src/include/commands/copyapi.h       |  5 +++++
 2 files changed, 17 insertions(+)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 1c68b0d2952..0a7e7255b7d 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -729,6 +729,18 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
     return copied_bytes;
 }
 
+/*
+ * Export CopyReadBinaryData() for extensions. We want to keep
+ * CopyReadBinaryData() as a static function for
+ * optimization. CopyReadBinaryData() calls in this file may be optimized by
+ * a compiler.
+ */
+int
+CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes)
+{
+    return CopyReadBinaryData(cstate, dest, nbytes);
+}
+
 /*
  * Read raw fields in the next line for COPY FROM in text or csv mode.
  * Return false if no more lines.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 335584f8877..caba308533d 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -335,6 +335,11 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyFromStateData;
 
+extern int    CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes);
+
 #endif                            /* COPYAPI_H */
-- 
2.45.2
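The two additions in this patch (a private `opaque` pointer and a non-static wrapper around a file-local read function) can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: `MockCopyFromState`, `MockFormatState`, `mock_read_binary_data()`, `mock_copy_from_state_read()` and `mock_read_row()` are hypothetical stand-ins invented for this example, not PostgreSQL APIs, and the in-memory buffer merely imitates the real data source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CopyFromStateData; only what the sketch needs. */
typedef struct MockCopyFromState
{
    const char *input;          /* whole input, imitating the data source */
    size_t      input_len;      /* total number of input bytes */
    size_t      input_pos;      /* next byte to hand out */
    void       *opaque;         /* private space for the format implementation */
} MockCopyFromState;

/*
 * Stand-in for the static CopyReadBinaryData(). It stays file-local so that
 * callers in the same file remain easy for the compiler to inline.
 */
static int
mock_read_binary_data(MockCopyFromState *cstate, char *dest, int nbytes)
{
    size_t      remaining = cstate->input_len - cstate->input_pos;
    size_t      n;

    assert(nbytes >= 0);
    n = (size_t) nbytes < remaining ? (size_t) nbytes : remaining;
    memcpy(dest, cstate->input + cstate->input_pos, n);
    cstate->input_pos += n;
    return (int) n;
}

/* Stand-in for the exported CopyFromStateRead(): a thin non-static wrapper. */
int
mock_copy_from_state_read(MockCopyFromState *cstate, char *dest, int nbytes)
{
    return mock_read_binary_data(cstate, dest, nbytes);
}

/* Hypothetical per-format private data kept behind cstate->opaque. */
typedef struct MockFormatState
{
    int         rows_seen;
} MockFormatState;

/*
 * Read one fixed-size row through the exported wrapper.
 * Returns 1 on success, 0 at end of input, -1 on a short row.
 */
int
mock_read_row(MockCopyFromState *cstate, char *row, int row_size)
{
    MockFormatState *fmt = (MockFormatState *) cstate->opaque;
    int         n = mock_copy_from_state_read(cstate, row, row_size);

    if (n == 0)
        return 0;
    if (n != row_size)
        return -1;
    fmt->rows_seen++;
    return 1;
}
```

In this model the format keeps its row counter behind `opaque` rather than in a global, which is the kind of per-COPY private state the commit message describes.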
From 407dc990b071b7938606b2f8082c8e9a09652d6b Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 27 Nov 2024 16:23:55 +0900
Subject: [PATCH v27 9/9] Add CopyFromSkipErrorRow() for custom COPY format
 extension
Extensions must call CopyFromSkipErrorRow() when CopyFromOneRow
callback reports an error by errsave(). CopyFromSkipErrorRow() handles
"ON_ERROR stop" and "LOG_VERBOSITY verbose" cases.
---
 src/backend/commands/copyfromparse.c          | 82 +++++++++++--------
 src/include/commands/copyapi.h                |  2 +
 .../expected/test_copy_format.out             | 47 +++++++++++
 .../test_copy_format/sql/test_copy_format.sql | 24 ++++++
 .../test_copy_format/test_copy_format.c       | 82 ++++++++++++++++++-
 5 files changed, 199 insertions(+), 38 deletions(-)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 0a7e7255b7d..713a000fe5f 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -852,6 +852,51 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool i
     return true;
 }
 
+/*
+ * Call this when you report an error by errsave() in your CopyFromOneRow
+ * callback. This handles "ON_ERROR stop" and "LOG_VERBOSITY verbose" cases
+ * for you.
+ */
+void
+CopyFromSkipErrorRow(CopyFromState cstate)
+{
+    Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+    cstate->num_errors++;
+
+    if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+    {
+        /*
+         * Since we emit line number and column info in the below notice
+         * message, we suppress error context information other than the
+         * relation name.
+         */
+        Assert(!cstate->relname_only);
+        cstate->relname_only = true;
+
+        if (cstate->cur_attval)
+        {
+            char       *attval;
+
+            attval = CopyLimitPrintoutLength(cstate->cur_attval);
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname,
+                           attval));
+            pfree(attval);
+        }
+        else
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname));
+
+        /* reset relname_only */
+        cstate->relname_only = false;
+    }
+}
+
 /*
  * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
  *
@@ -960,42 +1005,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
                                         (Node *) cstate->escontext,
                                         &values[m]))
         {
-            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-            cstate->num_errors++;
-
-            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-            {
-                /*
-                 * Since we emit line number and column info in the below
-                 * notice message, we suppress error context information other
-                 * than the relation name.
-                 */
-                Assert(!cstate->relname_only);
-                cstate->relname_only = true;
-
-                if (cstate->cur_attval)
-                {
-                    char       *attval;
-
-                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-                                   (unsigned long long) cstate->cur_lineno,
-                                   cstate->cur_attname,
-                                   attval));
-                    pfree(attval);
-                }
-                else
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-                                   (unsigned long long) cstate->cur_lineno,
-                                   cstate->cur_attname));
-
-                /* reset relname_only */
-                cstate->relname_only = false;
-            }
-
+            CopyFromSkipErrorRow(cstate);
             return true;
         }
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index caba308533d..09585163e0d 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -342,4 +342,6 @@ typedef struct CopyFromStateData
 
 extern int    CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes);
 
+extern void CopyFromSkipErrorRow(CopyFromState cstate);
+
 #endif                            /* COPYAPI_H */
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index 016893e7026..b9a6baa85c0 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -1,6 +1,8 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=true
 NOTICE:  CopyFromInFunc: atttypid=21
@@ -8,7 +10,50 @@ NOTICE:  CopyFromInFunc: atttypid=23
 NOTICE:  CopyFromInFunc: atttypid=20
 NOTICE:  CopyFromStart: natts=3
 NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  invalid value: "6"
+CONTEXT:  COPY test, line 2, column a: "6"
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
 NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  skipping row due to data type incompatibility at line 2 for column "a": "6"
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
+NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  too much lines: 3
+CONTEXT:  COPY test, line 3
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=false
 NOTICE:  CopyToOutFunc: atttypid=21
@@ -18,4 +63,6 @@ NOTICE:  CopyToStart: natts=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 0dfdfa00080..86db71bce7f 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -1,6 +1,30 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+321
 \.
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index f6b105659ab..f0f53838aef 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -32,10 +32,88 @@ CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
 }
 
 static bool
-CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext,
+               Datum *values, bool *nulls)
 {
+    int            n_attributes = list_length(cstate->attnumlist);
+    char       *line;
+    int            line_size = n_attributes + 1;    /* +1 is for new line */
+    int            read_bytes;
+
     ereport(NOTICE, (errmsg("CopyFromOneRow")));
-    return false;
+
+    cstate->cur_lineno++;
+    line = palloc(line_size);
+    read_bytes = CopyFromStateRead(cstate, line, line_size);
+    if (read_bytes == 0)
+        return false;
+    if (read_bytes != line_size)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("one line must be %d bytes: %d",
+                        line_size, read_bytes)));
+
+    if (cstate->cur_lineno == 1)
+    {
+        /* Success */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        ListCell   *cur;
+        int            i = 0;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            int            m = attnum - 1;
+            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+            if (att->atttypid == INT2OID)
+            {
+                values[i] = Int16GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT4OID)
+            {
+                values[i] = Int32GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT8OID)
+            {
+                values[i] = Int64GetDatum(line[i] - '0');
+            }
+            nulls[i] = false;
+            i++;
+        }
+    }
+    else if (cstate->cur_lineno == 2)
+    {
+        /* Soft error */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        int            attnum = lfirst_int(list_head(cstate->attnumlist));
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+        char        value[2];
+
+        cstate->cur_attname = NameStr(att->attname);
+        value[0] = line[0];
+        value[1] = '\0';
+        cstate->cur_attval = value;
+        errsave((Node *) cstate->escontext,
+                (
+                 errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                 errmsg("invalid value: \"%c\"", line[0])));
+        CopyFromSkipErrorRow(cstate);
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+        return true;
+    }
+    else
+    {
+        /* Hard error */
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("too much lines: %llu",
+                        (unsigned long long) cstate->cur_lineno)));
+    }
+
+    return true;
 }
 
 static void
-- 
2.45.2
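The calling convention this patch asks of extensions (report the problem with errsave(), then call CopyFromSkipErrorRow() and return true so that the row is skipped) can be modeled without the PostgreSQL headers. The sketch below is an assumption-laden illustration, not the patch's code: every `Mock*`/`mock_*` name is hypothetical, the "non-digit is invalid" rule is invented for the example, and `mock_errsave()` only imitates the observable behavior in the expected test output above, where the same bad row is a hard ERROR with the default ON_ERROR stop and is skipped with ON_ERROR ignore.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum MockOnError
{
    MOCK_ON_ERROR_STOP,
    MOCK_ON_ERROR_IGNORE
} MockOnError;

typedef enum MockRowResult
{
    MOCK_ROW_OK,                /* values[] filled in */
    MOCK_ROW_SKIPPED,           /* soft error: row counted and skipped */
    MOCK_ROW_HARD_ERROR         /* stands in for ereport(ERROR) */
} MockRowResult;

/* Hypothetical subset of the COPY FROM state used by this sketch. */
typedef struct MockCopyState
{
    MockOnError on_error;
    unsigned long long cur_lineno;
    unsigned long long num_errors;
} MockCopyState;

/*
 * Imitates errsave(): returns true if the error was saved softly, false if
 * it has to be raised as a hard error (the ON_ERROR stop case).
 */
static bool
mock_errsave(const MockCopyState *cstate)
{
    return cstate->on_error != MOCK_ON_ERROR_STOP;
}

/* Imitates CopyFromSkipErrorRow(): only reachable after a soft error. */
static void
mock_skip_error_row(MockCopyState *cstate)
{
    assert(cstate->on_error != MOCK_ON_ERROR_STOP);
    cstate->num_errors++;
}

/* Parse one row of single-digit columns; any non-digit is an invalid value. */
MockRowResult
mock_one_row(MockCopyState *cstate, const char *line, int natts, int *values)
{
    int         i;

    cstate->cur_lineno++;
    for (i = 0; i < natts; i++)
    {
        if (line[i] < '0' || line[i] > '9')
        {
            if (!mock_errsave(cstate))
                return MOCK_ROW_HARD_ERROR;
            mock_skip_error_row(cstate);
            return MOCK_ROW_SKIPPED;
        }
        values[i] = line[i] - '0';
    }
    return MOCK_ROW_OK;
}
```

The point of the ordering is that the skip helper is only reached when the error was saved softly, which matches the assertion at the top of CopyFromSkipErrorRow() in the patch.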
			
Hi,

On Mon, Nov 25, 2024 at 2:02 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <20241125.110620.313152541320718947.kou@clear-code.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 25 Nov 2024 11:06:20 +0900 (JST),
>   Sutou Kouhei <kou@clear-code.com> wrote:
>
> >> I've attached the v25 patches that squashed the minor changes I made
> >> in v24 and incorporated all comments I got so far. I think these two
> >> patches are in good shape. Could you rebase remaining patches on top
> >> of them so that we can see the big picture of this feature?
> >
> > OK. I'll work on it.
>
> I've attached the v26 patch set:
>
> 0001: It's same as 0001 in the v25 patch set.
> 0002: It's same as 0002 in the v25 patch set.
> 0003: It's same as 0003 in the v23 patch set.
>       This parses the "format" option and adds support for
>       custom TO handler.
> 0004: It's same as 0004 in the v23 patch set.
>       This exports CopyToStateData. But the following
>       enums/structs/functions aren't moved to copyapi.h from
>       copy.h:
>       * CopyHeaderChoice
>       * CopyOnErrorChoice
>       * CopyLogVerbosityChoice
>       * CopyFormatOptions
>       * copy_data_dest_cb()
> 0005: It's same as 0005 in the v23 patch set.
>       This adds missing APIs to implement custom TO handler
>       as an extension.
> 0006: It's same as 0008 in the v23 patch set.
>       This adds support for custom FROM handler.
> 0007: It's same as 0009 in the v23 patch set.
>       This exports CopyFromStateData.
> 0008: It's same as 0010 in the v23 patch set.
>       This adds missing APIs to implement custom FROM handler
>       as an extension.
>
> I've also updated https://github.com/kou/pg-copy-arrow for
> the current API.
>
> I think that we can merge only 0001/0002 as the first
> step. Because they don't change the current behavior and
> they improve performance. We can merge other patches after
> that.
>
> >> Regarding exposing the structs such as CopyToStateData, v22-0004 patch
> >> moves most of all copy-related structs to copyapi.h from copyto.c,
> >> copyfrom_internal.h, and copy.h, which seems odd to me. I think we can
> >> expose CopyToStateData (and related structs) in a new file
> >> copyto_internal.h and keep other structs in the original header files.
> >
> > Custom COPY format extensions need to use
> > CopyToStateData/CopyFromStateData. For example,
> > CopyToStateData::rel is used to retrieve table schema. If we
> > move CopyToStateData to copyto_internal.h not copyapi.h,
> > custom COPY format extensions need to include
> > copyto_internal.h. I feel that it's strange that extensions
> > need to use internal headers.
> >
> > What is your real concern? If you don't want to export
> > CopyToStateData/CopyFromStateData entirely, we can provide
> > accessors only for some members of them.
>
> The v26 patch set still exports
> CopyToStateData/CopyFromStateData in copyapi.h not
> copy{to,from}_internal.h. But I didn't move the following
> enums/structs/functions:
>
> * CopyHeaderChoice
> * CopyOnErrorChoice
> * CopyLogVerbosityChoice
> * CopyFormatOptions
> * copy_data_dest_cb()
>
> What do you think about this approach?
>
>
> Thanks,
> --
> kou

I just gave this another round of benchmarking tests. I'd like to
share the number, since COPY TO has some performance drawbacks, I
test only COPY TO. I use the run.sh Tomas provided earlier but use
pgbench with a custom script, you can find it here[0].

I tested 3 branches:

1. the master branch
2. all v26 patch sets applied
3. Emitting JSON to file using COPY TO v13 patch set[1], this add some
if branch in CopyOneRowTo, so I was expecting this slower than master

2 can be about -3%~+3% compared to 1, but what surprised me is that 3
is always better than 1 & 2.

I reviewed the patch set of 3 and I don't see any magic.

You can see the detailed results here[2], I can not upload files so I
just shared the google doc link, ping me if you can not open the link.

[0]: https://github.com/pghacking/scripts/tree/main/extensible_copy
[1]: https://www.postgresql.org/message-id/CACJufxH8J0uD-inukxAmd3TVwt-b-y7d7hLGSBdEdLXFGJLyDA%40mail.gmail.com
[2]: https://docs.google.com/spreadsheets/d/1wJPXZF4LHe34X9IU1pLG7rI9sCkSy2dEkdj7w7avTqM/edit?usp=sharing

--
Regards
Junwang Zhao
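
For readers who want to reproduce the "-3%~+3%" comparison from their own runs, the arithmetic is a plain relative change against the baseline branch. The sketch below is only an illustration: `relative_change_percent()` and `within_noise()` are hypothetical helper names, the 3% tolerance is taken from the message above, and nothing here comes from the shared spreadsheet or scripts.

```c
#include <assert.h>

/*
 * Relative change of a candidate measurement against a baseline, in percent.
 * A negative result means the candidate value is smaller than the baseline.
 */
double
relative_change_percent(double baseline, double candidate)
{
    assert(baseline != 0.0);
    return (candidate - baseline) * 100.0 / baseline;
}

/* Returns 1 if the change stays inside +/- tolerance_percent, else 0. */
int
within_noise(double baseline, double candidate, double tolerance_percent)
{
    double      change = relative_change_percent(baseline, candidate);

    if (change < 0.0)
        change = -change;
    return change <= tolerance_percent;
}
```

Whether a negative number is good or bad depends on what was measured (elapsed time versus throughput), so the sign needs to be read together with the metric.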
Hi,

In <CAEG8a3LUBcvjwqgt6AijJmg67YN_b_NZ4Kzoxc_dH4rpAq0pKg@mail.gmail.com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 27 Nov 2024 19:49:17 +0800,
Junwang Zhao <zhjwpku@gmail.com> wrote:

> I just gave this another round of benchmarking tests. I'd like to
> share the number,
> since COPY TO has some performance drawbacks, I test only COPY TO. I
> use the run.sh Tomas provided earlier but use pgbench with a custom script, you
> can find it here[0].
>
> I tested 3 branches:
>
> 1. the master branch
> 2. all v26 patch sets applied
> 3. Emitting JSON to file using COPY TO v13 patch set[1], this add some
> if branch in CopyOneRowTo, so I was expecting this slower than master
>
> 2 can be about -3%~+3% compared to 1, but what surprised me is that 3
> is always better than 1 & 2.
>
> I reviewed the patch set of 3 and I don't see any magic.
>
> You can see the detailed results here[2], I can not upload files so I
> just shared the google doc link, ping me if you can not open the link.
>
> [0]: https://github.com/pghacking/scripts/tree/main/extensible_copy
> [1]: https://www.postgresql.org/message-id/CACJufxH8J0uD-inukxAmd3TVwt-b-y7d7hLGSBdEdLXFGJLyDA%40mail.gmail.com
> [2]: https://docs.google.com/spreadsheets/d/1wJPXZF4LHe34X9IU1pLG7rI9sCkSy2dEkdj7w7avTqM/edit?usp=sharing

Thanks for sharing your numbers.

1. and 2. show that there is at least no significant
performance regression.

I looked at the patch set of 3. and I think that the result
(there is no performance difference between 1. and 3.) isn't
strange. The patch set adds some if branches, but they aren't
used with the "text" format, at least in the per-row processing.

Thanks,
--
kou
On Thu, Nov 28, 2024 at 2:16 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAEG8a3LUBcvjwqgt6AijJmg67YN_b_NZ4Kzoxc_dH4rpAq0pKg@mail.gmail.com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 27 Nov 2024 19:49:17 +0800,
> Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> > I just gave this another round of benchmarking tests. I'd like to
> > share the number,
> > since COPY TO has some performance drawbacks, I test only COPY TO. I
> > use the run.sh Tomas provided earlier but use pgbench with a custom script, you
> > can find it here[0].
> >
> > I tested 3 branches:
> >
> > 1. the master branch
> > 2. all v26 patch sets applied
> > 3. Emitting JSON to file using COPY TO v13 patch set[1], this add some
> > if branch in CopyOneRowTo, so I was expecting this slower than master
> >
> > 2 can be about -3%~+3% compared to 1, but what surprised me is that 3
> > is always better than 1 & 2.
> >
> > I reviewed the patch set of 3 and I don't see any magic.
> >
> > You can see the detailed results here[2], I can not upload files so I
> > just shared the google doc link, ping me if you can not open the link.
> >
> > [0]: https://github.com/pghacking/scripts/tree/main/extensible_copy
> > [1]: https://www.postgresql.org/message-id/CACJufxH8J0uD-inukxAmd3TVwt-b-y7d7hLGSBdEdLXFGJLyDA%40mail.gmail.com
> > [2]: https://docs.google.com/spreadsheets/d/1wJPXZF4LHe34X9IU1pLG7rI9sCkSy2dEkdj7w7avTqM/edit?usp=sharing
>
> Thanks for sharing your numbers.
>
> 1. and 2. shows that there is at least no significant
> performance regression.

Agreed.

>
> I see the patch set of 3. and I think that the result
> (there is no performance difference between 1. and 3.) isn't
> strange. The patch set adds some if branches but they aren't
> used with "text" format at least in per row process.

The branches are not taken for the "text" format, but they do add some
instructions to the CopyOneRowTo function, so I guess this has some
impact on the CPU instruction cache.
There is a difference between 1 and 3: 3 is always better than 1, by up
to 4%. I forgot to mention that the comparisons are in *sheet2*.

>
> Thanks,
> --
> kou

--
Regards
Junwang Zhao
Hi,

In <CAEG8a3+BmNeEOLmApOCyktYbiZW=s95dvpod_FxJS+3ieVZQ7w@mail.gmail.com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 28 Nov 2024 19:02:57 +0800,
Junwang Zhao <zhjwpku@gmail.com> wrote:

>> > I tested 3 branches:
>> >
>> > 1. the master branch
>> > 2. all v26 patch sets applied
>> > 3. Emitting JSON to file using COPY TO v13 patch set[1], this add some
>> > if branch in CopyOneRowTo, so I was expecting this slower than master
>> >
>> > You can see the detailed results here[2], I can not upload files so I
>> > just shared the google doc link, ping me if you can not open the link.
>> >
>> > [1]: https://www.postgresql.org/message-id/CACJufxH8J0uD-inukxAmd3TVwt-b-y7d7hLGSBdEdLXFGJLyDA%40mail.gmail.com
>> > [2]: https://docs.google.com/spreadsheets/d/1wJPXZF4LHe34X9IU1pLG7rI9sCkSy2dEkdj7w7avTqM/edit?usp=sharing
>>
>> Thanks for sharing your numbers.
>>
>> 1. and 2. shows that there is at least no significant
>> performance regression.
>
> Agreed.

Can we focus on only 1. and 2. in this thread?

>> I see the patch set of 3. and I think that the result
>> (there is no performance difference between 1. and 3.) isn't
>> strange. The patch set adds some if branches but they aren't
>> used with "text" format at least in per row process.
>
> It is not used in "text" format, but it adds some assembly code
> to the CopyOneRowTo function, so this will have some impact
> on the cpu i cache I guess.
>
> There is difference between 1 and 3, 3 is always better than 1
> upto 4% improvement

Can we discuss 1. and 3. in the [1] thread?

(Anyway, we may want to confirm whether these numbers are
reproducible or not as the first step.)

> I forgot to mention that the comparisons
> are in *sheet2*.

Thanks. I missed it.

Thanks,
--
kou
On Fri, Nov 29, 2024 at 9:07 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAEG8a3+BmNeEOLmApOCyktYbiZW=s95dvpod_FxJS+3ieVZQ7w@mail.gmail.com>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 28 Nov 2024 19:02:57 +0800,
> Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> >> > I tested 3 branches:
> >> >
> >> > 1. the master branch
> >> > 2. all v26 patch sets applied
> >> > 3. Emitting JSON to file using COPY TO v13 patch set[1], this add some
> >> > if branch in CopyOneRowTo, so I was expecting this slower than master
> >> >
> >> > You can see the detailed results here[2], I can not upload files so I
> >> > just shared the google doc link, ping me if you can not open the link.
> >> >
> >> > [1]: https://www.postgresql.org/message-id/CACJufxH8J0uD-inukxAmd3TVwt-b-y7d7hLGSBdEdLXFGJLyDA%40mail.gmail.com
> >> > [2]: https://docs.google.com/spreadsheets/d/1wJPXZF4LHe34X9IU1pLG7rI9sCkSy2dEkdj7w7avTqM/edit?usp=sharing
> >>
> >> Thanks for sharing your numbers.
> >>
> >> 1. and 2. shows that there is at least no significant
> >> performance regression.
> >
> > Agreed.
>
> Can we focus on only 1. and 2. in this thread?
>
> >> I see the patch set of 3. and I think that the result
> >> (there is no performance difference between 1. and 3.) isn't
> >> strange. The patch set adds some if branches but they aren't
> >> used with "text" format at least in per row process.
> >
> > It is not used in "text" format, but it adds some assembly code
> > to the CopyOneRowTo function, so this will have some impact
> > on the cpu i cache I guess.
> >
> > There is difference between 1 and 3, 3 is always better than 1
> > upto 4% improvement
>
> Can we discuss 1. and 3. in the [1] thread?

This thread and the [1] thread are somewhat interleaved. I chose this
thread to share the numbers because I think this feature should be
committed first, and *copy to json* can then be adapted as a contrib
module.
Committers on this thread seem worried about the performance
drawback, so what I hoped to show was that *2 is slightly worse than 1,
but better than 3*, in which case we could commit 2 first. But I did
not get the expected numbers.

>
> (Anyway, we may want to confirm whether these numbers are
> reproducible or not as the first step.)
>
> > I forgot to mention that the comparisons
> > are in *sheet2*.
>
> Thanks. I missed it.
>
>
> Thanks,
> --
> kou

--
Regards
Junwang Zhao
Hi,

In <CAEG8a3+-3fAmiwD5NmE7W4j5-=HLs2OEexQNW9-fB=j=mdxgDQ@mail.gmail.com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 29 Nov 2024 10:15:04 +0800,
Junwang Zhao <zhjwpku@gmail.com> wrote:

> This thread and [1] thread are kind of interleaved, I chose this thread
> to share the numbers because I think this feature should be committed
> first and then adapt the *copy to json* as a contrib module.

I agree with you.

> Committers on this thread seem worried about the performance
> drawback, so what I tried to do is that *if 2 is slightly worse than 1,
> but better than 3*, then we can commit 2 first, but I did not get
> the expected number.

Could you break down which patch in the v13 patch set [1] affected
the result? If we can find which change improves performance, we can
use the same approach in this patch set too.

[1]: https://www.postgresql.org/message-id/CACJufxH8J0uD-inukxAmd3TVwt-b-y7d7hLGSBdEdLXFGJLyDA%40mail.gmail.com

Thanks,
--
kou
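Editorial sketch, not part of any posted patch: the benchmarks above compare a per-row path that checks the format on every row against one that picks a callback table once per COPY. The standalone C below illustrates that idea only. Every name in it (ToyCopyState, ToyCopyRoutine, toy_begin, toy_copy_one_row and so on) is hypothetical and does not exist in PostgreSQL, the "CSV" output is a toy (always quoted, no escaping), and plain `static inline` stands in for the always-inline attribute, so whether the constant `is_csv` branch is actually removed depends on the compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; none of these names exist in PostgreSQL. */
typedef struct ToyCopyState ToyCopyState;

/* Per-format callback table, similar in spirit to a format routine struct. */
typedef struct ToyCopyRoutine
{
	void		(*one_row) (ToyCopyState *st, const char *const *fields,
							int nfields);
} ToyCopyRoutine;

struct ToyCopyState
{
	const ToyCopyRoutine *routine;	/* chosen once, before the first row */
	char		buf[256];		/* toy output buffer */
	size_t		len;
};

/* Append a string to the toy buffer; input that does not fit is dropped. */
static void
toy_send(ToyCopyState *st, const char *s)
{
	size_t		n = strlen(s);

	if (st->len + n >= sizeof(st->buf))
		return;
	memcpy(st->buf + st->len, s, n + 1);	/* copies the trailing NUL too */
	st->len += n;
}

/*
 * Shared workhorse. Each wrapper passes a constant is_csv, so a compiler
 * that inlines this function can drop the untaken branch.
 */
static inline void
toy_textlike_one_row(ToyCopyState *st, const char *const *fields,
					 int nfields, bool is_csv)
{
	for (int i = 0; i < nfields; i++)
	{
		if (i > 0)
			toy_send(st, is_csv ? "," : "\t");
		if (is_csv)
		{
			/* toy CSV: always quote, no escaping */
			toy_send(st, "\"");
			toy_send(st, fields[i]);
			toy_send(st, "\"");
		}
		else
			toy_send(st, fields[i]);
	}
	toy_send(st, "\n");
}

static void
toy_text_one_row(ToyCopyState *st, const char *const *fields, int nfields)
{
	toy_textlike_one_row(st, fields, nfields, false);
}

static void
toy_csv_one_row(ToyCopyState *st, const char *const *fields, int nfields)
{
	toy_textlike_one_row(st, fields, nfields, true);
}

static const ToyCopyRoutine toy_text_routine = {.one_row = toy_text_one_row};
static const ToyCopyRoutine toy_csv_routine = {.one_row = toy_csv_one_row};

/* The format is inspected once here, not once per row. */
static void
toy_begin(ToyCopyState *st, bool csv_mode)
{
	st->routine = csv_mode ? &toy_csv_routine : &toy_text_routine;
	st->len = 0;
	st->buf[0] = '\0';
}

/* Per-row path: a single indirect call and no format checks. */
static void
toy_copy_one_row(ToyCopyState *st, const char *const *fields, int nfields)
{
	assert(st->routine != NULL);
	st->routine->one_row(st, fields, nfields);
}
```

A caller would run toy_begin() once and then toy_copy_one_row() for each row. The trade-off is an indirect call per row in exchange for removing the format checks from the row loop, which is why the thread measures the result rather than assuming it.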
Hi, I noticed that the last patch set (v27) can't be applied to the current master. I've rebased on the current master and created v28 patch set. No code change. Thanks, -- kou From 016ccfc63d2faa441a6996e3dcfd3cdbff7c185f Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Sat, 28 Sep 2024 23:24:49 +0900 Subject: [PATCH v28 1/9] Refactor COPY TO to use format callback functions. This commit introduces a new CopyToRoutine struct, which is a set of callback routines to copy tuples in a specific format. It also makes the existing formats (text, CSV, and binary) utilize these format callbacks. This change is a preliminary step towards making the COPY TO command extensible in terms of output formats. Additionally, this refactoring contributes to a performance improvement by reducing the number of "if" branches that need to be checked on a per-row basis when sending field representations in text or CSV mode. The performance benchmark results showed ~5% performance gain in text or CSV mode. 
Author: Sutou Kouhei Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada Reviewed-by: Junwang Zhao Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com --- src/backend/commands/copyto.c | 441 +++++++++++++++++++++---------- src/include/commands/copyapi.h | 57 ++++ src/tools/pgindent/typedefs.list | 1 + 3 files changed, 358 insertions(+), 141 deletions(-) create mode 100644 src/include/commands/copyapi.h diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 99cb23cb347..a885779666b 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -19,7 +19,7 @@ #include <sys/stat.h> #include "access/tableam.h" -#include "commands/copy.h" +#include "commands/copyapi.h" #include "commands/progress.h" #include "executor/execdesc.h" #include "executor/executor.h" @@ -64,6 +64,9 @@ typedef enum CopyDest */ typedef struct CopyToStateData { + /* format-specific routines */ + const CopyToRoutine *routine; + /* low-level state data */ CopyDest copy_dest; /* type of copy source/destination */ FILE *copy_file; /* used if copy_dest == COPY_FILE */ @@ -114,6 +117,19 @@ static void CopyAttributeOutText(CopyToState cstate, const char *string); static void CopyAttributeOutCSV(CopyToState cstate, const char *string, bool use_quote); +/* built-in format-specific routines */ +static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc); +static void CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo); +static void CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot); +static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot); +static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot, + bool is_csv); +static void CopyToTextLikeEnd(CopyToState cstate); +static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc); +static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo); +static void 
CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot); +static void CopyToBinaryEnd(CopyToState cstate); + /* Low-level communications functions */ static void SendCopyBegin(CopyToState cstate); static void SendCopyEnd(CopyToState cstate); @@ -121,9 +137,254 @@ static void CopySendData(CopyToState cstate, const void *databuf, int datasize); static void CopySendString(CopyToState cstate, const char *str); static void CopySendChar(CopyToState cstate, char c); static void CopySendEndOfRow(CopyToState cstate); +static void CopySendTextLikeEndOfRow(CopyToState cstate); static void CopySendInt32(CopyToState cstate, int32 val); static void CopySendInt16(CopyToState cstate, int16 val); +/* + * COPY TO routines for built-in formats. + * + * CSV and text formats share the same TextLike routines except for the + * one-row callback. + */ + +/* text format */ +static const CopyToRoutine CopyToRoutineText = { + .CopyToStart = CopyToTextLikeStart, + .CopyToOutFunc = CopyToTextLikeOutFunc, + .CopyToOneRow = CopyToTextOneRow, + .CopyToEnd = CopyToTextLikeEnd, +}; + +/* CSV format */ +static const CopyToRoutine CopyToRoutineCSV = { + .CopyToStart = CopyToTextLikeStart, + .CopyToOutFunc = CopyToTextLikeOutFunc, + .CopyToOneRow = CopyToCSVOneRow, + .CopyToEnd = CopyToTextLikeEnd, +}; + +/* binary format */ +static const CopyToRoutine CopyToRoutineBinary = { + .CopyToStart = CopyToBinaryStart, + .CopyToOutFunc = CopyToBinaryOutFunc, + .CopyToOneRow = CopyToBinaryOneRow, + .CopyToEnd = CopyToBinaryEnd, +}; + +/* Return COPY TO routines for the given options */ +static const CopyToRoutine * +CopyToGetRoutine(CopyFormatOptions opts) +{ + if (opts.csv_mode) + return &CopyToRoutineCSV; + else if (opts.binary) + return &CopyToRoutineBinary; + + /* default is text */ + return &CopyToRoutineText; +} + +/* Implementation of the start callback for text and CSV formats */ +static void +CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc) +{ + /* + * For non-binary copy, we need to 
convert null_print to file encoding, + * because it will be sent directly with CopySendString. + */ + if (cstate->need_transcoding) + cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, + cstate->opts.null_print_len, + cstate->file_encoding); + + /* if a header has been requested send the line */ + if (cstate->opts.header_line) + { + ListCell *cur; + bool hdr_delim = false; + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + char *colname; + + if (hdr_delim) + CopySendChar(cstate, cstate->opts.delim[0]); + hdr_delim = true; + + colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); + + if (cstate->opts.csv_mode) + CopyAttributeOutCSV(cstate, colname, false); + else + CopyAttributeOutText(cstate, colname); + } + + CopySendTextLikeEndOfRow(cstate); + } +} + +/* + * Implementation of the outfunc callback for text and CSV formats. Assign + * the output function data to the given *finfo. + */ +static void +CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) +{ + Oid func_oid; + bool is_varlena; + + /* Set output function for an attribute */ + getTypeOutputInfo(atttypid, &func_oid, &is_varlena); + fmgr_info(func_oid, finfo); +} + +/* Implementation of the per-row callback for text format */ +static void +CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + CopyToTextLikeOneRow(cstate, slot, false); +} + +/* Implementation of the per-row callback for CSV format */ +static void +CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + CopyToTextLikeOneRow(cstate, slot, true); +} + +/* + * Workhorse for CopyToTextOneRow() and CopyToCSVOneRow(). + * + * We use pg_attribute_always_inline to reduce function call overheads. 
+ */ +static pg_attribute_always_inline void +CopyToTextLikeOneRow(CopyToState cstate, + TupleTableSlot *slot, + bool is_csv) +{ + bool need_delim = false; + FmgrInfo *out_functions = cstate->out_functions; + + foreach_int(attnum, cstate->attnumlist) + { + Datum value = slot->tts_values[attnum - 1]; + bool isnull = slot->tts_isnull[attnum - 1]; + + if (need_delim) + CopySendChar(cstate, cstate->opts.delim[0]); + need_delim = true; + + if (isnull) + { + CopySendString(cstate, cstate->opts.null_print_client); + } + else + { + char *string; + + string = OutputFunctionCall(&out_functions[attnum - 1], + value); + + /* + * is_csv will be optimized away by compiler, as argument is + * constant at caller. + */ + if (is_csv) + CopyAttributeOutCSV(cstate, string, + cstate->opts.force_quote_flags[attnum - 1]); + else + CopyAttributeOutText(cstate, string); + } + } + + CopySendTextLikeEndOfRow(cstate); +} + +/* Implementation of the end callback for text and CSV formats */ +static void +CopyToTextLikeEnd(CopyToState cstate) +{ + /* Nothing to do here */ +} + +/* + * Implementation of the start callback for binary format. Send a header + * for a binary copy. + */ +static void +CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc) +{ + int32 tmp; + + /* Signature */ + CopySendData(cstate, BinarySignature, 11); + /* Flags field */ + tmp = 0; + CopySendInt32(cstate, tmp); + /* No header extension */ + tmp = 0; + CopySendInt32(cstate, tmp); +} + +/* + * Implementation of the outfunc callback for binary format. Assign + * the binary output function to the given *finfo. 
+ */ +static void +CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) +{ + Oid func_oid; + bool is_varlena; + + /* Set output function for an attribute */ + getTypeBinaryOutputInfo(atttypid, &func_oid, &is_varlena); + fmgr_info(func_oid, finfo); +} + +/* Implementation of the per-row callback for binary format */ +static void +CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + FmgrInfo *out_functions = cstate->out_functions; + + /* Binary per-tuple header */ + CopySendInt16(cstate, list_length(cstate->attnumlist)); + + foreach_int(attnum, cstate->attnumlist) + { + Datum value = slot->tts_values[attnum - 1]; + bool isnull = slot->tts_isnull[attnum - 1]; + + if (isnull) + { + CopySendInt32(cstate, -1); + } + else + { + bytea *outputbytes; + + outputbytes = SendFunctionCall(&out_functions[attnum - 1], + value); + CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); + CopySendData(cstate, VARDATA(outputbytes), + VARSIZE(outputbytes) - VARHDRSZ); + } + } + + CopySendEndOfRow(cstate); +} + +/* Implementation of the end callback for binary format */ +static void +CopyToBinaryEnd(CopyToState cstate) +{ + /* Generate trailer for a binary copy */ + CopySendInt16(cstate, -1); + /* Need to flush out the trailer */ + CopySendEndOfRow(cstate); +} /* * Send copy start/stop messages for frontend copies. 
These have changed @@ -191,16 +452,6 @@ CopySendEndOfRow(CopyToState cstate) switch (cstate->copy_dest) { case COPY_FILE: - if (!cstate->opts.binary) - { - /* Default line termination depends on platform */ -#ifndef WIN32 - CopySendChar(cstate, '\n'); -#else - CopySendString(cstate, "\r\n"); -#endif - } - if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, cstate->copy_file) != 1 || ferror(cstate->copy_file)) @@ -235,10 +486,6 @@ CopySendEndOfRow(CopyToState cstate) } break; case COPY_FRONTEND: - /* The FE/BE protocol uses \n as newline for all platforms */ - if (!cstate->opts.binary) - CopySendChar(cstate, '\n'); - /* Dump the accumulated row as one CopyData message */ (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len); break; @@ -254,6 +501,35 @@ CopySendEndOfRow(CopyToState cstate) resetStringInfo(fe_msgbuf); } +/* + * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the + * the line termination and do common appropriate things for the end of row. + */ +static inline void +CopySendTextLikeEndOfRow(CopyToState cstate) +{ + switch (cstate->copy_dest) + { + case COPY_FILE: + /* Default line termination depends on platform */ +#ifndef WIN32 + CopySendChar(cstate, '\n'); +#else + CopySendString(cstate, "\r\n"); +#endif + break; + case COPY_FRONTEND: + /* The FE/BE protocol uses \n as newline for all platforms */ + CopySendChar(cstate, '\n'); + break; + default: + break; + } + + /* Now take the actions related to the end of a row */ + CopySendEndOfRow(cstate); +} + /* * These functions do apply some data conversion */ @@ -426,6 +702,9 @@ BeginCopyTo(ParseState *pstate, /* Extract options from the statement node tree */ ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options); + /* Set format routine */ + cstate->routine = CopyToGetRoutine(cstate->opts); + /* Process the source/target relation or query */ if (rel) { @@ -771,19 +1050,10 @@ DoCopyTo(CopyToState cstate) foreach(cur, cstate->attnumlist) { int attnum = 
lfirst_int(cur); - Oid out_func_oid; - bool isvarlena; Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1); - if (cstate->opts.binary) - getTypeBinaryOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - else - getTypeOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); + cstate->routine->CopyToOutFunc(cstate, attr->atttypid, + &cstate->out_functions[attnum - 1]); } /* @@ -796,56 +1066,7 @@ DoCopyTo(CopyToState cstate) "COPY TO", ALLOCSET_DEFAULT_SIZES); - if (cstate->opts.binary) - { - /* Generate header for a binary copy */ - int32 tmp; - - /* Signature */ - CopySendData(cstate, BinarySignature, 11); - /* Flags field */ - tmp = 0; - CopySendInt32(cstate, tmp); - /* No header extension */ - tmp = 0; - CopySendInt32(cstate, tmp); - } - else - { - /* - * For non-binary copy, we need to convert null_print to file - * encoding, because it will be sent directly with CopySendString. - */ - if (cstate->need_transcoding) - cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, - cstate->opts.null_print_len, - cstate->file_encoding); - - /* if a header has been requested send the line */ - if (cstate->opts.header_line) - { - bool hdr_delim = false; - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - char *colname; - - if (hdr_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - hdr_delim = true; - - colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); - - if (cstate->opts.csv_mode) - CopyAttributeOutCSV(cstate, colname, false); - else - CopyAttributeOutText(cstate, colname); - } - - CopySendEndOfRow(cstate); - } - } + cstate->routine->CopyToStart(cstate, tupDesc); if (cstate->rel) { @@ -884,13 +1105,7 @@ DoCopyTo(CopyToState cstate) processed = ((DR_copy *) cstate->queryDesc->dest)->processed; } - if (cstate->opts.binary) - { - /* Generate trailer for a binary copy */ - CopySendInt16(cstate, -1); - /* Need to flush out the trailer */ 
- CopySendEndOfRow(cstate); - } + cstate->routine->CopyToEnd(cstate); MemoryContextDelete(cstate->rowcontext); @@ -903,74 +1118,18 @@ DoCopyTo(CopyToState cstate) /* * Emit one row during DoCopyTo(). */ -static void +static inline void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) { - FmgrInfo *out_functions = cstate->out_functions; MemoryContext oldcontext; MemoryContextReset(cstate->rowcontext); oldcontext = MemoryContextSwitchTo(cstate->rowcontext); - if (cstate->opts.binary) - { - /* Binary per-tuple header */ - CopySendInt16(cstate, list_length(cstate->attnumlist)); - } - /* Make sure the tuple is fully deconstructed */ slot_getallattrs(slot); - if (!cstate->opts.binary) - { - bool need_delim = false; - - foreach_int(attnum, cstate->attnumlist) - { - Datum value = slot->tts_values[attnum - 1]; - bool isnull = slot->tts_isnull[attnum - 1]; - char *string; - - if (need_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - need_delim = true; - - if (isnull) - CopySendString(cstate, cstate->opts.null_print_client); - else - { - string = OutputFunctionCall(&out_functions[attnum - 1], - value); - if (cstate->opts.csv_mode) - CopyAttributeOutCSV(cstate, string, - cstate->opts.force_quote_flags[attnum - 1]); - else - CopyAttributeOutText(cstate, string); - } - } - } - else - { - foreach_int(attnum, cstate->attnumlist) - { - Datum value = slot->tts_values[attnum - 1]; - bool isnull = slot->tts_isnull[attnum - 1]; - bytea *outputbytes; - - if (isnull) - CopySendInt32(cstate, -1); - else - { - outputbytes = SendFunctionCall(&out_functions[attnum - 1], - value); - CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); - CopySendData(cstate, VARDATA(outputbytes), - VARSIZE(outputbytes) - VARHDRSZ); - } - } - } - - CopySendEndOfRow(cstate); + cstate->routine->CopyToOneRow(cstate, slot); MemoryContextSwitchTo(oldcontext); } diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h new file mode 100644 index 00000000000..eccc875d0e8 --- 
/dev/null +++ b/src/include/commands/copyapi.h @@ -0,0 +1,57 @@ +/*------------------------------------------------------------------------- + * + * copyapi.h + * API for COPY TO handlers + * + * + * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/commands/copyapi.h + * + *------------------------------------------------------------------------- + */ +#ifndef COPYAPI_H +#define COPYAPI_H + +#include "commands/copy.h" +#include "executor/tuptable.h" +#include "nodes/execnodes.h" + +/* + * API structure for a COPY TO format implementation. Note this must be + * allocated in a server-lifetime manner, typically as a static const struct. + */ +typedef struct CopyToRoutine +{ + /* + * Set output function information. This callback is called once at the + * beginning of COPY TO. + * + * 'finfo' can be optionally filled to provide the catalog information of + * the output function. + * + * 'atttypid' is the OID of data type used by the relation's attribute. + */ + void (*CopyToOutFunc) (CopyToState cstate, Oid atttypid, + FmgrInfo *finfo); + + /* + * Start a COPY TO. This callback is called once at the beginning of COPY + * TO. + * + * 'tupDesc' is the tuple descriptor of the relation from where the data + * is read. + */ + void (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc); + + /* + * Write one row stored in 'slot' to the destination. + */ + void (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot); + + /* End a COPY TO. 
This callback is called once at the end of COPY TO */ + void (*CopyToEnd) (CopyToState cstate); +} CopyToRoutine; + +#endif /* COPYAPI_H */ diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index d5aa5c295ae..18b2595300e 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -508,6 +508,7 @@ CopyMultiInsertInfo CopyOnErrorChoice CopySource CopyStmt +CopyToRoutine CopyToState CopyToStateData Cost -- 2.47.1 From 366b5d3aea0d727f113bdd6b0388e0ce86e0868c Mon Sep 17 00:00:00 2001 From: Masahiko Sawada <sawada.mshk@gmail.com> Date: Mon, 18 Nov 2024 16:32:43 -0800 Subject: [PATCH v28 2/9] Refactor COPY FROM to use format callback functions. This commit introduces a new CopyFromRoutine struct, which is a set of callback routines to read tuples in a specific format. It also makes COPY FROM with the existing formats (text, CSV, and binary) utilize these format callbacks. This change is a preliminary step towards making the COPY FROM command extensible in terms of input formats. Similar to XXXX, this refactoring contributes to a performance improvement by reducing the number of "if" branches that need to be checked on a per-row basis when reading field representations in text or CSV mode. The performance benchmark results showed ~5% performance gain in text or CSV mode. 
Author: Sutou Kouhei Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada Reviewed-by: Junwang Zhao Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com --- contrib/file_fdw/file_fdw.c | 1 - src/backend/commands/copyfrom.c | 190 +++++++-- src/backend/commands/copyfromparse.c | 504 +++++++++++++---------- src/include/commands/copy.h | 2 - src/include/commands/copyapi.h | 48 ++- src/include/commands/copyfrom_internal.h | 13 +- src/tools/pgindent/typedefs.list | 1 + 7 files changed, 492 insertions(+), 267 deletions(-) diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c index 678e754b2b9..323c43dca4a 100644 --- a/contrib/file_fdw/file_fdw.c +++ b/contrib/file_fdw/file_fdw.c @@ -21,7 +21,6 @@ #include "access/table.h" #include "catalog/pg_authid.h" #include "catalog/pg_foreign_table.h" -#include "commands/copy.h" #include "commands/copyfrom_internal.h" #include "commands/defrem.h" #include "commands/explain.h" diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 0cbd05f5602..2e88f19861d 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -106,6 +106,145 @@ typedef struct CopyMultiInsertInfo /* non-export function prototypes */ static void ClosePipeFromProgram(CopyFromState cstate); +/* + * Built-in format-specific routines. One-row callbacks are defined in + * copyfromparse.c + */ +static void CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo, + Oid *typioparam); +static void CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc); +static void CopyFromTextLikeEnd(CopyFromState cstate); +static void CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam); +static void CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc); +static void CopyFromBinaryEnd(CopyFromState cstate); + + +/* + * COPY FROM routines for built-in formats. 
+ * + * CSV and text formats share the same TextLike routines except for the + * one-row callback. + */ + +/* text format */ +static const CopyFromRoutine CopyFromRoutineText = { + .CopyFromInFunc = CopyFromTextLikeInFunc, + .CopyFromStart = CopyFromTextLikeStart, + .CopyFromOneRow = CopyFromTextOneRow, + .CopyFromEnd = CopyFromTextLikeEnd, +}; + +/* CSV format */ +static const CopyFromRoutine CopyFromRoutineCSV = { + .CopyFromInFunc = CopyFromTextLikeInFunc, + .CopyFromStart = CopyFromTextLikeStart, + .CopyFromOneRow = CopyFromCSVOneRow, + .CopyFromEnd = CopyFromTextLikeEnd, +}; + +/* binary format */ +static const CopyFromRoutine CopyFromRoutineBinary = { + .CopyFromInFunc = CopyFromBinaryInFunc, + .CopyFromStart = CopyFromBinaryStart, + .CopyFromOneRow = CopyFromBinaryOneRow, + .CopyFromEnd = CopyFromBinaryEnd, +}; + +/* Return COPY FROM routines for the given options */ +static const CopyFromRoutine * +CopyFromGetRoutine(CopyFormatOptions opts) +{ + if (opts.csv_mode) + return &CopyFromRoutineCSV; + else if (opts.binary) + return &CopyFromRoutineBinary; + + /* default is text */ + return &CopyFromRoutineText; +} + +/* Implementation of the start callback for text and CSV formats */ +static void +CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc) +{ + AttrNumber attr_count; + + /* + * If encoding conversion is needed, we need another buffer to hold the + * converted input data. Otherwise, we can just point input_buf to the + * same buffer as raw_buf. + */ + if (cstate->need_transcoding) + { + cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); + cstate->input_buf_index = cstate->input_buf_len = 0; + } + else + cstate->input_buf = cstate->raw_buf; + cstate->input_reached_eof = false; + + initStringInfo(&cstate->line_buf); + + /* + * Create workspace for CopyReadAttributes results; used by CSV and text + * format. 
+ */ + attr_count = list_length(cstate->attnumlist); + cstate->max_fields = attr_count; + cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *)); +} + +/* + * Implementation of the infunc callback for text and CSV formats. Assign + * the input function data to the given *finfo. + */ +static void +CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo, + Oid *typioparam) +{ + Oid func_oid; + + getTypeInputInfo(atttypid, &func_oid, typioparam); + fmgr_info(func_oid, finfo); +} + +/* Implementation of the end callback for text and CSV formats */ +static void +CopyFromTextLikeEnd(CopyFromState cstate) +{ + /* nothing to do */ +} + +/* Implementation of the start callback for binary format */ +static void +CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc) +{ + /* Read and verify binary header */ + ReceiveCopyBinaryHeader(cstate); +} + +/* + * Implementation of the infunc callback for binary format. Assign + * the binary input function to the given *finfo. 
+ */
+static void
+CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+					 FmgrInfo *finfo, Oid *typioparam)
+{
+	Oid func_oid;
+
+	getTypeBinaryInputInfo(atttypid, &func_oid, typioparam);
+	fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the end callback for binary format */
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+	/* nothing to do */
+}
+
 /*
  * error context callback for COPY FROM
  *
@@ -1396,7 +1535,6 @@ BeginCopyFrom(ParseState *pstate,
 				num_defaults;
 	FmgrInfo *in_functions;
 	Oid *typioparams;
-	Oid in_func_oid;
 	int *defmap;
 	ExprState **defexprs;
 	MemoryContext oldcontext;
@@ -1428,6 +1566,9 @@ BeginCopyFrom(ParseState *pstate,
 	/* Extract options from the statement node tree */
 	ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
+	/* Set the format routine */
+	cstate->routine = CopyFromGetRoutine(cstate->opts);
+
 	/* Process the target relation */
 	cstate->rel = rel;
 
@@ -1583,25 +1724,6 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->raw_buf_index = cstate->raw_buf_len = 0;
 	cstate->raw_reached_eof = false;
 
-	if (!cstate->opts.binary)
-	{
-		/*
-		 * If encoding conversion is needed, we need another buffer to hold
-		 * the converted input data. Otherwise, we can just point input_buf
-		 * to the same buffer as raw_buf.
-		 */
-		if (cstate->need_transcoding)
-		{
-			cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
-			cstate->input_buf_index = cstate->input_buf_len = 0;
-		}
-		else
-			cstate->input_buf = cstate->raw_buf;
-		cstate->input_reached_eof = false;
-
-		initStringInfo(&cstate->line_buf);
-	}
-
 	initStringInfo(&cstate->attribute_buf);
 
 	/* Assign range table and rteperminfos, we'll need them in CopyFrom. */
@@ -1634,13 +1756,9 @@ BeginCopyFrom(ParseState *pstate,
 			continue;
 
 		/* Fetch the input function and typioparam info */
-		if (cstate->opts.binary)
-			getTypeBinaryInputInfo(att->atttypid,
-								   &in_func_oid, &typioparams[attnum - 1]);
-		else
-			getTypeInputInfo(att->atttypid,
-							 &in_func_oid, &typioparams[attnum - 1]);
-		fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+		cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+										&in_functions[attnum - 1],
+										&typioparams[attnum - 1]);
 
 		/* Get default info if available */
 		defexprs[attnum - 1] = NULL;
@@ -1775,20 +1893,7 @@ BeginCopyFrom(ParseState *pstate,
 
 	pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-	if (cstate->opts.binary)
-	{
-		/* Read and verify binary header */
-		ReceiveCopyBinaryHeader(cstate);
-	}
-
-	/* create workspace for CopyReadAttributes results */
-	if (!cstate->opts.binary)
-	{
-		AttrNumber attr_count = list_length(cstate->attnumlist);
-
-		cstate->max_fields = attr_count;
-		cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-	}
+	cstate->routine->CopyFromStart(cstate, tupDesc);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -1801,6 +1906,9 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+	/* Invoke the end callback */
+	cstate->routine->CopyFromEnd(cstate);
+
 	/* No COPY FROM related resources except memory. */
 	if (cstate->is_program)
 	{
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index caccdc8563c..65f20d332ee 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -62,7 +62,6 @@
 #include <unistd.h>
 #include <sys/stat.h>
 
-#include "commands/copy.h"
 #include "commands/copyfrom_internal.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
@@ -140,8 +139,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLine(CopyFromState cstate, bool is_csv);
+static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static int CopyReadAttributesText(CopyFromState cstate);
 static int CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -740,9 +739,11 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  * in the relation.
  *
  * NOTE: force_not_null option are not applied to the returned fields.
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static pg_attribute_always_inline bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
 {
 	int fldct;
 	bool done;
@@ -759,13 +760,17 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 		tupDesc = RelationGetDescr(cstate->rel);
 
 		cstate->cur_lineno++;
-		done = CopyReadLine(cstate);
+		done = CopyReadLine(cstate, is_csv);
 
 		if (cstate->opts.header_line == COPY_HEADER_MATCH)
 		{
 			int fldnum;
 
-			if (cstate->opts.csv_mode)
+			/*
+			 * is_csv will be optimized away by compiler, as argument is
+			 * constant at caller.
+			 */
+			if (is_csv)
 				fldct = CopyReadAttributesCSV(cstate);
 			else
 				fldct = CopyReadAttributesText(cstate);
@@ -809,7 +814,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	cstate->cur_lineno++;
 
 	/* Actually read the line into memory here */
-	done = CopyReadLine(cstate);
+	done = CopyReadLine(cstate, is_csv);
 
 	/*
 	 * EOF at start of line means we're done. If we see EOF after some
@@ -819,8 +824,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	if (done && cstate->line_buf.len == 0)
 		return false;
 
-	/* Parse the line into de-escaped field values */
-	if (cstate->opts.csv_mode)
+	/*
+	 * Parse the line into de-escaped field values
+	 *
+	 * is_csv will be optimized away by compiler, as argument is constant at
+	 * caller.
+	 */
+	if (is_csv)
 		fldct = CopyReadAttributesCSV(cstate);
 	else
 		fldct = CopyReadAttributesText(cstate);
@@ -830,6 +840,244 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 	return true;
 }
 
+/*
+ * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
+ */
+static pg_attribute_always_inline bool
+CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
+					   Datum *values, bool *nulls, bool is_csv)
+{
+	TupleDesc tupDesc;
+	AttrNumber attr_count;
+	FmgrInfo *in_functions = cstate->in_functions;
+	Oid *typioparams = cstate->typioparams;
+	ExprState **defexprs = cstate->defexprs;
+	char **field_strings;
+	ListCell *cur;
+	int fldct;
+	int fieldno;
+	char *string;
+
+	tupDesc = RelationGetDescr(cstate->rel);
+	attr_count = list_length(cstate->attnumlist);
+
+	/* read raw fields in the next line */
+	if (!NextCopyFromRawFields(cstate, &field_strings, &fldct, is_csv))
+		return false;
+
+	/* check for overflowing fields */
+	if (attr_count > 0 && fldct > attr_count)
+		ereport(ERROR,
+				(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+				 errmsg("extra data after last expected column")));
+
+	fieldno = 0;
+
+	/* Loop to read the user attributes on the line. */
+	foreach(cur, cstate->attnumlist)
+	{
+		int attnum = lfirst_int(cur);
+		int m = attnum - 1;
+		Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+		if (fieldno >= fldct)
+			ereport(ERROR,
+					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+					 errmsg("missing data for column \"%s\"",
+							NameStr(att->attname))));
+		string = field_strings[fieldno++];
+
+		if (cstate->convert_select_flags &&
+			!cstate->convert_select_flags[m])
+		{
+			/* ignore input field, leaving column as NULL */
+			continue;
+		}
+
+		if (is_csv)
+		{
+			if (string == NULL &&
+				cstate->opts.force_notnull_flags[m])
+			{
+				/*
+				 * FORCE_NOT_NULL option is set and column is NULL - convert
+				 * it to the NULL string.
+				 */
+				string = cstate->opts.null_print;
+			}
+			else if (string != NULL && cstate->opts.force_null_flags[m]
+					 && strcmp(string, cstate->opts.null_print) == 0)
+			{
+				/*
+				 * FORCE_NULL option is set and column matches the NULL
+				 * string. It must have been quoted, or otherwise the string
+				 * would already have been set to NULL. Convert it to NULL as
+				 * specified.
+				 */
+				string = NULL;
+			}
+		}
+
+		cstate->cur_attname = NameStr(att->attname);
+		cstate->cur_attval = string;
+
+		if (string != NULL)
+			nulls[m] = false;
+
+		if (cstate->defaults[m])
+		{
+			/*
+			 * The caller must supply econtext and have switched into the
+			 * per-tuple memory context in it.
+			 */
+			Assert(econtext != NULL);
+			Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
+
+			values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
+		}
+
+		/*
+		 * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+		 */
+		else if (!InputFunctionCallSafe(&in_functions[m],
+										string,
+										typioparams[m],
+										att->atttypmod,
+										(Node *) cstate->escontext,
+										&values[m]))
+		{
+			Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+			cstate->num_errors++;
+
+			if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+			{
+				/*
+				 * Since we emit line number and column info in the below
+				 * notice message, we suppress error context information other
+				 * than the relation name.
+				 */
+				Assert(!cstate->relname_only);
+				cstate->relname_only = true;
+
+				if (cstate->cur_attval)
+				{
+					char *attval;
+
+					attval = CopyLimitPrintoutLength(cstate->cur_attval);
+					ereport(NOTICE,
+							errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+								   (unsigned long long) cstate->cur_lineno,
+								   cstate->cur_attname,
+								   attval));
+					pfree(attval);
+				}
+				else
+					ereport(NOTICE,
+							errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+								   (unsigned long long) cstate->cur_lineno,
+								   cstate->cur_attname));
+
+				/* reset relname_only */
+				cstate->relname_only = false;
+			}
+
+			return true;
+		}
+
+		cstate->cur_attname = NULL;
+		cstate->cur_attval = NULL;
+	}
+
+	Assert(fieldno == attr_count);
+
+	return true;
+}
+
+/* Implementation of the per-row callback for text format */
+bool
+CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+				   bool *nulls)
+{
+	return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false);
+}
+
+/* Implementation of the per-row callback for CSV format */
+bool
+CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+				  bool *nulls)
+{
+	return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
+}
+
+/* Implementation of the per-row callback for binary format */
+bool
+CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+					 bool *nulls)
+{
+	TupleDesc tupDesc;
+	AttrNumber attr_count;
+	FmgrInfo *in_functions = cstate->in_functions;
+	Oid *typioparams = cstate->typioparams;
+	int16 fld_count;
+	ListCell *cur;
+
+	tupDesc = RelationGetDescr(cstate->rel);
+	attr_count = list_length(cstate->attnumlist);
+
+	cstate->cur_lineno++;
+
+	if (!CopyGetInt16(cstate, &fld_count))
+	{
+		/* EOF detected (end of file, or protocol-level EOF) */
+		return false;
+	}
+
+	if (fld_count == -1)
+	{
+		/*
+		 * Received EOF marker. Wait for the protocol-level EOF, and complain
+		 * if it doesn't come immediately. In COPY FROM STDIN, this ensures
+		 * that we correctly handle CopyFail, if client chooses to send that
+		 * now. When copying from file, we could ignore the rest of the file
+		 * like in text mode, but we choose to be consistent with the COPY
+		 * FROM STDIN case.
+		 */
+		char dummy;
+
+		if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+					 errmsg("received copy data after EOF marker")));
+		return false;
+	}
+
+	if (fld_count != attr_count)
+		ereport(ERROR,
+				(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+				 errmsg("row field count is %d, expected %d",
+						(int) fld_count, attr_count)));
+
+	foreach(cur, cstate->attnumlist)
+	{
+		int attnum = lfirst_int(cur);
+		int m = attnum - 1;
+		Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+		cstate->cur_attname = NameStr(att->attname);
+		values[m] = CopyReadBinaryAttribute(cstate,
+											&in_functions[m],
+											typioparams[m],
+											att->atttypmod,
+											&nulls[m]);
+		cstate->cur_attname = NULL;
+	}
+
+	return true;
+}
+
 /*
  * Read next tuple from file for COPY FROM. Return false if no more tuples.
  *
@@ -847,216 +1095,22 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 {
 	TupleDesc tupDesc;
 	AttrNumber num_phys_attrs,
-				attr_count,
 				num_defaults = cstate->num_defaults;
-	FmgrInfo *in_functions = cstate->in_functions;
-	Oid *typioparams = cstate->typioparams;
 	int i;
 	int *defmap = cstate->defmap;
 	ExprState **defexprs = cstate->defexprs;
 
 	tupDesc = RelationGetDescr(cstate->rel);
 	num_phys_attrs = tupDesc->natts;
-	attr_count = list_length(cstate->attnumlist);
 
 	/* Initialize all values for row to NULL */
 	MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 	MemSet(nulls, true, num_phys_attrs * sizeof(bool));
 	MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-	if (!cstate->opts.binary)
-	{
-		char **field_strings;
-		ListCell *cur;
-		int fldct;
-		int fieldno;
-		char *string;
-
-		/* read raw fields in the next line */
-		if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
-			return false;
-
-		/* check for overflowing fields */
-		if (attr_count > 0 && fldct > attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("extra data after last expected column")));
-
-		fieldno = 0;
-
-		/* Loop to read the user attributes on the line. */
-		foreach(cur, cstate->attnumlist)
-		{
-			int attnum = lfirst_int(cur);
-			int m = attnum - 1;
-			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-			if (fieldno >= fldct)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("missing data for column \"%s\"",
-								NameStr(att->attname))));
-			string = field_strings[fieldno++];
-
-			if (cstate->convert_select_flags &&
-				!cstate->convert_select_flags[m])
-			{
-				/* ignore input field, leaving column as NULL */
-				continue;
-			}
-
-			if (cstate->opts.csv_mode)
-			{
-				if (string == NULL &&
-					cstate->opts.force_notnull_flags[m])
-				{
-					/*
-					 * FORCE_NOT_NULL option is set and column is NULL -
-					 * convert it to the NULL string.
-					 */
-					string = cstate->opts.null_print;
-				}
-				else if (string != NULL && cstate->opts.force_null_flags[m]
-						 && strcmp(string, cstate->opts.null_print) == 0)
-				{
-					/*
-					 * FORCE_NULL option is set and column matches the NULL
-					 * string. It must have been quoted, or otherwise the
-					 * string would already have been set to NULL. Convert it
-					 * to NULL as specified.
-					 */
-					string = NULL;
-				}
-			}
-
-			cstate->cur_attname = NameStr(att->attname);
-			cstate->cur_attval = string;
-
-			if (string != NULL)
-				nulls[m] = false;
-
-			if (cstate->defaults[m])
-			{
-				/*
-				 * The caller must supply econtext and have switched into the
-				 * per-tuple memory context in it.
-				 */
-				Assert(econtext != NULL);
-				Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
-
-				values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
-			}
-
-			/*
-			 * If ON_ERROR is specified with IGNORE, skip rows with soft
-			 * errors
-			 */
-			else if (!InputFunctionCallSafe(&in_functions[m],
-											string,
-											typioparams[m],
-											att->atttypmod,
-											(Node *) cstate->escontext,
-											&values[m]))
-			{
-				Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-				cstate->num_errors++;
-
-				if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-				{
-					/*
-					 * Since we emit line number and column info in the below
-					 * notice message, we suppress error context information
-					 * other than the relation name.
-					 */
-					Assert(!cstate->relname_only);
-					cstate->relname_only = true;
-
-					if (cstate->cur_attval)
-					{
-						char *attval;
-
-						attval = CopyLimitPrintoutLength(cstate->cur_attval);
-						ereport(NOTICE,
-								errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-									   (unsigned long long) cstate->cur_lineno,
-									   cstate->cur_attname,
-									   attval));
-						pfree(attval);
-					}
-					else
-						ereport(NOTICE,
-								errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-									   (unsigned long long) cstate->cur_lineno,
-									   cstate->cur_attname));
-
-					/* reset relname_only */
-					cstate->relname_only = false;
-				}
-
-				return true;
-			}
-
-			cstate->cur_attname = NULL;
-			cstate->cur_attval = NULL;
-		}
-
-		Assert(fieldno == attr_count);
-	}
-	else
-	{
-		/* binary */
-		int16 fld_count;
-		ListCell *cur;
-
-		cstate->cur_lineno++;
-
-		if (!CopyGetInt16(cstate, &fld_count))
-		{
-			/* EOF detected (end of file, or protocol-level EOF) */
-			return false;
-		}
-
-		if (fld_count == -1)
-		{
-			/*
-			 * Received EOF marker. Wait for the protocol-level EOF, and
-			 * complain if it doesn't come immediately. In COPY FROM STDIN,
-			 * this ensures that we correctly handle CopyFail, if client
-			 * chooses to send that now. When copying from file, we could
-			 * ignore the rest of the file like in text mode, but we choose to
-			 * be consistent with the COPY FROM STDIN case.
-			 */
-			char dummy;
-
-			if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
-				ereport(ERROR,
-						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 errmsg("received copy data after EOF marker")));
-			return false;
-		}
-
-		if (fld_count != attr_count)
-			ereport(ERROR,
-					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-					 errmsg("row field count is %d, expected %d",
-							(int) fld_count, attr_count)));
-
-		foreach(cur, cstate->attnumlist)
-		{
-			int attnum = lfirst_int(cur);
-			int m = attnum - 1;
-			Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-			cstate->cur_attname = NameStr(att->attname);
-			values[m] = CopyReadBinaryAttribute(cstate,
-												&in_functions[m],
-												typioparams[m],
-												att->atttypmod,
-												&nulls[m]);
-			cstate->cur_attname = NULL;
-		}
-	}
+	/* Get one row from source */
+	if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+		return false;
 
 	/*
 	 * Now compute and insert any defaults available for the columns not
@@ -1087,7 +1141,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * in the final value of line_buf.
  */
 static bool
-CopyReadLine(CopyFromState cstate)
+CopyReadLine(CopyFromState cstate, bool is_csv)
 {
 	bool result;
 
@@ -1095,7 +1149,7 @@ CopyReadLine(CopyFromState cstate)
 	cstate->line_buf_valid = false;
 
 	/* Parse data and transfer into line_buf */
-	result = CopyReadLineText(cstate);
+	result = CopyReadLineText(cstate, is_csv);
 
 	if (result)
 	{
@@ -1163,7 +1217,7 @@ CopyReadLine(CopyFromState cstate)
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
 static bool
-CopyReadLineText(CopyFromState cstate)
+CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
 	char *copy_input_buf;
 	int input_buf_ptr;
@@ -1178,7 +1232,11 @@ CopyReadLineText(CopyFromState cstate)
 	char quotec = '\0';
 	char escapec = '\0';
 
-	if (cstate->opts.csv_mode)
+	/*
+	 * is_csv will be optimized away by compiler, as argument is constant at
+	 * caller.
+	 */
+	if (is_csv)
 	{
 		quotec = cstate->opts.quote[0];
 		escapec = cstate->opts.escape[0];
@@ -1255,7 +1313,11 @@ CopyReadLineText(CopyFromState cstate)
 		prev_raw_ptr = input_buf_ptr;
 		c = copy_input_buf[input_buf_ptr++];
 
-		if (cstate->opts.csv_mode)
+		/*
+		 * is_csv will be optimized away by compiler, as argument is constant
+		 * at caller.
+		 */
+		if (is_csv)
 		{
 			/*
 			 * If character is '\r', we may need to look ahead below. Force
@@ -1294,7 +1356,7 @@ CopyReadLineText(CopyFromState cstate)
 		}
 
 		/* Process \r */
-		if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+		if (c == '\r' && (!is_csv || !in_quote))
 		{
 			/* Check for \r\n on first line, _and_ handle \r\n. */
 			if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1322,10 +1384,10 @@ CopyReadLineText(CopyFromState cstate)
 					if (cstate->eol_type == EOL_CRNL)
 						ereport(ERROR,
 								(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-								 !cstate->opts.csv_mode ?
+								 !is_csv ?
 								 errmsg("literal carriage return found in data") :
 								 errmsg("unquoted carriage return found in data"),
-								 !cstate->opts.csv_mode ?
+								 !is_csv ?
 								 errhint("Use \"\\r\" to represent carriage return.") :
 								 errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1339,10 +1401,10 @@ CopyReadLineText(CopyFromState cstate)
 			else if (cstate->eol_type == EOL_NL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 !cstate->opts.csv_mode ?
+						 !is_csv ?
 						 errmsg("literal carriage return found in data") :
 						 errmsg("unquoted carriage return found in data"),
-						 !cstate->opts.csv_mode ?
+						 !is_csv ?
 						 errhint("Use \"\\r\" to represent carriage return.") :
 						 errhint("Use quoted CSV field to represent carriage return.")));
 			/* If reach here, we have found the line terminator */
@@ -1350,15 +1412,15 @@ CopyReadLineText(CopyFromState cstate)
 		}
 
 		/* Process \n */
-		if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+		if (c == '\n' && (!is_csv || !in_quote))
 		{
 			if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
 				ereport(ERROR,
 						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-						 !cstate->opts.csv_mode ?
+						 !is_csv ?
 						 errmsg("literal newline found in data") :
 						 errmsg("unquoted newline found in data"),
-						 !cstate->opts.csv_mode ?
+						 !is_csv ?
 						 errhint("Use \"\\n\" to represent newline.") :
 						 errhint("Use quoted CSV field to represent newline.")));
 			cstate->eol_type = EOL_NL; /* in case not set yet */
@@ -1370,7 +1432,7 @@ CopyReadLineText(CopyFromState cstate)
 		 * Process backslash, except in CSV mode where backslash is a normal
 		 * character.
 		 */
-		if (c == '\\' && !cstate->opts.csv_mode)
+		if (c == '\\' && !is_csv)
 		{
 			char c2;
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..7bc044e2816 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
 extern void EndCopyFrom(CopyFromState cstate);
 extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 						 Datum *values, bool *nulls);
-extern bool NextCopyFromRawFields(CopyFromState cstate,
-								  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 extern char *CopyLimitPrintoutLength(const char *str);
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index eccc875d0e8..19aacc8ddd3 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * copyapi.h
- *	  API for COPY TO handlers
+ *	  API for COPY TO/FROM handlers
  *
  *
  * Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
@@ -54,4 +54,50 @@ typedef struct CopyToRoutine
 	void (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+/*
+ * API structure for a COPY FROM format implementation. Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyFromRoutine
+{
+	/*
+	 * Set input function information. This callback is called once at the
+	 * beginning of COPY FROM.
+	 *
+	 * 'finfo' can be optionally filled to provide the catalog information of
+	 * the input function.
+	 *
+	 * 'typioparam' can be optionally filled to define the OID of the type to
+	 * pass to the input function. 'atttypid' is the OID of data type used by
+	 * the relation's attribute.
+	 */
+	void (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
+							FmgrInfo *finfo, Oid *typioparam);
+
+	/*
+	 * Start a COPY FROM. This callback is called once at the beginning of
+	 * COPY FROM.
+	 *
+	 * 'tupDesc' is the tuple descriptor of the relation where the data needs
+	 * to be copied. This can be used for any initialization steps required
+	 * by a format.
+	 */
+	void (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
+
+	/*
+	 * Read one row from the source and fill *values and *nulls.
+	 *
+	 * 'econtext' is used to evaluate default expression for each column that
+	 * is either not read from the file or is using the DEFAULT option of COPY
+	 * FROM. It is NULL if no default values are used.
+	 *
+	 * Returns false if there are no more tuples to read.
+	 */
+	bool (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
+							Datum *values, bool *nulls);
+
+	/* End a COPY FROM. This callback is called once at the end of COPY FROM */
+	void (*CopyFromEnd) (CopyFromState cstate);
+} CopyFromRoutine;
+
 #endif /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 1d8ac8f62e6..e1affe3dfa7 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -14,7 +14,7 @@
 #ifndef COPYFROM_INTERNAL_H
 #define COPYFROM_INTERNAL_H
 
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
@@ -58,6 +58,9 @@ typedef enum CopyInsertMethod
  */
 typedef struct CopyFromStateData
 {
+	/* format routine */
+	const CopyFromRoutine *routine;
+
 	/* low-level state data */
 	CopySource copy_src; /* type of copy source */
 	FILE *copy_file; /* used if copy_src == COPY_FILE */
@@ -183,4 +186,12 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
+/* One-row callbacks for built-in formats defined in copyfromparse.c */
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+							   Datum *values, bool *nulls);
+extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
+							  Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+								 Datum *values, bool *nulls);
+
 #endif /* COPYFROM_INTERNAL_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 18b2595300e..25a96d65e91 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -497,6 +497,7 @@ ConvertRowtypeExpr
 CookedConstraint
 CopyDest
 CopyFormatOptions
+CopyFromRoutine
 CopyFromState
 CopyFromStateData
 CopyHeaderChoice
-- 
2.47.1

From 8e757ddfc36be227f76f6739632b41cb227ad374 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 12:19:15 +0900
Subject: [PATCH v28 3/9] Add support for adding custom COPY TO format

This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a COPY TO handler returns a CopyToRoutine.

This also adds a test module for custom COPY TO handler.
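
For readers who want the shape of the handler approach without building the server, here is a minimal standalone sketch. It is only an illustration under stated assumptions: DemoState, DemoCopyToRoutine, demo_format_handler, demo_get_routine and demo_copy_to are hypothetical names that do not appear in the patch, the small array stands in for the catalog lookup, and a NULL return stands in for the "not recognized" error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-COPY state; the real CopyToState carries far more. */
typedef struct DemoState
{
	int started;
	int rows;
	int ended;
} DemoState;

/* Hypothetical stand-in for CopyToRoutine: one callback per COPY TO phase. */
typedef struct DemoCopyToRoutine
{
	void (*start) (DemoState *state);
	void (*one_row) (DemoState *state);
	void (*end) (DemoState *state);
} DemoCopyToRoutine;

static void demo_start(DemoState *state) { state->started++; }
static void demo_one_row(DemoState *state) { state->rows++; }
static void demo_end(DemoState *state) { state->ended++; }

/* A built-in "text" routine and a routine owned by a custom format. */
static const DemoCopyToRoutine demo_routine_text = {demo_start, demo_one_row, demo_end};
static const DemoCopyToRoutine demo_routine_custom = {demo_start, demo_one_row, demo_end};

/* A handler is a function that returns the routine for its format. */
typedef const DemoCopyToRoutine *(*DemoHandler) (bool is_from);

static const DemoCopyToRoutine *
demo_format_handler(bool is_from)
{
	(void) is_from;
	return &demo_routine_custom;
}

/* Assumed stand-in for the catalog: maps a format name to its handler. */
static const struct
{
	const char *name;
	DemoHandler handler;
} demo_handlers[] = {
	{"demo_format", demo_format_handler},
};

/* Resolve a format name; NULL stands in for the "not recognized" error. */
static const DemoCopyToRoutine *
demo_get_routine(const char *format, bool is_from)
{
	/* Built-in formats are resolved first, without any handler lookup. */
	if (strcmp(format, "text") == 0)
		return &demo_routine_text;

	/* In this sketch custom handlers are consulted only for COPY TO. */
	if (is_from)
		return NULL;

	for (size_t i = 0; i < sizeof(demo_handlers) / sizeof(demo_handlers[0]); i++)
	{
		if (strcmp(format, demo_handlers[i].name) == 0)
			return demo_handlers[i].handler(is_from);
	}
	return NULL;
}

/* Drive the callbacks in COPY TO order: start, one call per row, end. */
static void
demo_copy_to(const DemoCopyToRoutine *routine, int nrows, DemoState *state)
{
	assert(routine != NULL);
	routine->start(state);
	for (int i = 0; i < nrows; i++)
		routine->one_row(state);
	routine->end(state);
}
```

Calling demo_get_routine("demo_format", false) and passing the result to demo_copy_to() exercises the same start, per-row, end sequence that the callbacks in this series follow, while an unknown name or is_from == true yields NULL in place of the error report.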
---
 src/backend/commands/copy.c                   | 82 ++++++++++++++++---
 src/backend/commands/copyto.c                 |  4 +-
 src/backend/nodes/Makefile                    |  1 +
 src/backend/nodes/gen_node_support.pl         |  2 +
 src/backend/utils/adt/pseudotypes.c           |  1 +
 src/include/catalog/pg_proc.dat               |  6 ++
 src/include/catalog/pg_type.dat               |  6 ++
 src/include/commands/copy.h                   |  1 +
 src/include/commands/copyapi.h                |  2 +
 src/include/nodes/meson.build                 |  1 +
 src/test/modules/Makefile                     |  1 +
 src/test/modules/meson.build                  |  1 +
 src/test/modules/test_copy_format/.gitignore  |  4 +
 src/test/modules/test_copy_format/Makefile    | 23 ++++++
 .../expected/test_copy_format.out             | 17 ++++
 src/test/modules/test_copy_format/meson.build | 33 ++++++++
 .../test_copy_format/sql/test_copy_format.sql |  5 ++
 .../test_copy_format--1.0.sql                 |  8 ++
 .../test_copy_format/test_copy_format.c       | 63 ++++++++++++++
 .../test_copy_format/test_copy_format.control |  4 +
 20 files changed, 251 insertions(+), 14 deletions(-)
 mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..332dfdc9a53 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,6 +32,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -476,6 +477,73 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
 	return COPY_LOG_VERBOSITY_DEFAULT; /* keep compiler quiet */
 }
 
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false). If no COPY format handler is found, this function
+ * reports an error.
+ */
+static void
+ProcessCopyOptionFormat(ParseState *pstate,
+						CopyFormatOptions *opts_out,
+						bool is_from,
+						DefElem *defel)
+{
+	char *format;
+	Oid funcargtypes[1];
+	Oid handlerOid = InvalidOid;
+	Datum datum;
+	Node *routine;
+
+	format = defGetString(defel);
+
+	/* built-in formats */
+	if (strcmp(format, "text") == 0)
+		/* default format */ return;
+	else if (strcmp(format, "csv") == 0)
+	{
+		opts_out->csv_mode = true;
+		return;
+	}
+	else if (strcmp(format, "binary") == 0)
+	{
+		opts_out->binary = true;
+		return;
+	}
+
+	/* custom format */
+	if (!is_from)
+	{
+		funcargtypes[0] = INTERNALOID;
+		handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+									funcargtypes, true);
+	}
+	if (!OidIsValid(handlerOid))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY format \"%s\" not recognized", format),
+				 parser_errposition(pstate, defel->location)));
+
+	datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
+	routine = (Node *) DatumGetPointer(datum);
+	if (routine == NULL || !IsA(routine, CopyToRoutine))
+		ereport(
+				ERROR,
+				(errcode(
+						 ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY handler function "
+						"%s(%u) did not return a "
+						"CopyToRoutine struct",
+						format, handlerOid),
+				 parser_errposition(
+									pstate, defel->location)));
+
+	opts_out->routine = routine;
+}
+
 /*
  * Process the statement option list for COPY.
  *
@@ -519,22 +587,10 @@ ProcessCopyOptions(ParseState *pstate,
 
 		if (strcmp(defel->defname, "format") == 0)
 		{
-			char *fmt = defGetString(defel);
-
 			if (format_specified)
 				errorConflictingDefElem(defel, pstate);
 			format_specified = true;
-			if (strcmp(fmt, "text") == 0)
-				 /* default format */ ;
-			else if (strcmp(fmt, "csv") == 0)
-				opts_out->csv_mode = true;
-			else if (strcmp(fmt, "binary") == 0)
-				opts_out->binary = true;
-			else
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						 errmsg("COPY format \"%s\" not recognized", fmt),
-						 parser_errposition(pstate, defel->location)));
+			ProcessCopyOptionFormat(pstate, opts_out, is_from, defel);
 		}
 		else if (strcmp(defel->defname, "freeze") == 0)
 		{
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index a885779666b..58ffded4370 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -176,7 +176,9 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(CopyFormatOptions opts)
 {
-	if (opts.csv_mode)
+	if (opts.routine)
+		return (const CopyToRoutine *) opts.routine;
+	else if (opts.csv_mode)
 		return &CopyToRoutineCSV;
 	else if (opts.binary)
 		return &CopyToRoutineBinary;
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 66bbad8e6e0..173ee11811c 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -49,6 +49,7 @@ node_headers = \
 	access/sdir.h \
 	access/tableam.h \
 	access/tsmapi.h \
+	commands/copyapi.h \
 	commands/event_trigger.h \
 	commands/trigger.h \
 	executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
old mode 100644
new mode 100755
index 7c012c27f88..5d53d32c4a7
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -61,6 +61,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -85,6 +86,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index 317a1f2b282..f2ebc21ca56 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 18560755d26..74884eb9d34 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7784,6 +7784,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 6dca77e0a22..340e0cd0a8d 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -633,6 +633,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a copy to method function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   descr => 'pseudo-type for the result of a table AM handler function',
   typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 7bc044e2816..b869431f086 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
 	CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */
 	int64 reject_limit; /* maximum tolerable number of errors */
 	List *convert_select; /* list of column names (can be NIL) */
+	Node *routine; /* CopyToRoutine (can be NULL) */
 } CopyFormatOptions;
 
 /* These are private in commands/copy[from|to].c */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 19aacc8ddd3..36057b92417 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -24,6 +24,8 @@
  */
 typedef struct CopyToRoutine
 {
+	NodeTag type;
+
 	/*
 	 * Set output function information. This callback is called once at the
 	 * beginning of COPY TO.
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build index f3dd5461fef..09f7443195f 100644 --- a/src/include/nodes/meson.build +++ b/src/include/nodes/meson.build @@ -11,6 +11,7 @@ node_support_input_i = [ 'access/sdir.h', 'access/tableam.h', 'access/tsmapi.h', + 'commands/copyapi.h', 'commands/event_trigger.h', 'commands/trigger.h', 'executor/tuptable.h', diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile index c0d3cf0e14b..33e3a49a4fb 100644 --- a/src/test/modules/Makefile +++ b/src/test/modules/Makefile @@ -15,6 +15,7 @@ SUBDIRS = \ spgist_name_ops \ test_bloomfilter \ test_copy_callbacks \ + test_copy_format \ test_custom_rmgrs \ test_ddl_deparse \ test_dsa \ diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build index 4f544a042d4..bf25658793d 100644 --- a/src/test/modules/meson.build +++ b/src/test/modules/meson.build @@ -14,6 +14,7 @@ subdir('spgist_name_ops') subdir('ssl_passphrase_callback') subdir('test_bloomfilter') subdir('test_copy_callbacks') +subdir('test_copy_format') subdir('test_custom_rmgrs') subdir('test_ddl_deparse') subdir('test_dsa') diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore new file mode 100644 index 00000000000..5dcb3ff9723 --- /dev/null +++ b/src/test/modules/test_copy_format/.gitignore @@ -0,0 +1,4 @@ +# Generated subdirectories +/log/ +/results/ +/tmp_check/ diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile new file mode 100644 index 00000000000..8497f91624d --- /dev/null +++ b/src/test/modules/test_copy_format/Makefile @@ -0,0 +1,23 @@ +# src/test/modules/test_copy_format/Makefile + +MODULE_big = test_copy_format +OBJS = \ + $(WIN32RES) \ + test_copy_format.o +PGFILEDESC = "test_copy_format - test custom COPY FORMAT" + +EXTENSION = test_copy_format +DATA = test_copy_format--1.0.sql + +REGRESS = test_copy_format + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell 
$(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = src/test/modules/test_copy_format +top_builddir = ../../../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out new file mode 100644 index 00000000000..adfe7d1572a --- /dev/null +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -0,0 +1,17 @@ +CREATE EXTENSION test_copy_format; +CREATE TABLE public.test (a smallint, b integer, c bigint); +INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); +ERROR: COPY format "test_copy_format" not recognized +LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')... + ^ +COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); +NOTICE: test_copy_format: is_from=false +NOTICE: CopyToOutFunc: atttypid=21 +NOTICE: CopyToOutFunc: atttypid=23 +NOTICE: CopyToOutFunc: atttypid=20 +NOTICE: CopyToStart: natts=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToEnd diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build new file mode 100644 index 00000000000..4cefe7b709a --- /dev/null +++ b/src/test/modules/test_copy_format/meson.build @@ -0,0 +1,33 @@ +# Copyright (c) 2024, PostgreSQL Global Development Group + +test_copy_format_sources = files( + 'test_copy_format.c', +) + +if host_system == 'windows' + test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [ + '--NAME', 'test_copy_format', + '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',]) +endif + +test_copy_format = shared_module('test_copy_format', + test_copy_format_sources, + kwargs: pg_test_mod_args, +) +test_install_libs += test_copy_format + +test_install_data += 
files( + 'test_copy_format.control', + 'test_copy_format--1.0.sql', +) + +tests += { + 'name': 'test_copy_format', + 'sd': meson.current_source_dir(), + 'bd': meson.current_build_dir(), + 'regress': { + 'sql': [ + 'test_copy_format', + ], + }, +} diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql new file mode 100644 index 00000000000..810b3d8cedc --- /dev/null +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -0,0 +1,5 @@ +CREATE EXTENSION test_copy_format; +CREATE TABLE public.test (a smallint, b integer, c bigint); +INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); +COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql new file mode 100644 index 00000000000..d24ea03ce99 --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql @@ -0,0 +1,8 @@ +/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */ + +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit + +CREATE FUNCTION test_copy_format(internal) + RETURNS copy_handler + AS 'MODULE_PATHNAME' LANGUAGE C; diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c new file mode 100644 index 00000000000..e064f40473b --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -0,0 +1,63 @@ +/*-------------------------------------------------------------------------- + * + * test_copy_format.c + * Code for testing custom COPY format. 
+ * + * Portions Copyright (c) 2024, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/test/modules/test_copy_format/test_copy_format.c + * + * ------------------------------------------------------------------------- + */ + +#include "postgres.h" + +#include "commands/copyapi.h" +#include "commands/defrem.h" + +PG_MODULE_MAGIC; + +static void +CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) +{ + ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%d", atttypid))); +} + +static void +CopyToStart(CopyToState cstate, TupleDesc tupDesc) +{ + ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts))); +} + +static void +CopyToOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", slot->tts_nvalid))); +} + +static void +CopyToEnd(CopyToState cstate) +{ + ereport(NOTICE, (errmsg("CopyToEnd"))); +} + +static const CopyToRoutine CopyToRoutineTestCopyFormat = { + .type = T_CopyToRoutine, + .CopyToOutFunc = CopyToOutFunc, + .CopyToStart = CopyToStart, + .CopyToOneRow = CopyToOneRow, + .CopyToEnd = CopyToEnd, +}; + +PG_FUNCTION_INFO_V1(test_copy_format); +Datum +test_copy_format(PG_FUNCTION_ARGS) +{ + bool is_from = PG_GETARG_BOOL(0); + + ereport(NOTICE, + (errmsg("test_copy_format: is_from=%s", is_from ? 
"true" : "false"))); + + PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat); +} diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control new file mode 100644 index 00000000000..f05a6362358 --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format.control @@ -0,0 +1,4 @@ +comment = 'Test code for custom COPY format' +default_version = '1.0' +module_pathname = '$libdir/test_copy_format' +relocatable = true -- 2.47.1 From ed93a031732deabd7fb923b11f74b42f393d9172 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 13:58:33 +0900 Subject: [PATCH v28 4/9] Export CopyToStateData It's for custom COPY TO format handlers implemented as extension. This just moves codes. This doesn't change codes except CopyDest enum values. CopyDest/CopyFrom enum values such as COPY_FILE are conflicted each other. So COPY_DEST_ prefix instead of COPY_ prefix is used for CopyDest enum values. For example, COPY_FILE in CopyDest is renamed to COPY_DEST_FILE. Note that this isn't enough to implement custom COPY TO format handlers as extension. We'll do the followings in a subsequent commit: 1. Add an opaque space for custom COPY TO format handler 2. 
Export CopySendEndOfRow() to flush buffer --- src/backend/commands/copyto.c | 77 ++++------------------------------ src/include/commands/copy.h | 2 +- src/include/commands/copyapi.h | 62 +++++++++++++++++++++++++++ 3 files changed, 71 insertions(+), 70 deletions(-) diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 58ffded4370..1e75e07dc0b 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -36,67 +36,6 @@ #include "utils/rel.h" #include "utils/snapmgr.h" -/* - * Represents the different dest cases we need to worry about at - * the bottom level - */ -typedef enum CopyDest -{ - COPY_FILE, /* to file (or a piped program) */ - COPY_FRONTEND, /* to frontend */ - COPY_CALLBACK, /* to callback function */ -} CopyDest; - -/* - * This struct contains all the state variables used throughout a COPY TO - * operation. - * - * Multi-byte encodings: all supported client-side encodings encode multi-byte - * characters by having the first byte's high bit set. Subsequent bytes of the - * character can have the high bit not set. When scanning data in such an - * encoding to look for a match to a single-byte (ie ASCII) character, we must - * use the full pg_encoding_mblen() machinery to skip over multibyte - * characters, else we might find a false match to a trailing byte. In - * supported server encodings, there is no possibility of a false match, and - * it's faster to make useless comparisons to trailing bytes than it is to - * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true - * when we have to do it the hard way. 
- */ -typedef struct CopyToStateData -{ - /* format-specific routines */ - const CopyToRoutine *routine; - - /* low-level state data */ - CopyDest copy_dest; /* type of copy source/destination */ - FILE *copy_file; /* used if copy_dest == COPY_FILE */ - StringInfo fe_msgbuf; /* used for all dests during COPY TO */ - - int file_encoding; /* file or remote side's character encoding */ - bool need_transcoding; /* file encoding diff from server? */ - bool encoding_embeds_ascii; /* ASCII can be non-first byte? */ - - /* parameters from the COPY command */ - Relation rel; /* relation to copy to */ - QueryDesc *queryDesc; /* executable query to copy from */ - List *attnumlist; /* integer list of attnums to copy */ - char *filename; /* filename, or NULL for STDOUT */ - bool is_program; /* is 'filename' a program to popen? */ - copy_data_dest_cb data_dest_cb; /* function for writing data */ - - CopyFormatOptions opts; - Node *whereClause; /* WHERE condition (or NULL) */ - - /* - * Working state - */ - MemoryContext copycontext; /* per-copy execution context */ - - FmgrInfo *out_functions; /* lookup info for output functions */ - MemoryContext rowcontext; /* per-row evaluation context */ - uint64 bytes_processed; /* number of bytes processed so far */ -} CopyToStateData; - /* DestReceiver for COPY (query) TO */ typedef struct { @@ -406,7 +345,7 @@ SendCopyBegin(CopyToState cstate) for (i = 0; i < natts; i++) pq_sendint16(&buf, format); /* per-column formats */ pq_endmessage(&buf); - cstate->copy_dest = COPY_FRONTEND; + cstate->copy_dest = COPY_DEST_FRONTEND; } static void @@ -453,7 +392,7 @@ CopySendEndOfRow(CopyToState cstate) switch (cstate->copy_dest) { - case COPY_FILE: + case COPY_DEST_FILE: if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, cstate->copy_file) != 1 || ferror(cstate->copy_file)) @@ -487,11 +426,11 @@ CopySendEndOfRow(CopyToState cstate) errmsg("could not write to COPY file: %m"))); } break; - case COPY_FRONTEND: + case COPY_DEST_FRONTEND: /* Dump the 
accumulated row as one CopyData message */ (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len); break; - case COPY_CALLBACK: + case COPY_DEST_CALLBACK: cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len); break; } @@ -512,7 +451,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate) { switch (cstate->copy_dest) { - case COPY_FILE: + case COPY_DEST_FILE: /* Default line termination depends on platform */ #ifndef WIN32 CopySendChar(cstate, '\n'); @@ -520,7 +459,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate) CopySendString(cstate, "\r\n"); #endif break; - case COPY_FRONTEND: + case COPY_DEST_FRONTEND: /* The FE/BE protocol uses \n as newline for all platforms */ CopySendChar(cstate, '\n'); break; @@ -904,12 +843,12 @@ BeginCopyTo(ParseState *pstate, /* See Multibyte encoding comment above */ cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding); - cstate->copy_dest = COPY_FILE; /* default */ + cstate->copy_dest = COPY_DEST_FILE; /* default */ if (data_dest_cb) { progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK; - cstate->copy_dest = COPY_CALLBACK; + cstate->copy_dest = COPY_DEST_CALLBACK; cstate->data_dest_cb = data_dest_cb; } else if (pipe) diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index b869431f086..26b0f410918 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -90,7 +90,7 @@ typedef struct CopyFormatOptions Node *routine; /* CopyToRoutine (can be NULL) */ } CopyFormatOptions; -/* These are private in commands/copy[from|to].c */ +/* This is private in commands/copyfrom.c */ typedef struct CopyFromStateData *CopyFromState; typedef struct CopyToStateData *CopyToState; diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 36057b92417..1cb2815deab 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -15,6 +15,7 @@ #define COPYAPI_H #include "commands/copy.h" +#include "executor/execdesc.h" #include 
"executor/tuptable.h" #include "nodes/execnodes.h" @@ -56,6 +57,67 @@ typedef struct CopyToRoutine void (*CopyToEnd) (CopyToState cstate); } CopyToRoutine; +/* + * Represents the different dest cases we need to worry about at + * the bottom level + */ +typedef enum CopyDest +{ + COPY_DEST_FILE, /* to file (or a piped program) */ + COPY_DEST_FRONTEND, /* to frontend */ + COPY_DEST_CALLBACK, /* to callback function */ +} CopyDest; + +/* + * This struct contains all the state variables used throughout a COPY TO + * operation. + * + * Multi-byte encodings: all supported client-side encodings encode multi-byte + * characters by having the first byte's high bit set. Subsequent bytes of the + * character can have the high bit not set. When scanning data in such an + * encoding to look for a match to a single-byte (ie ASCII) character, we must + * use the full pg_encoding_mblen() machinery to skip over multibyte + * characters, else we might find a false match to a trailing byte. In + * supported server encodings, there is no possibility of a false match, and + * it's faster to make useless comparisons to trailing bytes than it is to + * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true + * when we have to do it the hard way. + */ +typedef struct CopyToStateData +{ + /* format-specific routines */ + const CopyToRoutine *routine; + + /* low-level state data */ + CopyDest copy_dest; /* type of copy source/destination */ + FILE *copy_file; /* used if copy_dest == COPY_FILE */ + StringInfo fe_msgbuf; /* used for all dests during COPY TO */ + + int file_encoding; /* file or remote side's character encoding */ + bool need_transcoding; /* file encoding diff from server? */ + bool encoding_embeds_ascii; /* ASCII can be non-first byte? 
*/ + + /* parameters from the COPY command */ + Relation rel; /* relation to copy to */ + QueryDesc *queryDesc; /* executable query to copy from */ + List *attnumlist; /* integer list of attnums to copy */ + char *filename; /* filename, or NULL for STDOUT */ + bool is_program; /* is 'filename' a program to popen? */ + copy_data_dest_cb data_dest_cb; /* function for writing data */ + + CopyFormatOptions opts; + Node *whereClause; /* WHERE condition (or NULL) */ + + /* + * Working state + */ + MemoryContext copycontext; /* per-copy execution context */ + + FmgrInfo *out_functions; /* lookup info for output functions */ + MemoryContext rowcontext; /* per-row evaluation context */ + uint64 bytes_processed; /* number of bytes processed so far */ +} CopyToStateData; + /* * API structure for a COPY FROM format implementation. Note this must be * allocated in a server-lifetime manner, typically as a static const struct. -- 2.47.1 From 45eb64e68925fa895dc7efcbe38ba11f371e04b2 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 14:01:18 +0900 Subject: [PATCH v28 5/9] Add support for implementing custom COPY TO format as extension * Add CopyToStateData::opaque that can be used to keep data for custom COPY TO format implementation * Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf as CopyToStateFlush() --- src/backend/commands/copyto.c | 12 ++++++++++++ src/include/commands/copyapi.h | 5 +++++ 2 files changed, 17 insertions(+) diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 1e75e07dc0b..7487190bdd6 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -442,6 +442,18 @@ CopySendEndOfRow(CopyToState cstate) resetStringInfo(fe_msgbuf); } +/* + * Export CopySendEndOfRow() for extensions. We want to keep + * CopySendEndOfRow() as a static function for + * optimization. CopySendEndOfRow() calls in this file may be optimized by a + * compiler. 
+ */ +void +CopyToStateFlush(CopyToState cstate) +{ + CopySendEndOfRow(cstate); +} + /* * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the * the line termination and do common appropriate things for the end of row. diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 1cb2815deab..030a82aca7f 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -116,8 +116,13 @@ typedef struct CopyToStateData FmgrInfo *out_functions; /* lookup info for output functions */ MemoryContext rowcontext; /* per-row evaluation context */ uint64 bytes_processed; /* number of bytes processed so far */ + + /* For custom format implementation */ + void *opaque; /* private space */ } CopyToStateData; +extern void CopyToStateFlush(CopyToState cstate); + /* * API structure for a COPY FROM format implementation. Note this must be * allocated in a server-lifetime manner, typically as a static const struct. -- 2.47.1 From f0ac9317f3af79369f51ad793211aed606bf23de Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 14:11:55 +0900 Subject: [PATCH v28 6/9] Add support for adding custom COPY FROM format This uses the same handler for COPY TO and COPY FROM but a different routine for each: CopyToRoutine for COPY TO and CopyFromRoutine for COPY FROM. PostgreSQL calls a COPY TO/FROM handler with an "is_from" argument. It's true for COPY FROM and false for COPY TO: copy_handler(true) returns CopyFromRoutine copy_handler(false) returns CopyToRoutine This also adds a test module for a custom COPY FROM handler.
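For readers following the thread, the handler dispatch described in this commit message can be sketched in plain C as below. This is a standalone illustration, not the patch's code: the Sketch* types and sketch_* functions are hypothetical stand-ins for NodeTag, CopyToRoutine, CopyFromRoutine and the IsA() check done in ProcessCopyOptionFormat(). It assumes, as the test module's handler does, that is_from == true selects the COPY FROM routine and is_from == false selects the COPY TO routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's NodeTag values. */
typedef enum SketchNodeTag
{
	T_SketchInvalid = 0,
	T_SketchCopyToRoutine,
	T_SketchCopyFromRoutine
} SketchNodeTag;

/* Hypothetical stand-in for Node: every routine struct starts with a tag. */
typedef struct SketchNode
{
	SketchNodeTag type;
} SketchNode;

/* Stand-ins for the two routine structs; callbacks are omitted here. */
typedef struct SketchCopyToRoutine
{
	SketchNode	node;
	const char *label;
} SketchCopyToRoutine;

typedef struct SketchCopyFromRoutine
{
	SketchNode	node;
	const char *label;
} SketchCopyFromRoutine;

/* Routines live for the whole process, like the static const structs in the patch. */
static const SketchCopyToRoutine sketch_to_routine = {
	.node = {.type = T_SketchCopyToRoutine},
	.label = "COPY TO",
};

static const SketchCopyFromRoutine sketch_from_routine = {
	.node = {.type = T_SketchCopyFromRoutine},
	.label = "COPY FROM",
};

/* One handler per format: it picks the routine from the direction flag. */
const SketchNode *
sketch_copy_handler(bool is_from)
{
	if (is_from)
		return &sketch_from_routine.node;
	return &sketch_to_routine.node;
}

/* Caller-side check: the returned tag must match the requested direction. */
bool
sketch_routine_is_valid(const SketchNode *routine, bool is_from)
{
	SketchNodeTag expected = is_from ? T_SketchCopyFromRoutine
									 : T_SketchCopyToRoutine;

	if (routine == NULL)
		return false;
	return routine->type == expected;
}

/*
 * Call the handler and validate its result. The patch reports a mismatch
 * with ereport(ERROR, ...); this sketch only asserts.
 */
const SketchNode *
sketch_process_format(bool is_from)
{
	const SketchNode *routine = sketch_copy_handler(is_from);

	assert(sketch_routine_is_valid(routine, is_from));
	return routine;
}
```

The tag check is what lets a single handler serve both directions safely: a handler that returns the wrong routine for the requested direction is rejected before COPY starts, rather than being cast to the wrong struct type later.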
--- src/backend/commands/copy.c | 52 ++++++++++++------- src/backend/commands/copyfrom.c | 4 +- src/include/catalog/pg_type.dat | 2 +- src/include/commands/copy.h | 3 +- src/include/commands/copyapi.h | 2 + .../expected/test_copy_format.out | 10 ++-- .../test_copy_format/sql/test_copy_format.sql | 1 + .../test_copy_format/test_copy_format.c | 39 +++++++++++++- 8 files changed, 87 insertions(+), 26 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 332dfdc9a53..a50f99ab60d 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -483,8 +483,8 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate) * This function checks whether the option value is a built-in format such as * "text" and "csv" or not. If the option value isn't a built-in format, this * function finds a COPY format handler that returns a CopyToRoutine (for - * is_from == false). If no COPY format handler is found, this function - * reports an error. + * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY + * format handler is found, this function reports an error. 
*/ static void ProcessCopyOptionFormat(ParseState *pstate, @@ -515,12 +515,9 @@ ProcessCopyOptionFormat(ParseState *pstate, } /* custom format */ - if (!is_from) - { - funcargtypes[0] = INTERNALOID; - handlerOid = LookupFuncName(list_make1(makeString(format)), 1, - funcargtypes, true); - } + funcargtypes[0] = INTERNALOID; + handlerOid = LookupFuncName(list_make1(makeString(format)), 1, + funcargtypes, true); if (!OidIsValid(handlerOid)) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), @@ -529,17 +526,34 @@ ProcessCopyOptionFormat(ParseState *pstate, datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from)); routine = (Node *) DatumGetPointer(datum); - if (routine == NULL || !IsA(routine, CopyToRoutine)) - ereport( - ERROR, - (errcode( - ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("COPY handler function " - "%s(%u) did not return a " - "CopyToRoutine struct", - format, handlerOid), - parser_errposition( - pstate, defel->location))); + if (is_from) + { + if (routine == NULL || !IsA(routine, CopyFromRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%s(%u) did not return a " + "CopyFromRoutine struct", + format, handlerOid), + parser_errposition( + pstate, defel->location))); + } + else + { + if (routine == NULL || !IsA(routine, CopyToRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%s(%u) did not return a " + "CopyToRoutine struct", + format, handlerOid), + parser_errposition( + pstate, defel->location))); + } opts_out->routine = routine; } diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 2e88f19861d..1f502d746f9 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -155,7 +155,9 @@ static const CopyFromRoutine CopyFromRoutineBinary = { static const CopyFromRoutine * CopyFromGetRoutine(CopyFormatOptions opts) { - if (opts.csv_mode) + if (opts.routine) + return 
(const CopyFromRoutine *) opts.routine; + else if (opts.csv_mode) return &CopyFromRoutineCSV; else if (opts.binary) return &CopyFromRoutineBinary; diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat index 340e0cd0a8d..63b7d65f982 100644 --- a/src/include/catalog/pg_type.dat +++ b/src/include/catalog/pg_type.dat @@ -634,7 +634,7 @@ typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-', typalign => 'i' }, { oid => '8752', - descr => 'pseudo-type for the result of a copy to method function', + descr => 'pseudo-type for the result of a copy to/from method function', typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p', typcategory => 'P', typinput => 'copy_handler_in', typoutput => 'copy_handler_out', typreceive => '-', typsend => '-', diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 26b0f410918..f764a6ac829 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -87,7 +87,8 @@ typedef struct CopyFormatOptions CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */ int64 reject_limit; /* maximum tolerable number of errors */ List *convert_select; /* list of column names (can be NIL) */ - Node *routine; /* CopyToRoutine (can be NULL) */ + Node *routine; /* CopyToRoutine or CopyFromRoutine (can be + * NULL) */ } CopyFormatOptions; /* This is private in commands/copyfrom.c */ diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 030a82aca7f..fa3d8d87760 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -129,6 +129,8 @@ extern void CopyToStateFlush(CopyToState cstate); */ typedef struct CopyFromRoutine { + NodeTag type; + /* * Set input function information. This callback is called once at the * beginning of COPY FROM. 
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out index adfe7d1572a..016893e7026 100644 --- a/src/test/modules/test_copy_format/expected/test_copy_format.out +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -2,9 +2,13 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); -ERROR: COPY format "test_copy_format" not recognized -LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')... - ^ +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromEnd COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); NOTICE: test_copy_format: is_from=false NOTICE: CopyToOutFunc: atttypid=21 diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql index 810b3d8cedc..0dfdfa00080 100644 --- a/src/test/modules/test_copy_format/sql/test_copy_format.sql +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -2,4 +2,5 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); +\. 
COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c index e064f40473b..f6b105659ab 100644 --- a/src/test/modules/test_copy_format/test_copy_format.c +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -18,6 +18,40 @@ PG_MODULE_MAGIC; +static void +CopyFromInFunc(CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam) +{ + ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%d", atttypid))); +} + +static void +CopyFromStart(CopyFromState cstate, TupleDesc tupDesc) +{ + ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts))); +} + +static bool +CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +{ + ereport(NOTICE, (errmsg("CopyFromOneRow"))); + return false; +} + +static void +CopyFromEnd(CopyFromState cstate) +{ + ereport(NOTICE, (errmsg("CopyFromEnd"))); +} + +static const CopyFromRoutine CopyFromRoutineTestCopyFormat = { + .type = T_CopyFromRoutine, + .CopyFromInFunc = CopyFromInFunc, + .CopyFromStart = CopyFromStart, + .CopyFromOneRow = CopyFromOneRow, + .CopyFromEnd = CopyFromEnd, +}; + static void CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) { @@ -59,5 +93,8 @@ test_copy_format(PG_FUNCTION_ARGS) ereport(NOTICE, (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false"))); - PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat); + if (is_from) + PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat); + else + PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat); } -- 2.47.1 From ff83119977d4a87bd1e15315ca077042c5e37732 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 14:19:34 +0900 Subject: [PATCH v28 7/9] Export CopyFromStateData It's for custom COPY FROM format handlers implemented as extension. This just moves codes. This doesn't change codes except CopySource enum values. 
This changes COPY_ prefix of CopySource enum values to COPY_SOURCE_ prefix like the CopyDest enum values prefix change. For example, COPY_FILE in CopySource is renamed to COPY_SOURCE_FILE. Note that this isn't enough to implement custom COPY FROM format handlers as extension. We'll do the followings in a subsequent commit: 1. Add an opaque space for custom COPY FROM format handler 2. Export CopyReadBinaryData() to read the next data --- src/backend/commands/copyfrom.c | 4 +- src/backend/commands/copyfromparse.c | 10 +- src/include/commands/copy.h | 1 - src/include/commands/copyapi.h | 166 +++++++++++++++++++++++ src/include/commands/copyfrom_internal.h | 166 ----------------------- 5 files changed, 173 insertions(+), 174 deletions(-) diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 1f502d746f9..401cef7cf99 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1704,7 +1704,7 @@ BeginCopyFrom(ParseState *pstate, pg_encoding_to_char(GetDatabaseEncoding())))); } - cstate->copy_src = COPY_FILE; /* default */ + cstate->copy_src = COPY_SOURCE_FILE; /* default */ cstate->whereClause = whereClause; @@ -1832,7 +1832,7 @@ BeginCopyFrom(ParseState *pstate, if (data_source_cb) { progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK; - cstate->copy_src = COPY_CALLBACK; + cstate->copy_src = COPY_SOURCE_CALLBACK; cstate->data_source_cb = data_source_cb; } else if (pipe) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 65f20d332ee..5fcdbea2c2a 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -170,7 +170,7 @@ ReceiveCopyBegin(CopyFromState cstate) for (i = 0; i < natts; i++) pq_sendint16(&buf, format); /* per-column formats */ pq_endmessage(&buf); - cstate->copy_src = COPY_FRONTEND; + cstate->copy_src = COPY_SOURCE_FRONTEND; cstate->fe_msgbuf = makeStringInfo(); /* We *must* flush here to ensure FE knows it can send. 
*/ pq_flush(); @@ -238,7 +238,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) switch (cstate->copy_src) { - case COPY_FILE: + case COPY_SOURCE_FILE: bytesread = fread(databuf, 1, maxread, cstate->copy_file); if (ferror(cstate->copy_file)) ereport(ERROR, @@ -247,7 +247,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) if (bytesread == 0) cstate->raw_reached_eof = true; break; - case COPY_FRONTEND: + case COPY_SOURCE_FRONTEND: while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof) { int avail; @@ -330,7 +330,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) bytesread += avail; } break; - case COPY_CALLBACK: + case COPY_SOURCE_CALLBACK: bytesread = cstate->data_source_cb(databuf, minread, maxread); break; } @@ -1158,7 +1158,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv) * after \. up to the protocol end of copy data. (XXX maybe better * not to treat \. as special?) */ - if (cstate->copy_src == COPY_FRONTEND) + if (cstate->copy_src == COPY_SOURCE_FRONTEND) { int inbytes; diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index f764a6ac829..029a1538f7c 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -91,7 +91,6 @@ typedef struct CopyFormatOptions * NULL) */ } CopyFormatOptions; -/* This is private in commands/copyfrom.c */ typedef struct CopyFromStateData *CopyFromState; typedef struct CopyToStateData *CopyToState; diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index fa3d8d87760..9358515c6f6 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -15,6 +15,7 @@ #define COPYAPI_H #include "commands/copy.h" +#include "commands/trigger.h" #include "executor/execdesc.h" #include "executor/tuptable.h" #include "nodes/execnodes.h" @@ -171,4 +172,169 @@ typedef struct CopyFromRoutine void (*CopyFromEnd) (CopyFromState cstate); } CopyFromRoutine; +/* + * 
Represents the different source cases we need to worry about at + * the bottom level + */ +typedef enum CopySource +{ + COPY_SOURCE_FILE, /* from file (or a piped program) */ + COPY_SOURCE_FRONTEND, /* from frontend */ + COPY_SOURCE_CALLBACK, /* from callback function */ +} CopySource; + +/* + * Represents the end-of-line terminator type of the input + */ +typedef enum EolType +{ + EOL_UNKNOWN, + EOL_NL, + EOL_CR, + EOL_CRNL, +} EolType; + +/* + * Represents the insert method to be used during COPY FROM. + */ +typedef enum CopyInsertMethod +{ + CIM_SINGLE, /* use table_tuple_insert or ExecForeignInsert */ + CIM_MULTI, /* always use table_multi_insert or + * ExecForeignBatchInsert */ + CIM_MULTI_CONDITIONAL, /* use table_multi_insert or + * ExecForeignBatchInsert only if valid */ +} CopyInsertMethod; + +/* + * This struct contains all the state variables used throughout a COPY FROM + * operation. + */ +typedef struct CopyFromStateData +{ + /* format routine */ + const CopyFromRoutine *routine; + + /* low-level state data */ + CopySource copy_src; /* type of copy source */ + FILE *copy_file; /* used if copy_src == COPY_FILE */ + StringInfo fe_msgbuf; /* used if copy_src == COPY_FRONTEND */ + + EolType eol_type; /* EOL type of input */ + int file_encoding; /* file or remote side's character encoding */ + bool need_transcoding; /* file encoding diff from server? */ + Oid conversion_proc; /* encoding conversion function */ + + /* parameters from the COPY command */ + Relation rel; /* relation to copy from */ + List *attnumlist; /* integer list of attnums to copy */ + char *filename; /* filename, or NULL for STDIN */ + bool is_program; /* is 'filename' a program to popen? 
*/ + copy_data_source_cb data_source_cb; /* function for reading data */ + + CopyFormatOptions opts; + bool *convert_select_flags; /* per-column CSV/TEXT CS flags */ + Node *whereClause; /* WHERE condition (or NULL) */ + + /* these are just for error messages, see CopyFromErrorCallback */ + const char *cur_relname; /* table name for error messages */ + uint64 cur_lineno; /* line number for error messages */ + const char *cur_attname; /* current att for error messages */ + const char *cur_attval; /* current att value for error messages */ + bool relname_only; /* don't output line number, att, etc. */ + + /* + * Working state + */ + MemoryContext copycontext; /* per-copy execution context */ + + AttrNumber num_defaults; /* count of att that are missing and have + * default value */ + FmgrInfo *in_functions; /* array of input functions for each attrs */ + Oid *typioparams; /* array of element types for in_functions */ + ErrorSaveContext *escontext; /* soft error trapped during in_functions + * execution */ + uint64 num_errors; /* total number of rows which contained soft + * errors */ + int *defmap; /* array of default att numbers related to + * missing att */ + ExprState **defexprs; /* array of default att expressions for all + * att */ + bool *defaults; /* if DEFAULT marker was found for + * corresponding att */ + bool volatile_defexprs; /* is any of defexprs volatile? */ + List *range_table; /* single element list of RangeTblEntry */ + List *rteperminfos; /* single element list of RTEPermissionInfo */ + ExprState *qualexpr; + + TransitionCaptureState *transition_capture; + + /* + * These variables are used to reduce overhead in COPY FROM. + * + * attribute_buf holds the separated, de-escaped text for each field of + * the current line. The CopyReadAttributes functions return arrays of + * pointers into this buffer. We avoid palloc/pfree overhead by re-using + * the buffer on each cycle. 
+ * + * In binary COPY FROM, attribute_buf holds the binary data for the + * current field, but the usage is otherwise similar. + */ + StringInfoData attribute_buf; + + /* field raw data pointers found by COPY FROM */ + + int max_fields; + char **raw_fields; + + /* + * Similarly, line_buf holds the whole input line being processed. The + * input cycle is first to read the whole line into line_buf, and then + * extract the individual attribute fields into attribute_buf. line_buf + * is preserved unmodified so that we can display it in error messages if + * appropriate. (In binary mode, line_buf is not used.) + */ + StringInfoData line_buf; + bool line_buf_valid; /* contains the row being processed? */ + + /* + * input_buf holds input data, already converted to database encoding. + * + * In text mode, CopyReadLine parses this data sufficiently to locate line + * boundaries, then transfers the data to line_buf. We guarantee that + * there is a \0 at input_buf[input_buf_len] at all times. (In binary + * mode, input_buf is not used.) + * + * If encoding conversion is not required, input_buf is not a separate + * buffer but points directly to raw_buf. In that case, input_buf_len + * tracks the number of bytes that have been verified as valid in the + * database encoding, and raw_buf_len is the total number of bytes stored + * in the buffer. + */ +#define INPUT_BUF_SIZE 65536 /* we palloc INPUT_BUF_SIZE+1 bytes */ + char *input_buf; + int input_buf_index; /* next byte to process */ + int input_buf_len; /* total # of bytes stored */ + bool input_reached_eof; /* true if we reached EOF */ + bool input_reached_error; /* true if a conversion error happened */ + /* Shorthand for number of unconsumed bytes available in input_buf */ +#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index) + + /* + * raw_buf holds raw input data read from the data source (file or client + * connection), not yet converted to the database encoding. 
Like with + * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len]. + */ +#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */ + char *raw_buf; + int raw_buf_index; /* next byte to process */ + int raw_buf_len; /* total # of bytes stored */ + bool raw_reached_eof; /* true if we reached EOF */ + + /* Shorthand for number of unconsumed bytes available in raw_buf */ +#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) + + uint64 bytes_processed; /* number of bytes processed so far */ +} CopyFromStateData; + #endif /* COPYAPI_H */ diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index e1affe3dfa7..8cc71c12b5d 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -15,174 +15,8 @@ #define COPYFROM_INTERNAL_H #include "commands/copyapi.h" -#include "commands/trigger.h" #include "nodes/miscnodes.h" -/* - * Represents the different source cases we need to worry about at - * the bottom level - */ -typedef enum CopySource -{ - COPY_FILE, /* from file (or a piped program) */ - COPY_FRONTEND, /* from frontend */ - COPY_CALLBACK, /* from callback function */ -} CopySource; - -/* - * Represents the end-of-line terminator type of the input - */ -typedef enum EolType -{ - EOL_UNKNOWN, - EOL_NL, - EOL_CR, - EOL_CRNL, -} EolType; - -/* - * Represents the insert method to be used during COPY FROM. - */ -typedef enum CopyInsertMethod -{ - CIM_SINGLE, /* use table_tuple_insert or ExecForeignInsert */ - CIM_MULTI, /* always use table_multi_insert or - * ExecForeignBatchInsert */ - CIM_MULTI_CONDITIONAL, /* use table_multi_insert or - * ExecForeignBatchInsert only if valid */ -} CopyInsertMethod; - -/* - * This struct contains all the state variables used throughout a COPY FROM - * operation. 
- */ -typedef struct CopyFromStateData -{ - /* format routine */ - const CopyFromRoutine *routine; - - /* low-level state data */ - CopySource copy_src; /* type of copy source */ - FILE *copy_file; /* used if copy_src == COPY_FILE */ - StringInfo fe_msgbuf; /* used if copy_src == COPY_FRONTEND */ - - EolType eol_type; /* EOL type of input */ - int file_encoding; /* file or remote side's character encoding */ - bool need_transcoding; /* file encoding diff from server? */ - Oid conversion_proc; /* encoding conversion function */ - - /* parameters from the COPY command */ - Relation rel; /* relation to copy from */ - List *attnumlist; /* integer list of attnums to copy */ - char *filename; /* filename, or NULL for STDIN */ - bool is_program; /* is 'filename' a program to popen? */ - copy_data_source_cb data_source_cb; /* function for reading data */ - - CopyFormatOptions opts; - bool *convert_select_flags; /* per-column CSV/TEXT CS flags */ - Node *whereClause; /* WHERE condition (or NULL) */ - - /* these are just for error messages, see CopyFromErrorCallback */ - const char *cur_relname; /* table name for error messages */ - uint64 cur_lineno; /* line number for error messages */ - const char *cur_attname; /* current att for error messages */ - const char *cur_attval; /* current att value for error messages */ - bool relname_only; /* don't output line number, att, etc. 
*/ - - /* - * Working state - */ - MemoryContext copycontext; /* per-copy execution context */ - - AttrNumber num_defaults; /* count of att that are missing and have - * default value */ - FmgrInfo *in_functions; /* array of input functions for each attrs */ - Oid *typioparams; /* array of element types for in_functions */ - ErrorSaveContext *escontext; /* soft error trapped during in_functions - * execution */ - uint64 num_errors; /* total number of rows which contained soft - * errors */ - int *defmap; /* array of default att numbers related to - * missing att */ - ExprState **defexprs; /* array of default att expressions for all - * att */ - bool *defaults; /* if DEFAULT marker was found for - * corresponding att */ - bool volatile_defexprs; /* is any of defexprs volatile? */ - List *range_table; /* single element list of RangeTblEntry */ - List *rteperminfos; /* single element list of RTEPermissionInfo */ - ExprState *qualexpr; - - TransitionCaptureState *transition_capture; - - /* - * These variables are used to reduce overhead in COPY FROM. - * - * attribute_buf holds the separated, de-escaped text for each field of - * the current line. The CopyReadAttributes functions return arrays of - * pointers into this buffer. We avoid palloc/pfree overhead by re-using - * the buffer on each cycle. - * - * In binary COPY FROM, attribute_buf holds the binary data for the - * current field, but the usage is otherwise similar. - */ - StringInfoData attribute_buf; - - /* field raw data pointers found by COPY FROM */ - - int max_fields; - char **raw_fields; - - /* - * Similarly, line_buf holds the whole input line being processed. The - * input cycle is first to read the whole line into line_buf, and then - * extract the individual attribute fields into attribute_buf. line_buf - * is preserved unmodified so that we can display it in error messages if - * appropriate. (In binary mode, line_buf is not used.) 
- */ - StringInfoData line_buf; - bool line_buf_valid; /* contains the row being processed? */ - - /* - * input_buf holds input data, already converted to database encoding. - * - * In text mode, CopyReadLine parses this data sufficiently to locate line - * boundaries, then transfers the data to line_buf. We guarantee that - * there is a \0 at input_buf[input_buf_len] at all times. (In binary - * mode, input_buf is not used.) - * - * If encoding conversion is not required, input_buf is not a separate - * buffer but points directly to raw_buf. In that case, input_buf_len - * tracks the number of bytes that have been verified as valid in the - * database encoding, and raw_buf_len is the total number of bytes stored - * in the buffer. - */ -#define INPUT_BUF_SIZE 65536 /* we palloc INPUT_BUF_SIZE+1 bytes */ - char *input_buf; - int input_buf_index; /* next byte to process */ - int input_buf_len; /* total # of bytes stored */ - bool input_reached_eof; /* true if we reached EOF */ - bool input_reached_error; /* true if a conversion error happened */ - /* Shorthand for number of unconsumed bytes available in input_buf */ -#define INPUT_BUF_BYTES(cstate) ((cstate)->input_buf_len - (cstate)->input_buf_index) - - /* - * raw_buf holds raw input data read from the data source (file or client - * connection), not yet converted to the database encoding. Like with - * 'input_buf', we guarantee that there is a \0 at raw_buf[raw_buf_len]. 
- */ -#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */ - char *raw_buf; - int raw_buf_index; /* next byte to process */ - int raw_buf_len; /* total # of bytes stored */ - bool raw_reached_eof; /* true if we reached EOF */ - - /* Shorthand for number of unconsumed bytes available in raw_buf */ -#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) - - uint64 bytes_processed; /* number of bytes processed so far */ -} CopyFromStateData; - extern void ReceiveCopyBegin(CopyFromState cstate); extern void ReceiveCopyBinaryHeader(CopyFromState cstate); -- 2.47.1 From 4a52c7c6917f4fe23880aacf3ada904d7bdb3943 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 14:21:39 +0900 Subject: [PATCH v28 8/9] Add support for implementing custom COPY FROM format as extension * Add CopyFromStateData::opaque that can be used to keep data for custom COPY From format implementation * Export CopyReadBinaryData() to read the next data as CopyFromStateRead() --- src/backend/commands/copyfromparse.c | 12 ++++++++++++ src/include/commands/copyapi.h | 5 +++++ 2 files changed, 17 insertions(+) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 5fcdbea2c2a..d79b6ebe8a3 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -729,6 +729,18 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes) return copied_bytes; } +/* + * Export CopyReadBinaryData() for extensions. We want to keep + * CopyReadBinaryData() as a static function for + * optimization. CopyReadBinaryData() calls in this file may be optimized by + * a compiler. + */ +int +CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes) +{ + return CopyReadBinaryData(cstate, dest, nbytes); +} + /* * Read raw fields in the next line for COPY FROM in text or csv mode. * Return false if no more lines. 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 9358515c6f6..6f158272829 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -335,6 +335,11 @@ typedef struct CopyFromStateData #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) uint64 bytes_processed; /* number of bytes processed so far */ + + /* For custom format implementation */ + void *opaque; /* private space */ } CopyFromStateData; +extern int CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes); + #endif /* COPYAPI_H */ -- 2.47.1 From c16a63d3b3db3e5ee236f05affee2261a8f86e97 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Wed, 27 Nov 2024 16:23:55 +0900 Subject: [PATCH v28 9/9] Add CopyFromSkipErrorRow() for custom COPY format extension Extensions must call CopyFromSkipErrorRow() when CopyFromOneRow callback reports an error by errsave(). CopyFromSkipErrorRow() handles "ON_ERROR stop" and "LOG_VERBOSITY verbose" cases. --- src/backend/commands/copyfromparse.c | 82 +++++++++++-------- src/include/commands/copyapi.h | 2 + .../expected/test_copy_format.out | 47 +++++++++++ .../test_copy_format/sql/test_copy_format.sql | 24 ++++++ .../test_copy_format/test_copy_format.c | 82 ++++++++++++++++++- 5 files changed, 199 insertions(+), 38 deletions(-) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index d79b6ebe8a3..1f78c5b6a9a 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -852,6 +852,51 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool i return true; } +/* + * Call this when you report an error by errsave() in your CopyFromOneRow + * callback. This handles "ON_ERROR stop" and "LOG_VERBOSITY verbose" cases + * for you. 
+ */ +void +CopyFromSkipErrorRow(CopyFromState cstate) +{ + Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); + + cstate->num_errors++; + + if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) + { + /* + * Since we emit line number and column info in the below notice + * message, we suppress error context information other than the + * relation name. + */ + Assert(!cstate->relname_only); + cstate->relname_only = true; + + if (cstate->cur_attval) + { + char *attval; + + attval = CopyLimitPrintoutLength(cstate->cur_attval); + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"", + (unsigned long long) cstate->cur_lineno, + cstate->cur_attname, + attval)); + pfree(attval); + } + else + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input", + (unsigned long long) cstate->cur_lineno, + cstate->cur_attname)); + + /* reset relname_only */ + cstate->relname_only = false; + } +} + /* * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow(). * @@ -960,42 +1005,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, (Node *) cstate->escontext, &values[m])) { - Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); - - cstate->num_errors++; - - if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) - { - /* - * Since we emit line number and column info in the below - * notice message, we suppress error context information other - * than the relation name. 
- */ - Assert(!cstate->relname_only); - cstate->relname_only = true; - - if (cstate->cur_attval) - { - char *attval; - - attval = CopyLimitPrintoutLength(cstate->cur_attval); - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"", - (unsigned long long) cstate->cur_lineno, - cstate->cur_attname, - attval)); - pfree(attval); - } - else - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input", - (unsigned long long) cstate->cur_lineno, - cstate->cur_attname)); - - /* reset relname_only */ - cstate->relname_only = false; - } - + CopyFromSkipErrorRow(cstate); return true; } diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 6f158272829..9aba51a242b 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -342,4 +342,6 @@ typedef struct CopyFromStateData extern int CopyFromStateRead(CopyFromState cstate, char *dest, int nbytes); +extern void CopyFromSkipErrorRow(CopyFromState cstate); + #endif /* COPYAPI_H */ diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out index 016893e7026..b9a6baa85c0 100644 --- a/src/test/modules/test_copy_format/expected/test_copy_format.out +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -1,6 +1,8 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +-- 987 is accepted. +-- 654 is a hard error because ON_ERROR is stop by default. 
COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); NOTICE: test_copy_format: is_from=true NOTICE: CopyFromInFunc: atttypid=21 @@ -8,7 +10,50 @@ NOTICE: CopyFromInFunc: atttypid=23 NOTICE: CopyFromInFunc: atttypid=20 NOTICE: CopyFromStart: natts=3 NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +ERROR: invalid value: "6" +CONTEXT: COPY test, line 2, column a: "6" +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: 1 row was skipped due to data type incompatibility NOTICE: CopyFromEnd +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: skipping row due to data type incompatibility at line 2 for column "a": "6" +NOTICE: CopyFromOneRow +NOTICE: 1 row was skipped due to data type incompatibility +NOTICE: CopyFromEnd +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +-- 321 is a hard error. 
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +ERROR: too much lines: 3 +CONTEXT: COPY test, line 3 COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); NOTICE: test_copy_format: is_from=false NOTICE: CopyToOutFunc: atttypid=21 @@ -18,4 +63,6 @@ NOTICE: CopyToStart: natts=3 NOTICE: CopyToOneRow: tts_nvalid=3 NOTICE: CopyToOneRow: tts_nvalid=3 NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 NOTICE: CopyToEnd diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql index 0dfdfa00080..86db71bce7f 100644 --- a/src/test/modules/test_copy_format/sql/test_copy_format.sql +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -1,6 +1,30 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +-- 987 is accepted. +-- 654 is a hard error because ON_ERROR is stop by default. COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); +987 +654 +\. +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +987 +654 +\. +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose); +987 +654 +\. +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +-- 321 is a hard error. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +987 +654 +321 \. 
COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c index f6b105659ab..f0f53838aef 100644 --- a/src/test/modules/test_copy_format/test_copy_format.c +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -32,10 +32,88 @@ CopyFromStart(CopyFromState cstate, TupleDesc tupDesc) } static bool -CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls) { + int n_attributes = list_length(cstate->attnumlist); + char *line; + int line_size = n_attributes + 1; /* +1 is for new line */ + int read_bytes; + ereport(NOTICE, (errmsg("CopyFromOneRow"))); - return false; + + cstate->cur_lineno++; + line = palloc(line_size); + read_bytes = CopyFromStateRead(cstate, line, line_size); + if (read_bytes == 0) + return false; + if (read_bytes != line_size) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("one line must be %d bytes: %d", + line_size, read_bytes))); + + if (cstate->cur_lineno == 1) + { + /* Success */ + TupleDesc tupDesc = RelationGetDescr(cstate->rel); + ListCell *cur; + int i = 0; + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + if (att->atttypid == INT2OID) + { + values[i] = Int16GetDatum(line[i] - '0'); + } + else if (att->atttypid == INT4OID) + { + values[i] = Int32GetDatum(line[i] - '0'); + } + else if (att->atttypid == INT8OID) + { + values[i] = Int64GetDatum(line[i] - '0'); + } + nulls[i] = false; + i++; + } + } + else if (cstate->cur_lineno == 2) + { + /* Soft error */ + TupleDesc tupDesc = RelationGetDescr(cstate->rel); + int attnum = lfirst_int(list_head(cstate->attnumlist)); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + char value[2]; + + cstate->cur_attname 
= NameStr(att->attname); + value[0] = line[0]; + value[1] = '\0'; + cstate->cur_attval = value; + errsave((Node *) cstate->escontext, + ( + errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid value: \"%c\"", line[0]))); + CopyFromSkipErrorRow(cstate); + cstate->cur_attname = NULL; + cstate->cur_attval = NULL; + return true; + } + else + { + /* Hard error */ + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("too much lines: %llu", + (unsigned long long) cstate->cur_lineno))); + } + + return true; } static void -- 2.47.1
On Thu, Jan 23, 2025 at 1:12 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> I noticed that the last patch set (v27) can't be applied to
> the current master. I've rebased on the current master and
> created v28 patch set. No code change.
Thank you for updating the patch!
While 0001 and 0002 look good to me overall, we still need to polish
subsequent patches. Here are review comments:
---
I still find that it would not be a good idea to move all copy-related
struct definitions to copyapi.h because we need to include copyapi.h
file into a .c file even if the file is not related to the custom copy
format routines. I think that copyapi.h should have only the
definitions of CopyToRoutine and CopyFromRoutine as well as some
functions related to the custom copy format. Here is an idea:
- CopyToState and CopyFromState are defined in copyto_internal.h (new
file) and copyfrom_internal.h, respectively.
- These two files #include's copy.h and other necessary header files.
- copyapi.h has only CopyToRoutine and CopyFromRoutine and #include's
both copyfrom_internal.h and copyto_internal.h.
- copyto.c, copyfrom.c and copyfromparse.c #include copyapi.h
Some advantages of this idea:
- we can keep both CopyToState and CopyFromState private in _internal.h files.
- custom format extension can include copyapi.h to provide a custom
copy format routine and to access the copy state data.
- copy-related .c files won't need to include copyapi.h if they don't
use custom copy format routines.
---
The 0008 patch introduces CopyFromStateRead(). While it would be a
good start, I think we can consider sorting out low-level
communication functions more. For example, CopyReadBinaryData() uses
the internal 64kB buffer but some custom copy format extensions might
want to use a larger buffer in its own implementation, which would
require exposing CopyGetData() etc. Given that we might expose more
functions to provide more ways for extensions, we might want to rename
CopyFromStateRead().
---
While we get the format routines for custom formats in
ProcessCopyOptionFormat(), we do that for built-in formats in
BeginCopyTo(), which seems odd to me. I think we can have
CopyToGetRoutine() responsible for getting CopyToRoutine for built-in
formats as well as custom format. The same is true for
CopyFromRoutine.
---
Copy[To|From]Routine for built-in formats are missing to set the node type.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi,
In <CAD21AoDyBJrCsh5vNFWcRmS0_XKCCCP4gLzZnLCayYccLpaBfw@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 28 Jan 2025 15:00:03 -0800,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> While 0001 and 0002 look good to me overall, we still need to polish
> subsequent patches. Here are review comments:
I've attached the v29 patch set, which applies your suggestions:
Refactoring:
0001-0002: There are some trivial changes (copyright year
           change and some comment fixes)
COPY TO related:
0003: Applied your suggestions about copyto_internal.h,
      CopyToGetRoutine() and the built-in CopyToRoutine
0004: Applied your suggestion about copyto_internal.h
0005: No change
COPY FROM related:
0006: Applied your suggestions about copyfrom_internal.h,
      CopyFromGetRoutine() and the built-in CopyFromRoutine
0007: Applied your suggestion about copyfrom_internal.h
0008: Applied your suggestion about CopyFromStateRead()
0009: No change
> I still find that it would not be a good idea to move all copy-related
> struct definitions to copyapi.h because we need to include copyapi.h
> file into a .c file even if the file is not related to the custom copy
> format routines. I think that copyapi.h should have only the
> definitions of CopyToRoutine and CopyFromRoutine as well as some
> functions related to the custom copy format. Here is an idea:
> 
> - CopyToState and CopyFromState are defined in copyto_internal.h (new
> file) and copyfrom_internal.h, respectively.
> - These two files #include's copy.h and other necessary header files.
> - copyapi.h has only CopyToRoutine and CopyFromRoutine and #include's
> both copyfrom_internal.h and copyto_internal.h.
> - copyto.c, copyfrom.c and copyfromparse.c #include copyapi.h
> 
> Some advantages of this idea:
> 
> - we can keep both CopyToState and CopyFromState private in _internal.h files.
> - custom format extension can include copyapi.h to provide a custom
> copy format routine and to access the copy state data.
> - copy-related .c files won't need to include copyapi.h if they don't
> use custom copy format routines.
Hmm. I thought Copy{To,From}State are a "public" API, not a
"private" API, for extensions, because extensions need to use
at least Copy{To,From}State::opaque directly. If we want to
make Copy{To,From}State private, I think that we should
provide getters/setters for the needed members of
Copy{To,From}State, such as
Copy{To,From}State{Get,Set}Opaque().
That was the design in the v2 patch set:
https://www.postgresql.org/message-id/20231221.183504.1240642084042888377.kou%40clear-code.com
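To make the getter/setter idea concrete, here is a minimal standalone sketch. It is not code from any patch in this thread: DemoCopyFromStateData is a hypothetical stand-in that has only the opaque member, and the DemoCopyFromState{Get,Set}Opaque() names are assumptions used for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for CopyFromStateData. The real struct has many
 * more members; only "opaque" matters for this sketch.
 */
typedef struct DemoCopyFromStateData
{
	void	   *opaque;			/* private space for a custom format */
} DemoCopyFromStateData;

typedef DemoCopyFromStateData *DemoCopyFromState;

/* Hypothetical getter: returns the extension's private data. */
void *
DemoCopyFromStateGetOpaque(DemoCopyFromState cstate)
{
	assert(cstate != NULL);
	return cstate->opaque;
}

/* Hypothetical setter: stores the extension's private data. */
void
DemoCopyFromStateSetOpaque(DemoCopyFromState cstate, void *opaque)
{
	assert(cstate != NULL);
	cstate->opaque = opaque;
}
```

With accessors like these, an extension would never need the struct definition itself, at the cost of one exported function per member that extensions need.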
We discussed that we can make CopyToState public:
https://www.postgresql.org/message-id/CAD21AoD%3DUapH4Wh06G6H5XAzPJ0iJg9YcW8r7E2UEJkZ8QsosA%40mail.gmail.com
What does "private" mean here? I thought that it meant
"only PostgreSQL itself can use it". But it seems that you mean
"PostgreSQL itself and custom format extensions can use
it but other extensions can't use it".
I'm not familiar with "_internal.h" in PostgreSQL. Is
"_internal.h" meant for the latter meaning of "private"?
> The 0008 patch introduces CopyFromStateRead(). While it would be a
> good start, I think we can consider sorting out low-level
> communication functions more. For example, CopyReadBinaryData() uses
> the internal 64kB buffer but some custom copy format extensions might
> want to use a larger buffer in its own implementation, which would
> require exposing CopyGetData() etc. Given that we might expose more
> functions to provide more ways for extensions, we might want to rename
> CopyFromStateRead().
This suggests that, as the first step, we just need a low-level
CopyGetData(), not a high-level CopyReadBinaryData(),
right?
I agree that we should start from a minimal API set.
I've renamed CopyFromStateRead() to CopyFromStateGetData()
because it wraps CopyGetData() now.
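The wrapping pattern can be sketched in a standalone form as below. This is only an illustration under assumptions: DemoCopyFromStateData is a hypothetical in-memory stand-in for the copy state, DemoCopyGetData() imitates only the argument shape of the static CopyGetData(cstate, databuf, minread, maxread), and the exported wrapper's signature is a guess, not necessarily the one in the v29 patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the copy state: an in-memory data source. */
typedef struct DemoCopyFromStateData
{
	const char *src;			/* stand-in for file/frontend/callback */
	int			src_len;		/* total bytes available in src */
	int			src_pos;		/* next byte to hand out */
} DemoCopyFromStateData;

typedef DemoCopyFromStateData *DemoCopyFromState;

/*
 * Stand-in for the static low-level reader. It copies at most maxread
 * bytes and returns the number of bytes copied (0 at end of data).
 */
static int
DemoCopyGetData(DemoCopyFromState cstate, void *databuf,
				int minread, int maxread)
{
	int			avail = cstate->src_len - cstate->src_pos;
	int			n = (avail < maxread) ? avail : maxread;

	(void) minread;				/* a memory source never blocks */
	memcpy(databuf, cstate->src + cstate->src_pos, n);
	cstate->src_pos += n;
	return n;
}

/*
 * Exported thin wrapper. The low-level reader stays static so that calls
 * inside its own file can still be optimized by the compiler.
 */
int
DemoCopyFromStateGetData(DemoCopyFromState cstate, void *dest,
						 int minread, int maxread)
{
	assert(cstate != NULL && dest != NULL);
	return DemoCopyGetData(cstate, dest, minread, maxread);
}
```

An extension that wants a larger buffer than the internal 64kB one could then call the wrapper with its own buffer and maxread.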
> While we get the format routines for custom formats in
> ProcessCopyOptionFormat(), we do that for built-in formats in
> BeginCopyTo(), which seems odd to me. I think we can have
> CopyToGetRoutine() responsible for getting CopyToRoutine for built-in
> formats as well as custom format. The same is true for
> CopyFromRoutine.
I like the current design because we don't need to export
CopyToGetBuiltinRoutine() (it can stay static), but I've
applied your suggestion because I don't feel strongly about
it.
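The trade-off can be sketched in a standalone form as below. This is
only an illustration under simplifying assumptions: the real
CopyToRoutine holds callbacks rather than a label, the real lookup
takes CopyFormatOptions rather than a format name, and the "arrow"
routine and the 'custom' parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in: the real CopyToRoutine holds callbacks, not a label. */
typedef struct CopyToRoutine
{
    const char *label;
} CopyToRoutine;

static const CopyToRoutine CopyToRoutineText = {"text"};
static const CopyToRoutine CopyToRoutineCSV = {"csv"};
static const CopyToRoutine CopyToRoutineBinary = {"binary"};

/* Built-in lookup stays file-local; returns NULL for unknown names. */
static const CopyToRoutine *
CopyToGetBuiltinRoutine(const char *format)
{
    if (strcmp(format, "text") == 0)
        return &CopyToRoutineText;
    if (strcmp(format, "csv") == 0)
        return &CopyToRoutineCSV;
    if (strcmp(format, "binary") == 0)
        return &CopyToRoutineBinary;
    return NULL;
}

/*
 * Single entry point for both cases: try the built-in formats first, then
 * fall back to 'custom' (a routine supplied by an extension; may be NULL).
 */
const CopyToRoutine *
CopyToGetRoutine(const char *format, const CopyToRoutine *custom)
{
    const CopyToRoutine *routine = CopyToGetBuiltinRoutine(format);

    return routine != NULL ? routine : custom;
}

/* Check that a built-in name and a custom name both resolve as expected. */
int
demo_routine_lookup(void)
{
    static const CopyToRoutine arrow = {"arrow"};   /* hypothetical custom format */

    assert(CopyToGetRoutine("csv", &arrow) == &CopyToRoutineCSV);
    assert(CopyToGetRoutine("arrow", &arrow) == &arrow);
    return 1;
}
```

In this shape only the single entry point needs to be visible outside
the file, and the built-in lookup can remain static.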
> Copy[To|From]Routine for built-in formats are missing to set the node type.
Oh, sorry. I missed this.
Thanks,
-- 
kou
From eef8c0bc18a489fea352db242dd9e16003132243 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sat, 28 Sep 2024 23:24:49 +0900
Subject: [PATCH v29 1/9] Refactor COPY TO to use format callback functions.
This commit introduces a new CopyToRoutine struct, which is a set of
callback routines to copy tuples in a specific format. It also makes
the existing formats (text, CSV, and binary) utilize these format
callbacks.
This change is a preliminary step towards making the COPY TO command
extensible in terms of output formats.
Additionally, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.
Author: Sutou Kouhei
Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada
Reviewed-by: Junwang Zhao
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
---
 src/backend/commands/copyto.c    | 441 +++++++++++++++++++++----------
 src/include/commands/copyapi.h   |  57 ++++
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 358 insertions(+), 141 deletions(-)
 create mode 100644 src/include/commands/copyapi.h
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 99cb23cb347..26c67ddc351 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,7 +19,7 @@
 #include <sys/stat.h>
 
 #include "access/tableam.h"
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -64,6 +64,9 @@ typedef enum CopyDest
  */
 typedef struct CopyToStateData
 {
+    /* format-specific routines */
+    const CopyToRoutine *routine;
+
     /* low-level state data */
     CopyDest    copy_dest;        /* type of copy source/destination */
     FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
@@ -114,6 +117,19 @@ static void CopyAttributeOutText(CopyToState cstate, const char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
                                 bool use_quote);
 
+/* built-in format-specific routines */
+static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
+static void CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
+static void CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
+                                 bool is_csv);
+static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
+static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
+static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToBinaryEnd(CopyToState cstate);
+
 /* Low-level communications functions */
 static void SendCopyBegin(CopyToState cstate);
 static void SendCopyEnd(CopyToState cstate);
@@ -121,9 +137,254 @@ static void CopySendData(CopyToState cstate, const void *databuf, int datasize);
 static void CopySendString(CopyToState cstate, const char *str);
 static void CopySendChar(CopyToState cstate, char c);
 static void CopySendEndOfRow(CopyToState cstate);
+static void CopySendTextLikeEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
+/*
+ * COPY TO routines for built-in formats.
+ *
+ * CSV and text formats share the same TextLike routines except for the
+ * one-row callback.
+ */
+
+/* text format */
+static const CopyToRoutine CopyToRoutineText = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToTextOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+/* CSV format */
+static const CopyToRoutine CopyToRoutineCSV = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToCSVOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+/* binary format */
+static const CopyToRoutine CopyToRoutineBinary = {
+    .CopyToStart = CopyToBinaryStart,
+    .CopyToOutFunc = CopyToBinaryOutFunc,
+    .CopyToOneRow = CopyToBinaryOneRow,
+    .CopyToEnd = CopyToBinaryEnd,
+};
+
+/* Return a COPY TO routine for the given options */
+static const CopyToRoutine *
+CopyToGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyToRoutineCSV;
+    else if (opts.binary)
+        return &CopyToRoutineBinary;
+
+    /* default is text */
+    return &CopyToRoutineText;
+}
+
+/* Implementation of the start callback for text and CSV formats */
+static void
+CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    /*
+     * For non-binary copy, we need to convert null_print to file encoding,
+     * because it will be sent directly with CopySendString.
+     */
+    if (cstate->need_transcoding)
+        cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+                                                          cstate->opts.null_print_len,
+                                                          cstate->file_encoding);
+
+    /* if a header has been requested send the line */
+    if (cstate->opts.header_line)
+    {
+        ListCell   *cur;
+        bool        hdr_delim = false;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            char       *colname;
+
+            if (hdr_delim)
+                CopySendChar(cstate, cstate->opts.delim[0]);
+            hdr_delim = true;
+
+            colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, colname, false);
+            else
+                CopyAttributeOutText(cstate, colname);
+        }
+
+        CopySendTextLikeEndOfRow(cstate);
+    }
+}
+
+/*
+ * Implementation of the outfunc callback for text and CSV formats. Assign
+ * the output function data to the given *finfo.
+ */
+static void
+CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the per-row callback for text format */
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, false);
+}
+
+/* Implementation of the per-row callback for CSV format */
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, true);
+}
+
+/*
+ * Workhorse for CopyToTextOneRow() and CopyToCSVOneRow().
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
+ */
+static pg_attribute_always_inline void
+CopyToTextLikeOneRow(CopyToState cstate,
+                     TupleTableSlot *slot,
+                     bool is_csv)
+{
+    bool        need_delim = false;
+    FmgrInfo   *out_functions = cstate->out_functions;
+
+    foreach_int(attnum, cstate->attnumlist)
+    {
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (need_delim)
+            CopySendChar(cstate, cstate->opts.delim[0]);
+        need_delim = true;
+
+        if (isnull)
+        {
+            CopySendString(cstate, cstate->opts.null_print_client);
+        }
+        else
+        {
+            char       *string;
+
+            string = OutputFunctionCall(&out_functions[attnum - 1],
+                                        value);
+
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
+                CopyAttributeOutCSV(cstate, string,
+                                    cstate->opts.force_quote_flags[attnum - 1]);
+            else
+                CopyAttributeOutText(cstate, string);
+        }
+    }
+
+    CopySendTextLikeEndOfRow(cstate);
+}
+
+/* Implementation of the end callback for text and CSV formats */
+static void
+CopyToTextLikeEnd(CopyToState cstate)
+{
+    /* Nothing to do here */
+}
+
+/*
+ * Implementation of the start callback for binary format. Send a header
+ * for a binary copy.
+ */
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    int32        tmp;
+
+    /* Signature */
+    CopySendData(cstate, BinarySignature, 11);
+    /* Flags field */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+    /* No header extension */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+}
+
+/*
+ * Implementation of the outfunc callback for binary format. Assign
+ * the binary output function to the given *finfo.
+ */
+static void
+CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeBinaryOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the per-row callback for binary format */
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    FmgrInfo   *out_functions = cstate->out_functions;
+
+    /* Binary per-tuple header */
+    CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+    foreach_int(attnum, cstate->attnumlist)
+    {
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (isnull)
+        {
+            CopySendInt32(cstate, -1);
+        }
+        else
+        {
+            bytea       *outputbytes;
+
+            outputbytes = SendFunctionCall(&out_functions[attnum - 1],
+                                           value);
+            CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+            CopySendData(cstate, VARDATA(outputbytes),
+                         VARSIZE(outputbytes) - VARHDRSZ);
+        }
+    }
+
+    CopySendEndOfRow(cstate);
+}
+
+/* Implementation of the end callback for binary format */
+static void
+CopyToBinaryEnd(CopyToState cstate)
+{
+    /* Generate trailer for a binary copy */
+    CopySendInt16(cstate, -1);
+    /* Need to flush out the trailer */
+    CopySendEndOfRow(cstate);
+}
 
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
@@ -191,16 +452,6 @@ CopySendEndOfRow(CopyToState cstate)
     switch (cstate->copy_dest)
     {
         case COPY_FILE:
-            if (!cstate->opts.binary)
-            {
-                /* Default line termination depends on platform */
-#ifndef WIN32
-                CopySendChar(cstate, '\n');
-#else
-                CopySendString(cstate, "\r\n");
-#endif
-            }
-
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -235,10 +486,6 @@ CopySendEndOfRow(CopyToState cstate)
             }
             break;
         case COPY_FRONTEND:
-            /* The FE/BE protocol uses \n as newline for all platforms */
-            if (!cstate->opts.binary)
-                CopySendChar(cstate, '\n');
-
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
@@ -254,6 +501,35 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * line termination and does the common end-of-row processing.
+ */
+static inline void
+CopySendTextLikeEndOfRow(CopyToState cstate)
+{
+    switch (cstate->copy_dest)
+    {
+        case COPY_FILE:
+            /* Default line termination depends on platform */
+#ifndef WIN32
+            CopySendChar(cstate, '\n');
+#else
+            CopySendString(cstate, "\r\n");
+#endif
+            break;
+        case COPY_FRONTEND:
+            /* The FE/BE protocol uses \n as newline for all platforms */
+            CopySendChar(cstate, '\n');
+            break;
+        default:
+            break;
+    }
+
+    /* Now take the actions related to the end of a row */
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * These functions do apply some data conversion
  */
@@ -426,6 +702,9 @@ BeginCopyTo(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
 
+    /* Set format routine */
+    cstate->routine = CopyToGetRoutine(cstate->opts);
+
     /* Process the source/target relation or query */
     if (rel)
     {
@@ -771,19 +1050,10 @@ DoCopyTo(CopyToState cstate)
     foreach(cur, cstate->attnumlist)
     {
         int            attnum = lfirst_int(cur);
-        Oid            out_func_oid;
-        bool        isvarlena;
         Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-        if (cstate->opts.binary)
-            getTypeBinaryOutputInfo(attr->atttypid,
-                                    &out_func_oid,
-                                    &isvarlena);
-        else
-            getTypeOutputInfo(attr->atttypid,
-                              &out_func_oid,
-                              &isvarlena);
-        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+        cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
+                                       &cstate->out_functions[attnum - 1]);
     }
 
     /*
@@ -796,56 +1066,7 @@ DoCopyTo(CopyToState cstate)
                                                "COPY TO",
                                                ALLOCSET_DEFAULT_SIZES);
 
-    if (cstate->opts.binary)
-    {
-        /* Generate header for a binary copy */
-        int32        tmp;
-
-        /* Signature */
-        CopySendData(cstate, BinarySignature, 11);
-        /* Flags field */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-        /* No header extension */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-    }
-    else
-    {
-        /*
-         * For non-binary copy, we need to convert null_print to file
-         * encoding, because it will be sent directly with CopySendString.
-         */
-        if (cstate->need_transcoding)
-            cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
-                                                              cstate->opts.null_print_len,
-                                                              cstate->file_encoding);
-
-        /* if a header has been requested send the line */
-        if (cstate->opts.header_line)
-        {
-            bool        hdr_delim = false;
-
-            foreach(cur, cstate->attnumlist)
-            {
-                int            attnum = lfirst_int(cur);
-                char       *colname;
-
-                if (hdr_delim)
-                    CopySendChar(cstate, cstate->opts.delim[0]);
-                hdr_delim = true;
-
-                colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, colname, false);
-                else
-                    CopyAttributeOutText(cstate, colname);
-            }
-
-            CopySendEndOfRow(cstate);
-        }
-    }
+    cstate->routine->CopyToStart(cstate, tupDesc);
 
     if (cstate->rel)
     {
@@ -884,13 +1105,7 @@ DoCopyTo(CopyToState cstate)
         processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
     }
 
-    if (cstate->opts.binary)
-    {
-        /* Generate trailer for a binary copy */
-        CopySendInt16(cstate, -1);
-        /* Need to flush out the trailer */
-        CopySendEndOfRow(cstate);
-    }
+    cstate->routine->CopyToEnd(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -903,74 +1118,18 @@ DoCopyTo(CopyToState cstate)
 /*
  * Emit one row during DoCopyTo().
  */
-static void
+static inline void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-    FmgrInfo   *out_functions = cstate->out_functions;
     MemoryContext oldcontext;
 
     MemoryContextReset(cstate->rowcontext);
     oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-    if (cstate->opts.binary)
-    {
-        /* Binary per-tuple header */
-        CopySendInt16(cstate, list_length(cstate->attnumlist));
-    }
-
     /* Make sure the tuple is fully deconstructed */
     slot_getallattrs(slot);
 
-    if (!cstate->opts.binary)
-    {
-        bool        need_delim = false;
-
-        foreach_int(attnum, cstate->attnumlist)
-        {
-            Datum        value = slot->tts_values[attnum - 1];
-            bool        isnull = slot->tts_isnull[attnum - 1];
-            char       *string;
-
-            if (need_delim)
-                CopySendChar(cstate, cstate->opts.delim[0]);
-            need_delim = true;
-
-            if (isnull)
-                CopySendString(cstate, cstate->opts.null_print_client);
-            else
-            {
-                string = OutputFunctionCall(&out_functions[attnum - 1],
-                                            value);
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, string,
-                                        cstate->opts.force_quote_flags[attnum - 1]);
-                else
-                    CopyAttributeOutText(cstate, string);
-            }
-        }
-    }
-    else
-    {
-        foreach_int(attnum, cstate->attnumlist)
-        {
-            Datum        value = slot->tts_values[attnum - 1];
-            bool        isnull = slot->tts_isnull[attnum - 1];
-            bytea       *outputbytes;
-
-            if (isnull)
-                CopySendInt32(cstate, -1);
-            else
-            {
-                outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-                                               value);
-                CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-                CopySendData(cstate, VARDATA(outputbytes),
-                             VARSIZE(outputbytes) - VARHDRSZ);
-            }
-        }
-    }
-
-    CopySendEndOfRow(cstate);
+    cstate->routine->CopyToOneRow(cstate, slot);
 
     MemoryContextSwitchTo(oldcontext);
 }
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 00000000000..be29e3fbdef
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,57 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *      API for COPY TO handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "commands/copy.h"
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/*
+ * API structure for a COPY TO format implementation. Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyToRoutine
+{
+    /*
+     * Set output function information. This callback is called once at the
+     * beginning of COPY TO.
+     *
+     * 'finfo' can be optionally filled to provide the catalog information of
+     * the output function.
+     *
+     * 'atttypid' is the OID of data type used by the relation's attribute.
+     */
+    void        (*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
+                                  FmgrInfo *finfo);
+
+    /*
+     * Start a COPY TO. This callback is called once at the beginning of COPY
+     * TO.
+     *
+     * 'tupDesc' is the tuple descriptor of the relation from where the data
+     * is read.
+     */
+    void        (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
+
+    /*
+     * Write one row stored in 'slot' to the destination.
+     */
+    void        (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
+
+    /* End a COPY TO. This callback is called once at the end of COPY TO. */
+    void        (*CopyToEnd) (CopyToState cstate);
+} CopyToRoutine;
+
+#endif                            /* COPYAPI_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a2644a2e653..1cbb3628857 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -508,6 +508,7 @@ CopyMultiInsertInfo
 CopyOnErrorChoice
 CopySource
 CopyStmt
+CopyToRoutine
 CopyToState
 CopyToStateData
 Cost
-- 
2.47.1
From a4e1392e26f96a645bb327119838830c553a7c69 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 18 Nov 2024 16:32:43 -0800
Subject: [PATCH v29 2/9] Refactor COPY FROM to use format callback functions.
This commit introduces a new CopyFromRoutine struct, which is a set of
callback routines to read tuples in a specific format. It also makes
COPY FROM with the existing formats (text, CSV, and binary) utilize
these format callbacks.
This change is a preliminary step towards making the COPY FROM command
extensible in terms of input formats.
Similar to XXXX, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.
Author: Sutou Kouhei
Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada
Reviewed-by: Junwang Zhao
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
---
 contrib/file_fdw/file_fdw.c              |   1 -
 src/backend/commands/copyfrom.c          | 190 +++++++--
 src/backend/commands/copyfromparse.c     | 504 +++++++++++++----------
 src/include/commands/copy.h              |   2 -
 src/include/commands/copyapi.h           |  48 ++-
 src/include/commands/copyfrom_internal.h |  13 +-
 src/tools/pgindent/typedefs.list         |   1 +
 7 files changed, 492 insertions(+), 267 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 678e754b2b9..323c43dca4a 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -21,7 +21,6 @@
 #include "access/table.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_foreign_table.h"
-#include "commands/copy.h"
 #include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 0cbd05f5602..917fa6605ef 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -106,6 +106,145 @@ typedef struct CopyMultiInsertInfo
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
 
+/*
+ * Built-in format-specific routines. One-row callbacks are defined in
+ * copyfromparse.c
+ */
+static void CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo,
+                                   Oid *typioparam);
+static void CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc);
+static void CopyFromTextLikeEnd(CopyFromState cstate);
+static void CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+                                 FmgrInfo *finfo, Oid *typioparam);
+static void CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc);
+static void CopyFromBinaryEnd(CopyFromState cstate);
+
+
+/*
+ * COPY FROM routines for built-in formats.
+ *
+ * CSV and text formats share the same TextLike routines except for the
+ * one-row callback.
+ */
+
+/* text format */
+static const CopyFromRoutine CopyFromRoutineText = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromTextOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+/* CSV format */
+static const CopyFromRoutine CopyFromRoutineCSV = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromCSVOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+/* binary format */
+static const CopyFromRoutine CopyFromRoutineBinary = {
+    .CopyFromInFunc = CopyFromBinaryInFunc,
+    .CopyFromStart = CopyFromBinaryStart,
+    .CopyFromOneRow = CopyFromBinaryOneRow,
+    .CopyFromEnd = CopyFromBinaryEnd,
+};
+
+/* Return a COPY FROM routine for the given options */
+static const CopyFromRoutine *
+CopyFromGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyFromRoutineCSV;
+    else if (opts.binary)
+        return &CopyFromRoutineBinary;
+
+    /* default is text */
+    return &CopyFromRoutineText;
+}
+
+/* Implementation of the start callback for text and CSV formats */
+static void
+CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    AttrNumber    attr_count;
+
+    /*
+     * If encoding conversion is needed, we need another buffer to hold the
+     * converted input data.  Otherwise, we can just point input_buf to the
+     * same buffer as raw_buf.
+     */
+    if (cstate->need_transcoding)
+    {
+        cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
+        cstate->input_buf_index = cstate->input_buf_len = 0;
+    }
+    else
+        cstate->input_buf = cstate->raw_buf;
+    cstate->input_reached_eof = false;
+
+    initStringInfo(&cstate->line_buf);
+
+    /*
+     * Create workspace for CopyReadAttributes results; used by CSV and text
+     * format.
+     */
+    attr_count = list_length(cstate->attnumlist);
+    cstate->max_fields = attr_count;
+    cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+}
+
+/*
+ * Implementation of the infunc callback for text and CSV formats. Assign
+ * the input function data to the given *finfo.
+ */
+static void
+CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo,
+                       Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the end callback for text and CSV formats */
+static void
+CopyFromTextLikeEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
+/* Implementation of the start callback for binary format */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    /* Read and verify binary header */
+    ReceiveCopyBinaryHeader(cstate);
+}
+
+/*
+ * Implementation of the infunc callback for binary format. Assign
+ * the binary input function to the given *finfo.
+ */
+static void
+CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+                     FmgrInfo *finfo, Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeBinaryInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the end callback for binary format */
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
 /*
  * error context callback for COPY FROM
  *
@@ -1396,7 +1535,6 @@ BeginCopyFrom(ParseState *pstate,
                 num_defaults;
     FmgrInfo   *in_functions;
     Oid           *typioparams;
-    Oid            in_func_oid;
     int           *defmap;
     ExprState **defexprs;
     MemoryContext oldcontext;
@@ -1428,6 +1566,9 @@ BeginCopyFrom(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
+    /* Set the format routine */
+    cstate->routine = CopyFromGetRoutine(cstate->opts);
+
     /* Process the target relation */
     cstate->rel = rel;
 
@@ -1583,25 +1724,6 @@ BeginCopyFrom(ParseState *pstate,
     cstate->raw_buf_index = cstate->raw_buf_len = 0;
     cstate->raw_reached_eof = false;
 
-    if (!cstate->opts.binary)
-    {
-        /*
-         * If encoding conversion is needed, we need another buffer to hold
-         * the converted input data.  Otherwise, we can just point input_buf
-         * to the same buffer as raw_buf.
-         */
-        if (cstate->need_transcoding)
-        {
-            cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
-            cstate->input_buf_index = cstate->input_buf_len = 0;
-        }
-        else
-            cstate->input_buf = cstate->raw_buf;
-        cstate->input_reached_eof = false;
-
-        initStringInfo(&cstate->line_buf);
-    }
-
     initStringInfo(&cstate->attribute_buf);
 
     /* Assign range table and rteperminfos, we'll need them in CopyFrom. */
@@ -1634,13 +1756,9 @@ BeginCopyFrom(ParseState *pstate,
             continue;
 
         /* Fetch the input function and typioparam info */
-        if (cstate->opts.binary)
-            getTypeBinaryInputInfo(att->atttypid,
-                                   &in_func_oid, &typioparams[attnum - 1]);
-        else
-            getTypeInputInfo(att->atttypid,
-                             &in_func_oid, &typioparams[attnum - 1]);
-        fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+        cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+                                        &in_functions[attnum - 1],
+                                        &typioparams[attnum - 1]);
 
         /* Get default info if available */
         defexprs[attnum - 1] = NULL;
@@ -1775,20 +1893,7 @@ BeginCopyFrom(ParseState *pstate,
 
     pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-    if (cstate->opts.binary)
-    {
-        /* Read and verify binary header */
-        ReceiveCopyBinaryHeader(cstate);
-    }
-
-    /* create workspace for CopyReadAttributes results */
-    if (!cstate->opts.binary)
-    {
-        AttrNumber    attr_count = list_length(cstate->attnumlist);
-
-        cstate->max_fields = attr_count;
-        cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-    }
+    cstate->routine->CopyFromStart(cstate, tupDesc);
 
     MemoryContextSwitchTo(oldcontext);
 
@@ -1801,6 +1906,9 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+    /* Invoke the end callback */
+    cstate->routine->CopyFromEnd(cstate);
+
     /* No COPY FROM related resources except memory. */
     if (cstate->is_program)
     {
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index caccdc8563c..65f20d332ee 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -62,7 +62,6 @@
 #include <unistd.h>
 #include <sys/stat.h>
 
-#include "commands/copy.h"
 #include "commands/copyfrom_internal.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
@@ -140,8 +139,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLine(CopyFromState cstate, bool is_csv);
+static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static int    CopyReadAttributesText(CopyFromState cstate);
 static int    CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -740,9 +739,11 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  * in the relation.
  *
  * NOTE: force_not_null option are not applied to the returned fields.
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static pg_attribute_always_inline bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
 {
     int            fldct;
     bool        done;
@@ -759,13 +760,17 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
         tupDesc = RelationGetDescr(cstate->rel);
 
         cstate->cur_lineno++;
-        done = CopyReadLine(cstate);
+        done = CopyReadLine(cstate, is_csv);
 
         if (cstate->opts.header_line == COPY_HEADER_MATCH)
         {
             int            fldnum;
 
-            if (cstate->opts.csv_mode)
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
                 fldct = CopyReadAttributesCSV(cstate);
             else
                 fldct = CopyReadAttributesText(cstate);
@@ -809,7 +814,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     cstate->cur_lineno++;
 
     /* Actually read the line into memory here */
-    done = CopyReadLine(cstate);
+    done = CopyReadLine(cstate, is_csv);
 
     /*
      * EOF at start of line means we're done.  If we see EOF after some
@@ -819,8 +824,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     if (done && cstate->line_buf.len == 0)
         return false;
 
-    /* Parse the line into de-escaped field values */
-    if (cstate->opts.csv_mode)
+    /*
+     * Parse the line into de-escaped field values
+     *
+     * is_csv will be optimized away by compiler, as argument is constant at
+     * caller.
+     */
+    if (is_csv)
         fldct = CopyReadAttributesCSV(cstate);
     else
         fldct = CopyReadAttributesText(cstate);
@@ -830,6 +840,244 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     return true;
 }
 
+/*
+ * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
+ */
+static pg_attribute_always_inline bool
+CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
+                       Datum *values, bool *nulls, bool is_csv)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    ExprState **defexprs = cstate->defexprs;
+    char      **field_strings;
+    ListCell   *cur;
+    int            fldct;
+    int            fieldno;
+    char       *string;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    /* read raw fields in the next line */
+    if (!NextCopyFromRawFields(cstate, &field_strings, &fldct, is_csv))
+        return false;
+
+    /* check for overflowing fields */
+    if (attr_count > 0 && fldct > attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("extra data after last expected column")));
+
+    fieldno = 0;
+
+    /* Loop to read the user attributes on the line. */
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        if (fieldno >= fldct)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("missing data for column \"%s\"",
+                            NameStr(att->attname))));
+        string = field_strings[fieldno++];
+
+        if (cstate->convert_select_flags &&
+            !cstate->convert_select_flags[m])
+        {
+            /* ignore input field, leaving column as NULL */
+            continue;
+        }
+
+        if (is_csv)
+        {
+            if (string == NULL &&
+                cstate->opts.force_notnull_flags[m])
+            {
+                /*
+                 * FORCE_NOT_NULL option is set and column is NULL - convert
+                 * it to the NULL string.
+                 */
+                string = cstate->opts.null_print;
+            }
+            else if (string != NULL && cstate->opts.force_null_flags[m]
+                     && strcmp(string, cstate->opts.null_print) == 0)
+            {
+                /*
+                 * FORCE_NULL option is set and column matches the NULL
+                 * string. It must have been quoted, or otherwise the string
+                 * would already have been set to NULL. Convert it to NULL as
+                 * specified.
+                 */
+                string = NULL;
+            }
+        }
+
+        cstate->cur_attname = NameStr(att->attname);
+        cstate->cur_attval = string;
+
+        if (string != NULL)
+            nulls[m] = false;
+
+        if (cstate->defaults[m])
+        {
+            /*
+             * The caller must supply econtext and have switched into the
+             * per-tuple memory context in it.
+             */
+            Assert(econtext != NULL);
+            Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
+
+            values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
+        }
+
+        /*
+         * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+         */
+        else if (!InputFunctionCallSafe(&in_functions[m],
+                                        string,
+                                        typioparams[m],
+                                        att->atttypmod,
+                                        (Node *) cstate->escontext,
+                                        &values[m]))
+        {
+            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+            cstate->num_errors++;
+
+            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+            {
+                /*
+                 * Since we emit line number and column info in the below
+                 * notice message, we suppress error context information other
+                 * than the relation name.
+                 */
+                Assert(!cstate->relname_only);
+                cstate->relname_only = true;
+
+                if (cstate->cur_attval)
+                {
+                    char       *attval;
+
+                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname,
+                                   attval));
+                    pfree(attval);
+                }
+                else
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname));
+
+                /* reset relname_only */
+                cstate->relname_only = false;
+            }
+
+            return true;
+        }
+
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+    }
+
+    Assert(fieldno == attr_count);
+
+    return true;
+}
+
+/* Implementation of the per-row callback for text format */
+bool
+CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+                   bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false);
+}
+
+/* Implementation of the per-row callback for CSV format */
+bool
+CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+                  bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
+}
+
+/* Implementation of the per-row callback for binary format */
+bool
+CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+                     bool *nulls)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    int16        fld_count;
+    ListCell   *cur;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    cstate->cur_lineno++;
+
+    if (!CopyGetInt16(cstate, &fld_count))
+    {
+        /* EOF detected (end of file, or protocol-level EOF) */
+        return false;
+    }
+
+    if (fld_count == -1)
+    {
+        /*
+         * Received EOF marker.  Wait for the protocol-level EOF, and complain
+         * if it doesn't come immediately.  In COPY FROM STDIN, this ensures
+         * that we correctly handle CopyFail, if client chooses to send that
+         * now.  When copying from file, we could ignore the rest of the file
+         * like in text mode, but we choose to be consistent with the COPY
+         * FROM STDIN case.
+         */
+        char        dummy;
+
+        if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("received copy data after EOF marker")));
+        return false;
+    }
+
+    if (fld_count != attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("row field count is %d, expected %d",
+                        (int) fld_count, attr_count)));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        cstate->cur_attname = NameStr(att->attname);
+        values[m] = CopyReadBinaryAttribute(cstate,
+                                            &in_functions[m],
+                                            typioparams[m],
+                                            att->atttypmod,
+                                            &nulls[m]);
+        cstate->cur_attname = NULL;
+    }
+
+    return true;
+}
+
 /*
  * Read next tuple from file for COPY FROM. Return false if no more tuples.
  *
@@ -847,216 +1095,22 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 {
     TupleDesc    tupDesc;
     AttrNumber    num_phys_attrs,
-                attr_count,
                 num_defaults = cstate->num_defaults;
-    FmgrInfo   *in_functions = cstate->in_functions;
-    Oid           *typioparams = cstate->typioparams;
     int            i;
     int           *defmap = cstate->defmap;
     ExprState **defexprs = cstate->defexprs;
 
     tupDesc = RelationGetDescr(cstate->rel);
     num_phys_attrs = tupDesc->natts;
-    attr_count = list_length(cstate->attnumlist);
 
     /* Initialize all values for row to NULL */
     MemSet(values, 0, num_phys_attrs * sizeof(Datum));
     MemSet(nulls, true, num_phys_attrs * sizeof(bool));
     MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-    if (!cstate->opts.binary)
-    {
-        char      **field_strings;
-        ListCell   *cur;
-        int            fldct;
-        int            fieldno;
-        char       *string;
-
-        /* read raw fields in the next line */
-        if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
-            return false;
-
-        /* check for overflowing fields */
-        if (attr_count > 0 && fldct > attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("extra data after last expected column")));
-
-        fieldno = 0;
-
-        /* Loop to read the user attributes on the line. */
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            if (fieldno >= fldct)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("missing data for column \"%s\"",
-                                NameStr(att->attname))));
-            string = field_strings[fieldno++];
-
-            if (cstate->convert_select_flags &&
-                !cstate->convert_select_flags[m])
-            {
-                /* ignore input field, leaving column as NULL */
-                continue;
-            }
-
-            if (cstate->opts.csv_mode)
-            {
-                if (string == NULL &&
-                    cstate->opts.force_notnull_flags[m])
-                {
-                    /*
-                     * FORCE_NOT_NULL option is set and column is NULL -
-                     * convert it to the NULL string.
-                     */
-                    string = cstate->opts.null_print;
-                }
-                else if (string != NULL && cstate->opts.force_null_flags[m]
-                         && strcmp(string, cstate->opts.null_print) == 0)
-                {
-                    /*
-                     * FORCE_NULL option is set and column matches the NULL
-                     * string. It must have been quoted, or otherwise the
-                     * string would already have been set to NULL. Convert it
-                     * to NULL as specified.
-                     */
-                    string = NULL;
-                }
-            }
-
-            cstate->cur_attname = NameStr(att->attname);
-            cstate->cur_attval = string;
-
-            if (string != NULL)
-                nulls[m] = false;
-
-            if (cstate->defaults[m])
-            {
-                /*
-                 * The caller must supply econtext and have switched into the
-                 * per-tuple memory context in it.
-                 */
-                Assert(econtext != NULL);
-                Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
-
-                values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
-            }
-
-            /*
-             * If ON_ERROR is specified with IGNORE, skip rows with soft
-             * errors
-             */
-            else if (!InputFunctionCallSafe(&in_functions[m],
-                                            string,
-                                            typioparams[m],
-                                            att->atttypmod,
-                                            (Node *) cstate->escontext,
-                                            &values[m]))
-            {
-                Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-                cstate->num_errors++;
-
-                if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-                {
-                    /*
-                     * Since we emit line number and column info in the below
-                     * notice message, we suppress error context information
-                     * other than the relation name.
-                     */
-                    Assert(!cstate->relname_only);
-                    cstate->relname_only = true;
-
-                    if (cstate->cur_attval)
-                    {
-                        char       *attval;
-
-                        attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname,
-                                       attval));
-                        pfree(attval);
-                    }
-                    else
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname));
-
-                    /* reset relname_only */
-                    cstate->relname_only = false;
-                }
-
-                return true;
-            }
-
-            cstate->cur_attname = NULL;
-            cstate->cur_attval = NULL;
-        }
-
-        Assert(fieldno == attr_count);
-    }
-    else
-    {
-        /* binary */
-        int16        fld_count;
-        ListCell   *cur;
-
-        cstate->cur_lineno++;
-
-        if (!CopyGetInt16(cstate, &fld_count))
-        {
-            /* EOF detected (end of file, or protocol-level EOF) */
-            return false;
-        }
-
-        if (fld_count == -1)
-        {
-            /*
-             * Received EOF marker.  Wait for the protocol-level EOF, and
-             * complain if it doesn't come immediately.  In COPY FROM STDIN,
-             * this ensures that we correctly handle CopyFail, if client
-             * chooses to send that now.  When copying from file, we could
-             * ignore the rest of the file like in text mode, but we choose to
-             * be consistent with the COPY FROM STDIN case.
-             */
-            char        dummy;
-
-            if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("received copy data after EOF marker")));
-            return false;
-        }
-
-        if (fld_count != attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("row field count is %d, expected %d",
-                            (int) fld_count, attr_count)));
-
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            cstate->cur_attname = NameStr(att->attname);
-            values[m] = CopyReadBinaryAttribute(cstate,
-                                                &in_functions[m],
-                                                typioparams[m],
-                                                att->atttypmod,
-                                                &nulls[m]);
-            cstate->cur_attname = NULL;
-        }
-    }
+    /* Get one row from source */
+    if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+        return false;
 
     /*
      * Now compute and insert any defaults available for the columns not
@@ -1087,7 +1141,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * in the final value of line_buf.
  */
 static bool
-CopyReadLine(CopyFromState cstate)
+CopyReadLine(CopyFromState cstate, bool is_csv)
 {
     bool        result;
 
@@ -1095,7 +1149,7 @@ CopyReadLine(CopyFromState cstate)
     cstate->line_buf_valid = false;
 
     /* Parse data and transfer into line_buf */
-    result = CopyReadLineText(cstate);
+    result = CopyReadLineText(cstate, is_csv);
 
     if (result)
     {
@@ -1163,7 +1217,7 @@ CopyReadLine(CopyFromState cstate)
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
 static bool
-CopyReadLineText(CopyFromState cstate)
+CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
     char       *copy_input_buf;
     int            input_buf_ptr;
@@ -1178,7 +1232,11 @@ CopyReadLineText(CopyFromState cstate)
     char        quotec = '\0';
     char        escapec = '\0';
 
-    if (cstate->opts.csv_mode)
+    /*
+     * is_csv will be optimized away by compiler, as argument is constant at
+     * caller.
+     */
+    if (is_csv)
     {
         quotec = cstate->opts.quote[0];
         escapec = cstate->opts.escape[0];
@@ -1255,7 +1313,11 @@ CopyReadLineText(CopyFromState cstate)
         prev_raw_ptr = input_buf_ptr;
         c = copy_input_buf[input_buf_ptr++];
 
-        if (cstate->opts.csv_mode)
+        /*
+         * is_csv will be optimized away by compiler, as argument is constant
+         * at caller.
+         */
+        if (is_csv)
         {
             /*
              * If character is '\r', we may need to look ahead below.  Force
@@ -1294,7 +1356,7 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \r */
-        if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\r' && (!is_csv || !in_quote))
         {
             /* Check for \r\n on first line, _and_ handle \r\n. */
             if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1322,10 +1384,10 @@ CopyReadLineText(CopyFromState cstate)
                     if (cstate->eol_type == EOL_CRNL)
                         ereport(ERROR,
                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errmsg("literal carriage return found in data") :
                                  errmsg("unquoted carriage return found in data"),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errhint("Use \"\\r\" to represent carriage return.") :
                                  errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1339,10 +1401,10 @@ CopyReadLineText(CopyFromState cstate)
             else if (cstate->eol_type == EOL_NL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal carriage return found in data") :
                          errmsg("unquoted carriage return found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\r\" to represent carriage return.") :
                          errhint("Use quoted CSV field to represent carriage return.")));
             /* If reach here, we have found the line terminator */
@@ -1350,15 +1412,15 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \n */
-        if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\n' && (!is_csv || !in_quote))
         {
             if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal newline found in data") :
                          errmsg("unquoted newline found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\n\" to represent newline.") :
                          errhint("Use quoted CSV field to represent newline.")));
             cstate->eol_type = EOL_NL;    /* in case not set yet */
@@ -1370,7 +1432,7 @@ CopyReadLineText(CopyFromState cstate)
          * Process backslash, except in CSV mode where backslash is a normal
          * character.
          */
-        if (c == '\\' && !cstate->opts.csv_mode)
+        if (c == '\\' && !is_csv)
         {
             char        c2;
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..7bc044e2816 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
 extern void EndCopyFrom(CopyFromState cstate);
 extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
                          Datum *values, bool *nulls);
-extern bool NextCopyFromRawFields(CopyFromState cstate,
-                                  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 extern char *CopyLimitPrintoutLength(const char *str);
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index be29e3fbdef..51e131e5e8a 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * copyapi.h
- *      API for COPY TO handlers
+ *      API for COPY TO/FROM handlers
  *
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
@@ -54,4 +54,50 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+/*
+ * API structure for a COPY FROM format implementation.  Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyFromRoutine
+{
+    /*
+     * Set input function information. This callback is called once at the
+     * beginning of COPY FROM.
+     *
+     * 'finfo' can be optionally filled to provide the catalog information of
+     * the input function.
+     *
+     * 'typioparam' can be optionally filled to define the OID of the type to
+     * pass to the input function.  'atttypid' is the OID of data type used by
+     * the relation's attribute.
+     */
+    void        (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
+                                   FmgrInfo *finfo, Oid *typioparam);
+
+    /*
+     * Start a COPY FROM. This callback is called once at the beginning of
+     * COPY FROM.
+     *
+     * 'tupDesc' is the tuple descriptor of the relation where the data needs
+     * to be copied.  This can be used for any initialization steps required
+     * by a format.
+     */
+    void        (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
+
+    /*
+     * Read one row from the source and fill *values and *nulls.
+     *
+     * 'econtext' is used to evaluate default expression for each column that
+     * is either not read from the file or is using the DEFAULT option of COPY
+     * FROM.  It is NULL if no default values are used.
+     *
+     * Returns false if there are no more tuples to read.
+     */
+    bool        (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
+                                   Datum *values, bool *nulls);
+
+    /* End a COPY FROM. This callback is called once at the end of COPY FROM */
+    void        (*CopyFromEnd) (CopyFromState cstate);
+} CopyFromRoutine;
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 1d8ac8f62e6..e1affe3dfa7 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -14,7 +14,7 @@
 #ifndef COPYFROM_INTERNAL_H
 #define COPYFROM_INTERNAL_H
 
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
@@ -58,6 +58,9 @@ typedef enum CopyInsertMethod
  */
 typedef struct CopyFromStateData
 {
+    /* format routine */
+    const CopyFromRoutine *routine;
+
     /* low-level state data */
     CopySource    copy_src;        /* type of copy source */
     FILE       *copy_file;        /* used if copy_src == COPY_FILE */
@@ -183,4 +186,12 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
+/* One-row callbacks for built-in formats defined in copyfromparse.c */
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+                               Datum *values, bool *nulls);
+extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
+                              Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+                                 Datum *values, bool *nulls);
+
 #endif                            /* COPYFROM_INTERNAL_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1cbb3628857..afdafefeb9b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -497,6 +497,7 @@ ConvertRowtypeExpr
 CookedConstraint
 CopyDest
 CopyFormatOptions
+CopyFromRoutine
 CopyFromState
 CopyFromStateData
 CopyHeaderChoice
-- 
2.47.1
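
For readers following the thread, here is a minimal, self-contained sketch
of the two ideas the patch above combines: dispatching through a static
const callback table (as CopyFromRoutine does), and sharing one
always-inline workhorse between the text and CSV callbacks by passing a
constant is_csv argument. It does not use any PostgreSQL headers. Every
Demo*/demo_* name is a hypothetical stand-in, the callback signatures are
reduced, and the "parsing" only extracts the first field of each line, so
treat it as an illustration of the pattern rather than the patch's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_attribute_always_inline. */
#if defined(__GNUC__) || defined(__clang__)
#define demo_always_inline __attribute__((always_inline)) inline
#else
#define demo_always_inline inline
#endif

typedef struct DemoCopyFromState DemoCopyFromState;

/* Reduced callback table modelled on CopyFromRoutine (assumption: only
 * two of the four callbacks, with simplified arguments). */
typedef struct DemoCopyFromRoutine
{
    void (*CopyFromStart) (DemoCopyFromState *cstate);
    bool (*CopyFromOneRow) (DemoCopyFromState *cstate,
                            char *out, size_t outlen);
} DemoCopyFromRoutine;

struct DemoCopyFromState
{
    const DemoCopyFromRoutine *routine; /* format routine */
    const char *input;                  /* remaining unread input */
    int         cur_lineno;             /* rows read so far */
};

/*
 * Shared workhorse.  Each wrapper below passes a constant is_csv, so once
 * this is inlined the compiler can fold the branch away per format.
 */
static demo_always_inline bool
demo_text_like_one_row(DemoCopyFromState *cstate, char *out, size_t outlen,
                       bool is_csv)
{
    char        delim = is_csv ? ',' : '\t';
    const char *p = cstate->input;
    size_t      n = 0;

    if (*p == '\0')
        return false;           /* no more rows */

    cstate->cur_lineno++;

    /* Copy the first field of the line into out. */
    while (*p != '\0' && *p != '\n' && *p != delim && n + 1 < outlen)
        out[n++] = *p++;
    out[n] = '\0';

    /* Skip the rest of the line, including its newline. */
    while (*p != '\0' && *p != '\n')
        p++;
    if (*p == '\n')
        p++;
    cstate->input = p;
    return true;
}

static bool
demo_text_one_row(DemoCopyFromState *cstate, char *out, size_t outlen)
{
    return demo_text_like_one_row(cstate, out, outlen, false);
}

static bool
demo_csv_one_row(DemoCopyFromState *cstate, char *out, size_t outlen)
{
    return demo_text_like_one_row(cstate, out, outlen, true);
}

static void
demo_start(DemoCopyFromState *cstate)
{
    cstate->cur_lineno = 0;
}

/* Server-lifetime routine tables, one per format. */
static const DemoCopyFromRoutine demo_text_routine = {demo_start, demo_text_one_row};
static const DemoCopyFromRoutine demo_csv_routine = {demo_start, demo_csv_one_row};

/*
 * Format-agnostic driver: it only calls through cstate.routine.  Returns
 * the number of rows read; 'last' holds the first field of the last row.
 */
static int
demo_copy_from(const DemoCopyFromRoutine *routine, const char *input,
               char *last, size_t lastlen)
{
    DemoCopyFromState cstate = {routine, input, 0};

    cstate.routine->CopyFromStart(&cstate);
    while (cstate.routine->CopyFromOneRow(&cstate, last, lastlen))
        ;
    return cstate.cur_lineno;
}
```

The driver never tests the format itself; choosing demo_text_routine or
demo_csv_routine up front plays the role that assigning cstate->routine
plays in the patch.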
From cb4937aed8565e620715e03ae3b469341ab5ae65 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 12:19:15 +0900
Subject: [PATCH v29 3/9] Add support for adding custom COPY TO format
This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a COPY TO handler returns a CopyToRoutine.
This also adds a test module for a custom COPY TO handler.
---
 src/backend/commands/copy.c                   | 97 ++++++++++++++++---
 src/backend/commands/copyto.c                 | 20 ++--
 src/backend/nodes/Makefile                    |  1 +
 src/backend/nodes/gen_node_support.pl         |  2 +
 src/backend/utils/adt/pseudotypes.c           |  1 +
 src/include/catalog/pg_proc.dat               |  6 ++
 src/include/catalog/pg_type.dat               |  6 ++
 src/include/commands/copy.h                   |  1 +
 src/include/commands/copyapi.h                |  4 +-
 src/include/commands/copyto_internal.h        | 21 ++++
 src/include/nodes/meson.build                 |  1 +
 src/test/modules/Makefile                     |  1 +
 src/test/modules/meson.build                  |  1 +
 src/test/modules/test_copy_format/.gitignore  |  4 +
 src/test/modules/test_copy_format/Makefile    | 23 +++++
 .../expected/test_copy_format.out             | 17 ++++
 src/test/modules/test_copy_format/meson.build | 33 +++++++
 .../test_copy_format/sql/test_copy_format.sql |  5 +
 .../test_copy_format--1.0.sql                 |  8 ++
 .../test_copy_format/test_copy_format.c       | 63 ++++++++++++
 .../test_copy_format/test_copy_format.control |  4 +
 21 files changed, 295 insertions(+), 24 deletions(-)
 mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/include/commands/copyto_internal.h
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..9500156b163 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -22,7 +22,7 @@
 #include "access/table.h"
 #include "access/xact.h"
 #include "catalog/pg_authid.h"
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/defrem.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
@@ -32,6 +32,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -476,6 +477,79 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
     return COPY_LOG_VERBOSITY_DEFAULT;    /* keep compiler quiet */
 }
 
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false). If no COPY format handler is found, this function
+ * reports an error.
+ */
+static void
+ProcessCopyOptionFormat(ParseState *pstate,
+                        CopyFormatOptions *opts_out,
+                        bool is_from,
+                        DefElem *defel)
+{
+    char       *format;
+    bool        isBuiltin;
+    Oid            funcargtypes[1];
+    Oid            handlerOid = InvalidOid;
+    Datum        datum;
+    Node       *routine;
+
+    format = defGetString(defel);
+
+    isBuiltin = true;
+    opts_out->csv_mode = false;
+    opts_out->binary = false;
+    /* built-in formats */
+    if (strcmp(format, "text") == 0)
+         /* "csv_mode == false && binary == false" means "text" */ ;
+    else if (strcmp(format, "csv") == 0)
+        opts_out->csv_mode = true;
+    else if (strcmp(format, "binary") == 0)
+        opts_out->binary = true;
+    else
+        isBuiltin = false;
+    if (isBuiltin)
+    {
+        if (!is_from)
+            opts_out->routine = (Node *) CopyToGetBuiltinRoutine(opts_out);
+        return;
+    }
+
+    /* custom format */
+    if (!is_from)
+    {
+        funcargtypes[0] = INTERNALOID;
+        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                    funcargtypes, true);
+    }
+    if (!OidIsValid(handlerOid))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY format \"%s\" not recognized", format),
+                 parser_errposition(pstate, defel->location)));
+
+    datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
+    routine = (Node *) DatumGetPointer(datum);
+    if (routine == NULL || !IsA(routine, CopyToRoutine))
+        ereport(
+                ERROR,
+                (errcode(
+                         ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY handler function "
+                        "%s(%u) did not return a "
+                        "CopyToRoutine struct",
+                        format, handlerOid),
+                 parser_errposition(
+                                    pstate, defel->location)));
+
+    opts_out->routine = routine;
+}
+
 /*
  * Process the statement option list for COPY.
  *
@@ -519,22 +593,10 @@ ProcessCopyOptions(ParseState *pstate,
 
         if (strcmp(defel->defname, "format") == 0)
         {
-            char       *fmt = defGetString(defel);
-
             if (format_specified)
                 errorConflictingDefElem(defel, pstate);
             format_specified = true;
-            if (strcmp(fmt, "text") == 0)
-                 /* default format */ ;
-            else if (strcmp(fmt, "csv") == 0)
-                opts_out->csv_mode = true;
-            else if (strcmp(fmt, "binary") == 0)
-                opts_out->binary = true;
-            else
-                ereport(ERROR,
-                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                         errmsg("COPY format \"%s\" not recognized", fmt),
-                         parser_errposition(pstate, defel->location)));
+            ProcessCopyOptionFormat(pstate, opts_out, is_from, defel);
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
@@ -685,6 +747,13 @@ ProcessCopyOptions(ParseState *pstate,
                      parser_errposition(pstate, defel->location)));
     }
 
+    /* If format option isn't specified, we use a built-in routine. */
+    if (!format_specified)
+    {
+        if (!is_from)
+            opts_out->routine = (Node *) CopyToGetBuiltinRoutine(opts_out);
+    }
+
     /*
      * Check for incompatible options (must do these three before inserting
      * defaults)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 26c67ddc351..f7f44b368b7 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -150,6 +150,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
 
 /* text format */
 static const CopyToRoutine CopyToRoutineText = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToTextOneRow,
@@ -158,6 +159,7 @@ static const CopyToRoutine CopyToRoutineText = {
 
 /* CSV format */
 static const CopyToRoutine CopyToRoutineCSV = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToCSVOneRow,
@@ -166,23 +168,23 @@ static const CopyToRoutine CopyToRoutineCSV = {
 
 /* binary format */
 static const CopyToRoutine CopyToRoutineBinary = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToBinaryStart,
     .CopyToOutFunc = CopyToBinaryOutFunc,
     .CopyToOneRow = CopyToBinaryOneRow,
     .CopyToEnd = CopyToBinaryEnd,
 };
 
-/* Return a COPY TO routine for the given options */
-static const CopyToRoutine *
-CopyToGetRoutine(CopyFormatOptions opts)
+/* Return a built-in COPY TO routine for the given options */
+const CopyToRoutine *
+CopyToGetBuiltinRoutine(CopyFormatOptions *opts)
 {
-    if (opts.csv_mode)
+    if (opts->csv_mode)
         return &CopyToRoutineCSV;
-    else if (opts.binary)
+    else if (opts->binary)
         return &CopyToRoutineBinary;
-
-    /* default is text */
-    return &CopyToRoutineText;
+    else
+        return &CopyToRoutineText;
 }
 
 /* Implementation of the start callback for text and CSV formats */
@@ -703,7 +705,7 @@ BeginCopyTo(ParseState *pstate,
     ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
 
     /* Set format routine */
-    cstate->routine = CopyToGetRoutine(cstate->opts);
+    cstate->routine = (const CopyToRoutine *) cstate->opts.routine;
 
     /* Process the source/target relation or query */
     if (rel)
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 66bbad8e6e0..173ee11811c 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -49,6 +49,7 @@ node_headers = \
     access/sdir.h \
     access/tableam.h \
     access/tsmapi.h \
+    commands/copyapi.h \
     commands/event_trigger.h \
     commands/trigger.h \
     executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
old mode 100644
new mode 100755
index 7c012c27f88..5d53d32c4a7
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -61,6 +61,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -85,6 +86,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index 317a1f2b282..f2ebc21ca56 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5b8c2ad2a54..b231e7a041e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7803,6 +7803,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 6dca77e0a22..340e0cd0a8d 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -633,6 +633,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a copy to method function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   descr => 'pseudo-type for the result of a table AM handler function',
   typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 7bc044e2816..2a90b39b6f6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
+    Node       *routine;        /* CopyToRoutine */
 } CopyFormatOptions;
 
 /* These are private in commands/copy[from|to].c */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 51e131e5e8a..12e4b1d47a7 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -14,7 +14,7 @@
 #ifndef COPYAPI_H
 #define COPYAPI_H
 
-#include "commands/copy.h"
+#include "commands/copyto_internal.h"
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
 
@@ -24,6 +24,8 @@
  */
 typedef struct CopyToRoutine
 {
+    NodeTag        type;
+
     /*
      * Set output function information. This callback is called once at the
      * beginning of COPY TO.
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
new file mode 100644
index 00000000000..f95d8da8e3e
--- /dev/null
+++ b/src/include/commands/copyto_internal.h
@@ -0,0 +1,21 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyto_internal.h
+ *      Internal definitions for COPY TO command.
+ *
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyto_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYTO_INTERNAL_H
+#define COPYTO_INTERNAL_H
+
+#include "commands/copy.h"
+
+const struct CopyToRoutine *CopyToGetBuiltinRoutine(CopyFormatOptions *opts);
+
+#endif                            /* COPYTO_INTERNAL_H */
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index f3dd5461fef..09f7443195f 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -11,6 +11,7 @@ node_support_input_i = [
   'access/sdir.h',
   'access/tableam.h',
   'access/tsmapi.h',
+  'commands/copyapi.h',
   'commands/event_trigger.h',
   'commands/trigger.h',
   'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index c0d3cf0e14b..33e3a49a4fb 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
           spgist_name_ops \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 4f544a042d4..bf25658793d 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -14,6 +14,7 @@ subdir('spgist_name_ops')
 subdir('ssl_passphrase_callback')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 00000000000..8497f91624d
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 00000000000..adfe7d1572a
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,17 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+ERROR:  COPY format "test_copy_format" not recognized
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')...
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: atttypid=21
+NOTICE:  CopyToOutFunc: atttypid=23
+NOTICE:  CopyToOutFunc: atttypid=20
+NOTICE:  CopyToStart: natts=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 00000000000..4cefe7b709a
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+  'test_copy_format.control',
+  'test_copy_format--1.0.sql',
+)
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'test_copy_format',
+    ],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 00000000000..810b3d8cedc
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,5 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 00000000000..d24ea03ce99
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,8 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+CREATE FUNCTION test_copy_format(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..e064f40473b
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,63 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+
+PG_MODULE_MAGIC;
+
+static void
+CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts)));
+}
+
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%d", slot->tts_nvalid)));
+}
+
+static void
+CopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .type = T_CopyToRoutine,
+    .CopyToOutFunc = CopyToOutFunc,
+    .CopyToStart = CopyToStart,
+    .CopyToOneRow = CopyToOneRow,
+    .CopyToEnd = CopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    ereport(NOTICE,
+            (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 00000000000..f05a6362358
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.47.1
From d17d5dae6865de82997a8511fdc097b1c64ccd73 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 13:58:33 +0900
Subject: [PATCH v29 4/9] Export CopyToStateData as private data
This is for custom COPY TO format handlers implemented as extensions.
This only moves code. Nothing changes except the CopyDest enum values:
the CopyDest (COPY TO) and CopySource (COPY FROM) enums both define
values such as COPY_FILE, so they conflict with each other. CopyDest
values therefore use a COPY_DEST_ prefix instead of the COPY_ prefix.
For example, COPY_FILE in CopyDest is renamed to COPY_DEST_FILE.
Note that this isn't enough to implement custom COPY TO format
handlers as extensions. We'll do the following in a subsequent commit:
1. Add an opaque space for custom COPY TO format handlers
2. Export CopySendEndOfRow() to flush the buffer
---
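Note (not part of the commit): the enum rename described above can be
illustrated with a minimal standalone sketch. The two enums below are
simplified stand-ins written for this note, and copy_dest_name() is a
hypothetical helper that does not exist in the patch. The point is only
that C enumerators share one namespace, so two enums cannot both declare
COPY_FILE once they are visible in the same translation unit.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the COPY FROM side, which keeps the COPY_ prefix. */
typedef enum CopySource
{
    COPY_FILE,                  /* from file (or a piped program) */
    COPY_FRONTEND,              /* from frontend */
    COPY_CALLBACK,              /* from callback function */
} CopySource;

/*
 * Stand-in for the COPY TO side. Declaring COPY_FILE here again would be
 * a redeclaration error, so each value carries the COPY_DEST_ prefix.
 */
typedef enum CopyDest
{
    COPY_DEST_FILE,             /* to file (or a piped program) */
    COPY_DEST_FRONTEND,         /* to frontend */
    COPY_DEST_CALLBACK,         /* to callback function */
} CopyDest;

/* Hypothetical helper for this note: map a destination to a label. */
static const char *
copy_dest_name(CopyDest dest)
{
    switch (dest)
    {
        case COPY_DEST_FILE:
            return "file";
        case COPY_DEST_FRONTEND:
            return "frontend";
        case COPY_DEST_CALLBACK:
            return "callback";
    }
    return "unknown";
}
```

With distinct prefixes, a single header (or an extension that includes
both internal headers) can refer to either enum without ambiguity.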
 src/backend/commands/copyto.c          | 77 +++-----------------------
 src/include/commands/copy.h            |  2 +-
 src/include/commands/copyapi.h         |  2 -
 src/include/commands/copyto_internal.h | 64 +++++++++++++++++++++
 4 files changed, 73 insertions(+), 72 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f7f44b368b7..91fa46ddf6f 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -36,67 +36,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                    /* to file (or a piped program) */
-    COPY_FRONTEND,                /* to frontend */
-    COPY_CALLBACK,                /* to callback function */
-} CopyDest;
-
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-    /* format-specific routines */
-    const CopyToRoutine *routine;
-
-    /* low-level state data */
-    CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
-
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
-    MemoryContext rowcontext;    /* per-row evaluation context */
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -406,7 +345,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -453,7 +392,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -487,11 +426,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -512,7 +451,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -520,7 +459,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -904,12 +843,12 @@ BeginCopyTo(ParseState *pstate,
     /* See Multibyte encoding comment above */
     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 2a90b39b6f6..ef3dc02c56a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -90,7 +90,7 @@ typedef struct CopyFormatOptions
     Node       *routine;        /* CopyToRoutine */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
+/* These are private in commands/copy[from|to]_internal.h */
 typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 12e4b1d47a7..5d071b378d6 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -15,8 +15,6 @@
 #define COPYAPI_H
 
 #include "commands/copyto_internal.h"
-#include "executor/tuptable.h"
-#include "nodes/execnodes.h"
 
 /*
  * API structure for a COPY TO format implementation. Note this must be
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index f95d8da8e3e..2df53dda8a0 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -15,6 +15,70 @@
 #define COPYTO_INTERNAL_H
 
 #include "commands/copy.h"
+#include "executor/execdesc.h"
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+    COPY_DEST_FILE,                /* to file (or a piped program) */
+    COPY_DEST_FRONTEND,            /* to frontend */
+    COPY_DEST_CALLBACK,            /* to callback function */
+} CopyDest;
+
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+    /* format-specific routines */
+    const struct CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;        /* type of copy source/destination */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
+    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
+
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+    MemoryContext rowcontext;    /* per-row evaluation context */
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
 
 const struct CopyToRoutine *CopyToGetBuiltinRoutine(CopyFormatOptions *opts);
 
-- 
2.47.1
From 6cc082a9398998aa37dbc57568ffa784c9cd7625 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:01:18 +0900
Subject: [PATCH v29 5/9] Add support for implementing custom COPY TO format as
 extension
* Add CopyToStateData::opaque that can be used to keep data for a
  custom COPY TO format implementation
* Export CopySendEndOfRow() as CopyToStateFlush() to flush the data in
  CopyToStateData::fe_msgbuf
---
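Note (not part of the commit): a rough standalone sketch of how a format
implementation might combine the opaque pointer with a flush call.
Everything here is a mock written for this note: MockCopyToState,
mock_flush() and the my_* callbacks are assumptions, not PostgreSQL's
structs or functions. mock_flush() only imitates the role that
CopyToStateFlush() plays for fe_msgbuf, and the sketch does no bounds
checking because rows are assumed to be short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Mock of the few CopyToStateData parts this sketch needs. */
typedef struct MockCopyToState
{
    char        buf[256];       /* plays the role of fe_msgbuf */
    size_t      len;
    char        out[1024];      /* plays the role of the destination */
    size_t      out_len;
    void       *opaque;         /* private space for the format */
} MockCopyToState;

/* Imitates a flush: hand buffered bytes to the destination, reset buffer. */
static void
mock_flush(MockCopyToState *cstate)
{
    memcpy(cstate->out + cstate->out_len, cstate->buf, cstate->len);
    cstate->out_len += cstate->len;
    cstate->len = 0;
}

/* Format-private state kept behind the opaque pointer. */
typedef struct MyFormatState
{
    int         rows;
} MyFormatState;

/* Start callback: allocate private state and hang it on opaque. */
static void
my_start(MockCopyToState *cstate)
{
    MyFormatState *st = calloc(1, sizeof(*st));

    if (st == NULL)
        abort();
    cstate->opaque = st;
}

/* Per-row callback: buffer one line, count it, then flush. */
static void
my_one_row(MockCopyToState *cstate, const char *row)
{
    MyFormatState *st = cstate->opaque;
    size_t      n = strlen(row);

    memcpy(cstate->buf + cstate->len, row, n);
    cstate->len += n;
    cstate->buf[cstate->len++] = '\n';
    st->rows++;
    mock_flush(cstate);
}

/* End callback: report the row count and release private state. */
static int
my_end(MockCopyToState *cstate)
{
    MyFormatState *st = cstate->opaque;
    int         rows = st->rows;

    free(st);
    cstate->opaque = NULL;
    return rows;
}
```

The core never looks inside opaque, so the format is free to keep
whatever it needs there (counters, an encoder handle, and so on).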
 src/backend/commands/copyto.c          | 12 ++++++++++++
 src/include/commands/copyapi.h         |  2 ++
 src/include/commands/copyto_internal.h |  3 +++
 3 files changed, 17 insertions(+)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 91fa46ddf6f..da281f32950 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -442,6 +442,18 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * Export CopySendEndOfRow() for extensions. We want to keep
+ * CopySendEndOfRow() as a static function for
+ * optimization. CopySendEndOfRow() calls in this file may be optimized by a
+ * compiler.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
  * the line termination and do common appropriate things for the end of row.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 5d071b378d6..f8167af4c79 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -54,6 +54,8 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+extern void CopyToStateFlush(CopyToState cstate);
+
 /*
  * API structure for a COPY FROM format implementation.     Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index 2df53dda8a0..4b82372691e 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -78,6 +78,9 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;    /* lookup info for output functions */
     MemoryContext rowcontext;    /* per-row evaluation context */
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyToStateData;
 
 const struct CopyToRoutine *CopyToGetBuiltinRoutine(CopyFormatOptions *opts);
-- 
2.47.1
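A minimal standalone sketch of how a custom COPY TO format might combine the new opaque pointer with a flush call at the end of each row. It does not use the PostgreSQL headers: MockCopyToState, mock_copy_to_state_flush(), MyFormatState and the my_format_*() functions are all hypothetical stand-ins for illustration, not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CopyToStateData; NOT the PostgreSQL struct. */
typedef struct MockCopyToState
{
    char        fe_msgbuf[256];     /* pending bytes of the current row */
    size_t      fe_msgbuf_len;
    char        sink[1024];         /* where flushed bytes end up */
    size_t      sink_len;
    void       *opaque;             /* private space for the format */
} MockCopyToState;

/* Stand-in for CopyToStateFlush(): move pending bytes to the sink, reset. */
static void
mock_copy_to_state_flush(MockCopyToState *cstate)
{
    assert(cstate->sink_len + cstate->fe_msgbuf_len <= sizeof(cstate->sink));
    memcpy(cstate->sink + cstate->sink_len,
           cstate->fe_msgbuf, cstate->fe_msgbuf_len);
    cstate->sink_len += cstate->fe_msgbuf_len;
    cstate->fe_msgbuf_len = 0;
}

/* Per-COPY private state of the hypothetical format. */
typedef struct MyFormatState
{
    int         rows_written;
} MyFormatState;

/* Start callback: allocate private state and hang it off opaque. */
static void
my_format_start(MockCopyToState *cstate)
{
    MyFormatState *st = calloc(1, sizeof(*st));

    assert(st != NULL);
    cstate->opaque = st;
}

/* One-row callback: format "<row index>:<value>\n", then flush the row. */
static void
my_format_one_row(MockCopyToState *cstate, int value)
{
    MyFormatState *st = cstate->opaque;
    int         n;

    n = snprintf(cstate->fe_msgbuf, sizeof(cstate->fe_msgbuf),
                 "%d:%d\n", st->rows_written, value);
    assert(n > 0 && (size_t) n < sizeof(cstate->fe_msgbuf));
    cstate->fe_msgbuf_len = (size_t) n;
    st->rows_written++;
    mock_copy_to_state_flush(cstate);
}

/* End callback: release private state and report the row count. */
static int
my_format_end(MockCopyToState *cstate)
{
    MyFormatState *st = cstate->opaque;
    int         rows = st->rows_written;

    free(st);
    cstate->opaque = NULL;
    return rows;
}
```

In a real extension the private state would presumably be allocated with palloc() in a suitable memory context rather than calloc(); that detail is outside what this patch shows.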
From 8bab77b24f3795ea7c1f5ee16860348998355ccb Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:11:55 +0900
Subject: [PATCH v29 6/9] Add support for adding custom COPY FROM format
This uses the same handler for COPY TO and COPY FROM but uses
different routines: CopyToRoutine for COPY TO and CopyFromRoutine for
COPY FROM. PostgreSQL calls a COPY TO/FROM handler with an "is_from"
argument. It's true for COPY FROM and false for COPY TO:
    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine
This also adds a test module for a custom COPY FROM handler.
---
 src/backend/commands/copy.c                   | 60 ++++++++++++-------
 src/backend/commands/copyfrom.c               | 23 +++----
 src/backend/commands/copyfromparse.c          |  2 +-
 src/include/catalog/pg_type.dat               |  2 +-
 src/include/commands/copy.h                   |  2 +-
 src/include/commands/copyapi.h                |  3 +
 src/include/commands/copyfrom_internal.h      |  6 +-
 .../expected/test_copy_format.out             | 10 +++-
 .../test_copy_format/sql/test_copy_format.sql |  1 +
 .../test_copy_format/test_copy_format.c       | 39 +++++++++++-
 10 files changed, 107 insertions(+), 41 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 9500156b163..10f80ef3654 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -483,8 +483,8 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
  * This function checks whether the option value is a built-in format such as
  * "text" and "csv" or not. If the option value isn't a built-in format, this
  * function finds a COPY format handler that returns a CopyToRoutine (for
- * is_from == false). If no COPY format handler is found, this function
- * reports an error.
+ * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY
+ * format handler is found, this function reports an error.
  */
 static void
 ProcessCopyOptionFormat(ParseState *pstate,
@@ -515,18 +515,17 @@ ProcessCopyOptionFormat(ParseState *pstate,
         isBuiltin = false;
     if (isBuiltin)
     {
-        if (!is_from)
+        if (is_from)
+            opts_out->routine = (Node *) CopyFromGetBuiltinRoutine(opts_out);
+        else
             opts_out->routine = (Node *) CopyToGetBuiltinRoutine(opts_out);
         return;
     }
 
     /* custom format */
-    if (!is_from)
-    {
-        funcargtypes[0] = INTERNALOID;
-        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
-                                    funcargtypes, true);
-    }
+    funcargtypes[0] = INTERNALOID;
+    handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                funcargtypes, true);
     if (!OidIsValid(handlerOid))
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -535,17 +534,34 @@ ProcessCopyOptionFormat(ParseState *pstate,
 
     datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from));
     routine = (Node *) DatumGetPointer(datum);
-    if (routine == NULL || !IsA(routine, CopyToRoutine))
-        ereport(
-                ERROR,
-                (errcode(
-                         ERRCODE_INVALID_PARAMETER_VALUE),
-                 errmsg("COPY handler function "
-                        "%s(%u) did not return a "
-                        "CopyToRoutine struct",
-                        format, handlerOid),
-                 parser_errposition(
-                                    pstate, defel->location)));
+    if (is_from)
+    {
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%s(%u) did not return a "
+                            "CopyFromRoutine struct",
+                            format, handlerOid),
+                     parser_errposition(
+                                        pstate, defel->location)));
+    }
+    else
+    {
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%s(%u) did not return a "
+                            "CopyToRoutine struct",
+                            format, handlerOid),
+                     parser_errposition(
+                                        pstate, defel->location)));
+    }
 
     opts_out->routine = routine;
 }
@@ -750,7 +766,9 @@ ProcessCopyOptions(ParseState *pstate,
     /* If format option isn't specified, we use a built-in routine. */
     if (!format_specified)
     {
-        if (!is_from)
+        if (is_from)
+            opts_out->routine = (Node *) CopyFromGetBuiltinRoutine(opts_out);
+        else
             opts_out->routine = (Node *) CopyToGetBuiltinRoutine(opts_out);
     }
 
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 917fa6605ef..23027a664ec 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -28,8 +28,7 @@
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "catalog/namespace.h"
-#include "commands/copy.h"
-#include "commands/copyfrom_internal.h"
+#include "commands/copyapi.h"
 #include "commands/progress.h"
 #include "commands/trigger.h"
 #include "executor/execPartition.h"
@@ -129,6 +128,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate);
 
 /* text format */
 static const CopyFromRoutine CopyFromRoutineText = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromTextOneRow,
@@ -137,6 +137,7 @@ static const CopyFromRoutine CopyFromRoutineText = {
 
 /* CSV format */
 static const CopyFromRoutine CopyFromRoutineCSV = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromCSVOneRow,
@@ -145,23 +146,23 @@ static const CopyFromRoutine CopyFromRoutineCSV = {
 
 /* binary format */
 static const CopyFromRoutine CopyFromRoutineBinary = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromBinaryInFunc,
     .CopyFromStart = CopyFromBinaryStart,
     .CopyFromOneRow = CopyFromBinaryOneRow,
     .CopyFromEnd = CopyFromBinaryEnd,
 };
 
-/* Return a COPY FROM routine for the given options */
-static const CopyFromRoutine *
-CopyFromGetRoutine(CopyFormatOptions opts)
+/* Return a built-in COPY FROM routine for the given options */
+const CopyFromRoutine *
+CopyFromGetBuiltinRoutine(CopyFormatOptions *opts)
 {
-    if (opts.csv_mode)
+    if (opts->csv_mode)
         return &CopyFromRoutineCSV;
-    else if (opts.binary)
+    else if (opts->binary)
         return &CopyFromRoutineBinary;
-
-    /* default is text */
-    return &CopyFromRoutineText;
+    else
+        return &CopyFromRoutineText;
 }
 
 /* Implementation of the start callback for text and CSV formats */
@@ -1567,7 +1568,7 @@ BeginCopyFrom(ParseState *pstate,
     ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
     /* Set the format routine */
-    cstate->routine = CopyFromGetRoutine(cstate->opts);
+    cstate->routine = (const CopyFromRoutine *) cstate->opts.routine;
 
     /* Process the target relation */
     cstate->rel = rel;
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 65f20d332ee..4e6683eb9da 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -62,7 +62,7 @@
 #include <unistd.h>
 #include <sys/stat.h>
 
-#include "commands/copyfrom_internal.h"
+#include "commands/copyapi.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "libpq/libpq.h"
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 340e0cd0a8d..63b7d65f982 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -634,7 +634,7 @@
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
 { oid => '8752',
-  descr => 'pseudo-type for the result of a copy to method function',
+  descr => 'pseudo-type for the result of a copy to/from method function',
   typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
   typcategory => 'P', typinput => 'copy_handler_in',
   typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index ef3dc02c56a..586d6c0fe2e 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,7 +87,7 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
-    Node       *routine;        /* CopyToRoutine */
+    Node       *routine;        /* CopyToRoutine or CopyFromRoutine */
 } CopyFormatOptions;
 
 /* These are private in commands/copy[from|to]_internal.h */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index f8167af4c79..bf933069fea 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -15,6 +15,7 @@
 #define COPYAPI_H
 
 #include "commands/copyto_internal.h"
+#include "commands/copyfrom_internal.h"
 
 /*
  * API structure for a COPY TO format implementation. Note this must be
@@ -62,6 +63,8 @@ extern void CopyToStateFlush(CopyToState cstate);
  */
 typedef struct CopyFromRoutine
 {
+    NodeTag        type;
+
     /*
      * Set input function information. This callback is called once at the
      * beginning of COPY FROM.
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index e1affe3dfa7..9b3b8336b67 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -14,7 +14,7 @@
 #ifndef COPYFROM_INTERNAL_H
 #define COPYFROM_INTERNAL_H
 
-#include "commands/copyapi.h"
+#include "commands/copy.h"
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
@@ -59,7 +59,7 @@ typedef enum CopyInsertMethod
 typedef struct CopyFromStateData
 {
     /* format routine */
-    const CopyFromRoutine *routine;
+    const struct CopyFromRoutine *routine;
 
     /* low-level state data */
     CopySource    copy_src;        /* type of copy source */
@@ -194,4 +194,6 @@ extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
 extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
                                  Datum *values, bool *nulls);
 
+const struct CopyFromRoutine *CopyFromGetBuiltinRoutine(CopyFormatOptions *opts);
+
 #endif                            /* COPYFROM_INTERNAL_H */
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index adfe7d1572a..016893e7026 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -2,9 +2,13 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
-ERROR:  COPY format "test_copy_format" not recognized
-LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')...
-                                          ^
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=false
 NOTICE:  CopyToOutFunc: atttypid=21
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 810b3d8cedc..0dfdfa00080 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -2,4 +2,5 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+\.
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index e064f40473b..f6b105659ab 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -18,6 +18,40 @@
 
 PG_MODULE_MAGIC;
 
+static void
+CopyFromInFunc(CopyFromState cstate, Oid atttypid,
+               FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts)));
+}
+
+static bool
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+CopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .type = T_CopyFromRoutine,
+    .CopyFromInFunc = CopyFromInFunc,
+    .CopyFromStart = CopyFromStart,
+    .CopyFromOneRow = CopyFromOneRow,
+    .CopyFromEnd = CopyFromEnd,
+};
+
 static void
 CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
 {
@@ -59,5 +93,8 @@ test_copy_format(PG_FUNCTION_ARGS)
     ereport(NOTICE,
             (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
 
-    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
 }
-- 
2.47.1
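The dispatch introduced by this patch can be sketched outside PostgreSQL as follows: a single handler receives is_from and returns one of two routine structs, and the caller checks the leading node tag against the direction it asked for. This is only a standalone model; the Mock* types, T_Mock* tags and both handlers are hypothetical names and not the real Node machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for NodeTag values; NOT the PostgreSQL enum. */
typedef enum MockNodeTag
{
    T_MockCopyToRoutine = 1,
    T_MockCopyFromRoutine
} MockNodeTag;

/* Both routine structs start with the tag, like the patched structs. */
typedef struct MockCopyToRoutine
{
    MockNodeTag type;
} MockCopyToRoutine;

typedef struct MockCopyFromRoutine
{
    MockNodeTag type;
} MockCopyFromRoutine;

static const MockCopyToRoutine my_to_routine = {
    .type = T_MockCopyToRoutine,
};

static const MockCopyFromRoutine my_from_routine = {
    .type = T_MockCopyFromRoutine,
};

typedef const void *(*MockCopyHandler) (bool is_from);

/* Well-behaved handler: picks the routine that matches the direction. */
static const void *
good_handler(bool is_from)
{
    if (is_from)
        return &my_from_routine;
    return &my_to_routine;
}

/* Misbehaving handler: ignores is_from and always returns the TO routine. */
static const void *
bad_handler(bool is_from)
{
    (void) is_from;
    return &my_to_routine;
}

/*
 * Model of the check in ProcessCopyOptionFormat(): the result must be
 * non-NULL and carry the tag that matches the requested direction. The tag
 * is read through a pointer to the first member of the struct.
 */
static bool
mock_routine_is_valid(MockCopyHandler handler, bool is_from)
{
    const void *routine = handler(is_from);
    MockNodeTag expected = is_from ? T_MockCopyFromRoutine : T_MockCopyToRoutine;

    if (routine == NULL)
        return false;
    return *(const MockNodeTag *) routine == expected;
}
```

With these definitions, mock_routine_is_valid(bad_handler, true) is false, which corresponds to the "did not return a CopyFromRoutine struct" error path in the patch.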
From b96cc379092fd7e177fa8d65aa56796c1b7332be Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:19:34 +0900
Subject: [PATCH v29 7/9] Use COPY_SOURCE_ prefix for CopySource enum values
This is for consistency with CopyDest.
---
 src/backend/commands/copyfrom.c          |  4 ++--
 src/backend/commands/copyfromparse.c     | 10 +++++-----
 src/include/commands/copyfrom_internal.h |  6 +++---
 3 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 23027a664ec..3f6b0031d94 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1703,7 +1703,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;    /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;    /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1831,7 +1831,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 4e6683eb9da..f7982bf692f 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -170,7 +170,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows it can send. */
     pq_flush();
@@ -238,7 +238,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
     switch (cstate->copy_src)
     {
-        case COPY_FILE:
+        case COPY_SOURCE_FILE:
             bytesread = fread(databuf, 1, maxread, cstate->copy_file);
             if (ferror(cstate->copy_file))
                 ereport(ERROR,
@@ -247,7 +247,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
             if (bytesread == 0)
                 cstate->raw_reached_eof = true;
             break;
-        case COPY_FRONTEND:
+        case COPY_SOURCE_FRONTEND:
             while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
             {
                 int            avail;
@@ -330,7 +330,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
                 bytesread += avail;
             }
             break;
-        case COPY_CALLBACK:
+        case COPY_SOURCE_CALLBACK:
             bytesread = cstate->data_source_cb(databuf, minread, maxread);
             break;
     }
@@ -1158,7 +1158,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
          * after \. up to the protocol end of copy data.  (XXX maybe better
          * not to treat \. as special?)
          */
-        if (cstate->copy_src == COPY_FRONTEND)
+        if (cstate->copy_src == COPY_SOURCE_FRONTEND)
         {
             int            inbytes;
 
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 9b3b8336b67..3743b11faa4 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -24,9 +24,9 @@
  */
 typedef enum CopySource
 {
-    COPY_FILE,                    /* from file (or a piped program) */
-    COPY_FRONTEND,                /* from frontend */
-    COPY_CALLBACK,                /* from callback function */
+    COPY_SOURCE_FILE,            /* from file (or a piped program) */
+    COPY_SOURCE_FRONTEND,        /* from frontend */
+    COPY_SOURCE_CALLBACK,        /* from callback function */
 } CopySource;
 
 /*
-- 
2.47.1
From b52208e7f5292bcff38c353fbf1bba48a1f429d8 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:21:39 +0900
Subject: [PATCH v29 8/9] Add support for implementing custom COPY FROM format
 as extension
* Add CopyFromStateData::opaque that can be used to keep data for
  custom COPY FROM format implementation
* Export CopyGetData() as CopyFromStateGetData() so that extensions can
  get the next data
---
 src/backend/commands/copyfromparse.c     | 11 +++++++++++
 src/include/commands/copyapi.h           |  2 ++
 src/include/commands/copyfrom_internal.h |  3 +++
 3 files changed, 16 insertions(+)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f7982bf692f..650b6b2382b 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -729,6 +729,17 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
     return copied_bytes;
 }
 
+/*
+ * Export CopyGetData() for extensions as a thin wrapper. CopyGetData()
+ * itself stays static so that the compiler remains free to optimize (for
+ * example, inline) the calls to it within this file.
+ */
+int
+CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread)
+{
+    return CopyGetData(cstate, dest, minread, maxread);
+}
+
 /*
  * Read raw fields in the next line for COPY FROM in text or csv mode.
  * Return false if no more lines.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index bf933069fea..d1a1dbeb178 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -105,4 +105,6 @@ typedef struct CopyFromRoutine
     void        (*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
+extern int    CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread);
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 3743b11faa4..a65bbbc962e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -181,6 +181,9 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyFromStateData;
 
 extern void ReceiveCopyBegin(CopyFromState cstate);
-- 
2.47.1
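A standalone sketch of how a format implementation might read fixed-size records through a reader with the same (dest, minread, maxread) shape as CopyFromStateGetData(). The loop condition is modelled on the COPY_SOURCE_FRONTEND branch of CopyGetData() visible in the earlier hunk; everything else (MockCopyFromState, mock_get_data(), mock_read_record() and the chunked in-memory source) is a hypothetical stand-in, not PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory source that hands out data in small chunks. */
typedef struct MockCopyFromState
{
    const char *data;           /* whole input */
    size_t      len;            /* total input length */
    size_t      pos;            /* read position */
    size_t      chunk;          /* max bytes delivered per inner step */
} MockCopyFromState;

/*
 * Stand-in for CopyFromStateGetData(): keep reading until at least minread
 * bytes have arrived, never more than maxread, and stop early at EOF.
 * Returns the number of bytes stored in dest.
 */
static int
mock_get_data(MockCopyFromState *cstate, void *dest, int minread, int maxread)
{
    int         bytesread = 0;

    assert(cstate->chunk > 0);
    while (maxread > 0 && bytesread < minread && cstate->pos < cstate->len)
    {
        size_t      avail = cstate->len - cstate->pos;

        if (avail > cstate->chunk)
            avail = cstate->chunk;
        if (avail > (size_t) maxread)
            avail = (size_t) maxread;
        memcpy((char *) dest + bytesread, cstate->data + cstate->pos, avail);
        cstate->pos += avail;
        bytesread += (int) avail;
        maxread -= (int) avail;
    }
    return bytesread;
}

/*
 * Read exactly one record of record_size bytes.
 * Returns 1 on success, 0 on clean end of input, -1 on a truncated record.
 */
static int
mock_read_record(MockCopyFromState *cstate, char *record, int record_size)
{
    int         n = mock_get_data(cstate, record, record_size, record_size);

    if (n == 0)
        return 0;
    if (n != record_size)
        return -1;
    return 1;
}
```

Passing the record size as both minread and maxread asks for exactly one record per call; a short but non-zero result then signals truncated input, which a real callback would presumably report with ereport(ERROR).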
From 7496b8bcceb5434a7005fbdf2ecea485f82b9fde Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 27 Nov 2024 16:23:55 +0900
Subject: [PATCH v29 9/9] Add CopyFromSkipErrorRow() for custom COPY format
 extension
Extensions must call CopyFromSkipErrorRow() when their CopyFromOneRow
callback reports an error via errsave(). CopyFromSkipErrorRow() handles
"ON_ERROR stop" and "LOG_VERBOSITY verbose" cases.
---
 src/backend/commands/copyfromparse.c          | 82 +++++++++++--------
 src/include/commands/copyapi.h                |  2 +
 .../expected/test_copy_format.out             | 47 +++++++++++
 .../test_copy_format/sql/test_copy_format.sql | 24 ++++++
 .../test_copy_format/test_copy_format.c       | 82 ++++++++++++++++++-
 5 files changed, 199 insertions(+), 38 deletions(-)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 650b6b2382b..b016f43a711 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -851,6 +851,51 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool i
     return true;
 }
 
+/*
+ * Call this when you report an error by errsave() in your CopyFromOneRow
+ * callback. This handles "ON_ERROR stop" and "LOG_VERBOSITY verbose" cases
+ * for you.
+ */
+void
+CopyFromSkipErrorRow(CopyFromState cstate)
+{
+    Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+    cstate->num_errors++;
+
+    if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+    {
+        /*
+         * Since we emit line number and column info in the below notice
+         * message, we suppress error context information other than the
+         * relation name.
+         */
+        Assert(!cstate->relname_only);
+        cstate->relname_only = true;
+
+        if (cstate->cur_attval)
+        {
+            char       *attval;
+
+            attval = CopyLimitPrintoutLength(cstate->cur_attval);
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname,
+                           attval));
+            pfree(attval);
+        }
+        else
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname));
+
+        /* reset relname_only */
+        cstate->relname_only = false;
+    }
+}
+
 /*
  * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
  *
@@ -959,42 +1004,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
                                         (Node *) cstate->escontext,
                                         &values[m]))
         {
-            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-            cstate->num_errors++;
-
-            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-            {
-                /*
-                 * Since we emit line number and column info in the below
-                 * notice message, we suppress error context information other
-                 * than the relation name.
-                 */
-                Assert(!cstate->relname_only);
-                cstate->relname_only = true;
-
-                if (cstate->cur_attval)
-                {
-                    char       *attval;
-
-                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-                                   (unsigned long long) cstate->cur_lineno,
-                                   cstate->cur_attname,
-                                   attval));
-                    pfree(attval);
-                }
-                else
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-                                   (unsigned long long) cstate->cur_lineno,
-                                   cstate->cur_attname));
-
-                /* reset relname_only */
-                cstate->relname_only = false;
-            }
-
+            CopyFromSkipErrorRow(cstate);
             return true;
         }
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index d1a1dbeb178..389f887b2c1 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -107,4 +107,6 @@ typedef struct CopyFromRoutine
 
 extern int    CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread);
 
+extern void CopyFromSkipErrorRow(CopyFromState cstate);
+
 #endif                            /* COPYAPI_H */
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index 016893e7026..b9a6baa85c0 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -1,6 +1,8 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=true
 NOTICE:  CopyFromInFunc: atttypid=21
@@ -8,7 +10,50 @@ NOTICE:  CopyFromInFunc: atttypid=23
 NOTICE:  CopyFromInFunc: atttypid=20
 NOTICE:  CopyFromStart: natts=3
 NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  invalid value: "6"
+CONTEXT:  COPY test, line 2, column a: "6"
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
 NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  skipping row due to data type incompatibility at line 2 for column "a": "6"
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
+NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  too much lines: 3
+CONTEXT:  COPY test, line 3
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=false
 NOTICE:  CopyToOutFunc: atttypid=21
@@ -18,4 +63,6 @@ NOTICE:  CopyToStart: natts=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 0dfdfa00080..86db71bce7f 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -1,6 +1,30 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+321
 \.
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index f6b105659ab..f0f53838aef 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -32,10 +32,88 @@ CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
 }
 
 static bool
-CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext,
+               Datum *values, bool *nulls)
 {
+    int            n_attributes = list_length(cstate->attnumlist);
+    char       *line;
+    int            line_size = n_attributes + 1;    /* +1 is for new line */
+    int            read_bytes;
+
     ereport(NOTICE, (errmsg("CopyFromOneRow")));
-    return false;
+
+    cstate->cur_lineno++;
+    line = palloc(line_size);
+    read_bytes = CopyFromStateGetData(cstate, line, line_size, line_size);
+    if (read_bytes == 0)
+        return false;
+    if (read_bytes != line_size)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("one line must be %d bytes: %d",
+                        line_size, read_bytes)));
+
+    if (cstate->cur_lineno == 1)
+    {
+        /* Success */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        ListCell   *cur;
+        int            i = 0;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            int            m = attnum - 1;
+            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+            if (att->atttypid == INT2OID)
+            {
+                values[i] = Int16GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT4OID)
+            {
+                values[i] = Int32GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT8OID)
+            {
+                values[i] = Int64GetDatum(line[i] - '0');
+            }
+            nulls[i] = false;
+            i++;
+        }
+    }
+    else if (cstate->cur_lineno == 2)
+    {
+        /* Soft error */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        int            attnum = lfirst_int(list_head(cstate->attnumlist));
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+        char        value[2];
+
+        cstate->cur_attname = NameStr(att->attname);
+        value[0] = line[0];
+        value[1] = '\0';
+        cstate->cur_attval = value;
+        errsave((Node *) cstate->escontext,
+                (
+                 errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                 errmsg("invalid value: \"%c\"", line[0])));
+        CopyFromSkipErrorRow(cstate);
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+        return true;
+    }
+    else
+    {
+        /* Hard error */
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("too much lines: %llu",
+                        (unsigned long long) cstate->cur_lineno)));
+    }
+
+    return true;
 }
 
 static void
-- 
2.47.1
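
The accepted / soft error / hard error split exercised by the test above can be sketched outside the server. This is only a rough standalone illustration, not PostgreSQL code: ToyCopyState, OnErrorMode, RowResult and toy_copy_one_row() are hypothetical names made up for this sketch, and return codes stand in for what the real module does with ereport(ERROR, ...) and errsave().

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for this sketch; these are not PostgreSQL types. */
typedef enum { ON_ERROR_STOP, ON_ERROR_IGNORE } OnErrorMode;
typedef enum { ROW_OK, ROW_SKIPPED, ROW_FAILED } RowResult;

typedef struct
{
    OnErrorMode on_error;     /* mirrors the ON_ERROR option */
    int         cur_lineno;   /* number of lines consumed so far */
    int         num_skipped;  /* rows dropped because of soft errors */
} ToyCopyState;

/*
 * Consume one input line holding one digit per attribute.
 *
 * Line 1 is accepted, line 2 raises a soft error, and any later line is a
 * hard error, following the behavior of the test module above.
 */
static RowResult
toy_copy_one_row(ToyCopyState *st, const char *line, int n_attrs, int *values)
{
    st->cur_lineno++;

    /* A line of the wrong length is malformed input: always fatal. */
    if ((int) strlen(line) != n_attrs)
        return ROW_FAILED;

    if (st->cur_lineno == 1)
    {
        /* Success: convert each digit character to its integer value. */
        for (int i = 0; i < n_attrs; i++)
            values[i] = line[i] - '0';
        return ROW_OK;
    }

    if (st->cur_lineno == 2)
    {
        /* Soft error: survivable only if the caller chose to ignore it. */
        if (st->on_error == ON_ERROR_IGNORE)
        {
            st->num_skipped++;
            return ROW_SKIPPED;
        }
        return ROW_FAILED;
    }

    /* Hard error: fails regardless of the ON_ERROR setting. */
    return ROW_FAILED;
}
```

Feeding "987", "654" and "321" to a state created with ON_ERROR_IGNORE gives ROW_OK, ROW_SKIPPED and ROW_FAILED in that order. With ON_ERROR_STOP the second line already fails, which matches the first COPY in the test.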
			
Hi,

In <CAD21AoBpWFU4k-_bwrTq0AkFSAdwQqhAsSW188STmu9HxLJ0nQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 31 Jan 2025 14:25:34 -0800,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> I think that CopyToState and CopyFromState are not APIs but the
> execution states. I'm not against exposing CopyToState and
> CopyFromState. What I'd like to avoid is that we end up adding
> everything (including new fields we add in the future) related to copy
> operation to copyapi.h, leading to include copyapi.h into files that
> are not related to custom format api. fdwapi.h and tsmapi.h as
> examples have only a struct having a bunch of callbacks but not the
> execution state data such as SampScanState are not defined there.

Thanks for sharing examples. But it seems that
fdwapi.h/tsmapi.h (ForeignScanState/SampleScanState) are not
good examples. It seems that PostgreSQL uses
nodes/execnodes.h for all *ScanState. It seems that the
separation is not related to *api.h usage.

> My understanding is that we don't strictly prohibit _internal.h from
> being included in out of core files. For example, file_fdw.c includes
> copyfrom_internal.h in order to access some fields of CopyFromState.
>
> If the name with _internal.h is the problem, we can rename them to
> copyfrom.h and copyto.h. It makes sense to me that the code that needs
> to access the internal of the copy execution state include _internal.h
> header, though.

Thanks for sharing the file_fdw.c example. I'm OK with the
_internal.h suffix because PostgreSQL doesn't prohibit
_internal.h usage by extensions, as you mentioned.

>> > While we get the format routines for custom formats in
>> > ProcessCopyOptionFormat(), we do that for built-in formats in
>> > BeginCopyTo(), which seems odd to me. I think we can have
>> > CopyToGetRoutine() responsible for getting CopyToRoutine for built-in
>> > formats as well as custom format. The same is true for
>> > CopyFromRoutine.
>>
>> I like the current design because we don't need to export
>> CopyToGetBuiltinRoutine() (we can use static for
>> CopyToGetBuiltinRoutine()) but I applied your
>> suggestion. Because it's not a strong opinion.
>
> I meant that ProcessCopyOptionFormat() doesn't necessarily get the
> routine. An idea is that in ProcessCopyOptionFormat() we just get the
> OID of the handler function, and then set up the format routine in
> BeginCopyTo(). I've attached a patch for this idea (applied on top of
> 0009).

Oh, sorry. I misunderstood your suggestion. I understand
what you suggested from the patch. Thanks.

If we use that approach, we can't show the error position
when a custom COPY format handler function returns an
invalid routine, because the DefElem for the "format" option
isn't available in BeginCopyTo(). Is that acceptable? If it
is, let's use the approach.

> Also, please check some regression test failures on cfbot.

Oh, sorry. I forgot to follow the function name change in
0009. I attach the v30 patch set that fixes it in 0009.


Thanks,
-- 
kou

From 4435e1d3bff645b84bb9fe1eb4da33e158ad2f9d Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sat, 28 Sep 2024 23:24:49 +0900
Subject: [PATCH v30 1/9] Refactor COPY TO to use format callback functions.

This commit introduces a new CopyToRoutine struct, which is a set of
callback routines to copy tuples in a specific format. It also makes
the existing formats (text, CSV, and binary) utilize these format
callbacks.

This change is a preliminary step towards making the COPY TO command
extensible in terms of output formats.

Additionally, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.

Author: Sutou Kouhei Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada Reviewed-by: Junwang Zhao Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com --- src/backend/commands/copyto.c | 441 +++++++++++++++++++++---------- src/include/commands/copyapi.h | 57 ++++ src/tools/pgindent/typedefs.list | 1 + 3 files changed, 358 insertions(+), 141 deletions(-) create mode 100644 src/include/commands/copyapi.h diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 99cb23cb347..26c67ddc351 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -19,7 +19,7 @@ #include <sys/stat.h> #include "access/tableam.h" -#include "commands/copy.h" +#include "commands/copyapi.h" #include "commands/progress.h" #include "executor/execdesc.h" #include "executor/executor.h" @@ -64,6 +64,9 @@ typedef enum CopyDest */ typedef struct CopyToStateData { + /* format-specific routines */ + const CopyToRoutine *routine; + /* low-level state data */ CopyDest copy_dest; /* type of copy source/destination */ FILE *copy_file; /* used if copy_dest == COPY_FILE */ @@ -114,6 +117,19 @@ static void CopyAttributeOutText(CopyToState cstate, const char *string); static void CopyAttributeOutCSV(CopyToState cstate, const char *string, bool use_quote); +/* built-in format-specific routines */ +static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc); +static void CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo); +static void CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot); +static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot); +static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot, + bool is_csv); +static void CopyToTextLikeEnd(CopyToState cstate); +static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc); +static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo); +static void 
CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot); +static void CopyToBinaryEnd(CopyToState cstate); + /* Low-level communications functions */ static void SendCopyBegin(CopyToState cstate); static void SendCopyEnd(CopyToState cstate); @@ -121,9 +137,254 @@ static void CopySendData(CopyToState cstate, const void *databuf, int datasize); static void CopySendString(CopyToState cstate, const char *str); static void CopySendChar(CopyToState cstate, char c); static void CopySendEndOfRow(CopyToState cstate); +static void CopySendTextLikeEndOfRow(CopyToState cstate); static void CopySendInt32(CopyToState cstate, int32 val); static void CopySendInt16(CopyToState cstate, int16 val); +/* + * COPY TO routines for built-in formats. + * + * CSV and text formats share the same TextLike routines except for the + * one-row callback. + */ + +/* text format */ +static const CopyToRoutine CopyToRoutineText = { + .CopyToStart = CopyToTextLikeStart, + .CopyToOutFunc = CopyToTextLikeOutFunc, + .CopyToOneRow = CopyToTextOneRow, + .CopyToEnd = CopyToTextLikeEnd, +}; + +/* CSV format */ +static const CopyToRoutine CopyToRoutineCSV = { + .CopyToStart = CopyToTextLikeStart, + .CopyToOutFunc = CopyToTextLikeOutFunc, + .CopyToOneRow = CopyToCSVOneRow, + .CopyToEnd = CopyToTextLikeEnd, +}; + +/* binary format */ +static const CopyToRoutine CopyToRoutineBinary = { + .CopyToStart = CopyToBinaryStart, + .CopyToOutFunc = CopyToBinaryOutFunc, + .CopyToOneRow = CopyToBinaryOneRow, + .CopyToEnd = CopyToBinaryEnd, +}; + +/* Return a COPY TO routine for the given options */ +static const CopyToRoutine * +CopyToGetRoutine(CopyFormatOptions opts) +{ + if (opts.csv_mode) + return &CopyToRoutineCSV; + else if (opts.binary) + return &CopyToRoutineBinary; + + /* default is text */ + return &CopyToRoutineText; +} + +/* Implementation of the start callback for text and CSV formats */ +static void +CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc) +{ + /* + * For non-binary copy, we need to 
convert null_print to file encoding, + * because it will be sent directly with CopySendString. + */ + if (cstate->need_transcoding) + cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, + cstate->opts.null_print_len, + cstate->file_encoding); + + /* if a header has been requested send the line */ + if (cstate->opts.header_line) + { + ListCell *cur; + bool hdr_delim = false; + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + char *colname; + + if (hdr_delim) + CopySendChar(cstate, cstate->opts.delim[0]); + hdr_delim = true; + + colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); + + if (cstate->opts.csv_mode) + CopyAttributeOutCSV(cstate, colname, false); + else + CopyAttributeOutText(cstate, colname); + } + + CopySendTextLikeEndOfRow(cstate); + } +} + +/* + * Implementation of the outfunc callback for text and CSV formats. Assign + * the output function data to the given *finfo. + */ +static void +CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) +{ + Oid func_oid; + bool is_varlena; + + /* Set output function for an attribute */ + getTypeOutputInfo(atttypid, &func_oid, &is_varlena); + fmgr_info(func_oid, finfo); +} + +/* Implementation of the per-row callback for text format */ +static void +CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + CopyToTextLikeOneRow(cstate, slot, false); +} + +/* Implementation of the per-row callback for CSV format */ +static void +CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + CopyToTextLikeOneRow(cstate, slot, true); +} + +/* + * Workhorse for CopyToTextOneRow() and CopyToCSVOneRow(). + * + * We use pg_attribute_always_inline to reduce function call overheads. 
+ */ +static pg_attribute_always_inline void +CopyToTextLikeOneRow(CopyToState cstate, + TupleTableSlot *slot, + bool is_csv) +{ + bool need_delim = false; + FmgrInfo *out_functions = cstate->out_functions; + + foreach_int(attnum, cstate->attnumlist) + { + Datum value = slot->tts_values[attnum - 1]; + bool isnull = slot->tts_isnull[attnum - 1]; + + if (need_delim) + CopySendChar(cstate, cstate->opts.delim[0]); + need_delim = true; + + if (isnull) + { + CopySendString(cstate, cstate->opts.null_print_client); + } + else + { + char *string; + + string = OutputFunctionCall(&out_functions[attnum - 1], + value); + + /* + * is_csv will be optimized away by compiler, as argument is + * constant at caller. + */ + if (is_csv) + CopyAttributeOutCSV(cstate, string, + cstate->opts.force_quote_flags[attnum - 1]); + else + CopyAttributeOutText(cstate, string); + } + } + + CopySendTextLikeEndOfRow(cstate); +} + +/* Implementation of the end callback for text and CSV formats */ +static void +CopyToTextLikeEnd(CopyToState cstate) +{ + /* Nothing to do here */ +} + +/* + * Implementation of the start callback for binary format. Send a header + * for a binary copy. + */ +static void +CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc) +{ + int32 tmp; + + /* Signature */ + CopySendData(cstate, BinarySignature, 11); + /* Flags field */ + tmp = 0; + CopySendInt32(cstate, tmp); + /* No header extension */ + tmp = 0; + CopySendInt32(cstate, tmp); +} + +/* + * Implementation of the outfunc callback for binary format. Assign + * the binary output function to the given *finfo. 
+ */ +static void +CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) +{ + Oid func_oid; + bool is_varlena; + + /* Set output function for an attribute */ + getTypeBinaryOutputInfo(atttypid, &func_oid, &is_varlena); + fmgr_info(func_oid, finfo); +} + +/* Implementation of the per-row callback for binary format */ +static void +CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + FmgrInfo *out_functions = cstate->out_functions; + + /* Binary per-tuple header */ + CopySendInt16(cstate, list_length(cstate->attnumlist)); + + foreach_int(attnum, cstate->attnumlist) + { + Datum value = slot->tts_values[attnum - 1]; + bool isnull = slot->tts_isnull[attnum - 1]; + + if (isnull) + { + CopySendInt32(cstate, -1); + } + else + { + bytea *outputbytes; + + outputbytes = SendFunctionCall(&out_functions[attnum - 1], + value); + CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); + CopySendData(cstate, VARDATA(outputbytes), + VARSIZE(outputbytes) - VARHDRSZ); + } + } + + CopySendEndOfRow(cstate); +} + +/* Implementation of the end callback for binary format */ +static void +CopyToBinaryEnd(CopyToState cstate) +{ + /* Generate trailer for a binary copy */ + CopySendInt16(cstate, -1); + /* Need to flush out the trailer */ + CopySendEndOfRow(cstate); +} /* * Send copy start/stop messages for frontend copies. 
These have changed @@ -191,16 +452,6 @@ CopySendEndOfRow(CopyToState cstate) switch (cstate->copy_dest) { case COPY_FILE: - if (!cstate->opts.binary) - { - /* Default line termination depends on platform */ -#ifndef WIN32 - CopySendChar(cstate, '\n'); -#else - CopySendString(cstate, "\r\n"); -#endif - } - if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, cstate->copy_file) != 1 || ferror(cstate->copy_file)) @@ -235,10 +486,6 @@ CopySendEndOfRow(CopyToState cstate) } break; case COPY_FRONTEND: - /* The FE/BE protocol uses \n as newline for all platforms */ - if (!cstate->opts.binary) - CopySendChar(cstate, '\n'); - /* Dump the accumulated row as one CopyData message */ (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len); break; @@ -254,6 +501,35 @@ CopySendEndOfRow(CopyToState cstate) resetStringInfo(fe_msgbuf); } +/* + * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the + * the line termination and do common appropriate things for the end of row. + */ +static inline void +CopySendTextLikeEndOfRow(CopyToState cstate) +{ + switch (cstate->copy_dest) + { + case COPY_FILE: + /* Default line termination depends on platform */ +#ifndef WIN32 + CopySendChar(cstate, '\n'); +#else + CopySendString(cstate, "\r\n"); +#endif + break; + case COPY_FRONTEND: + /* The FE/BE protocol uses \n as newline for all platforms */ + CopySendChar(cstate, '\n'); + break; + default: + break; + } + + /* Now take the actions related to the end of a row */ + CopySendEndOfRow(cstate); +} + /* * These functions do apply some data conversion */ @@ -426,6 +702,9 @@ BeginCopyTo(ParseState *pstate, /* Extract options from the statement node tree */ ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options); + /* Set format routine */ + cstate->routine = CopyToGetRoutine(cstate->opts); + /* Process the source/target relation or query */ if (rel) { @@ -771,19 +1050,10 @@ DoCopyTo(CopyToState cstate) foreach(cur, cstate->attnumlist) { int attnum = 
lfirst_int(cur); - Oid out_func_oid; - bool isvarlena; Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1); - if (cstate->opts.binary) - getTypeBinaryOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - else - getTypeOutputInfo(attr->atttypid, - &out_func_oid, - &isvarlena); - fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]); + cstate->routine->CopyToOutFunc(cstate, attr->atttypid, + &cstate->out_functions[attnum - 1]); } /* @@ -796,56 +1066,7 @@ DoCopyTo(CopyToState cstate) "COPY TO", ALLOCSET_DEFAULT_SIZES); - if (cstate->opts.binary) - { - /* Generate header for a binary copy */ - int32 tmp; - - /* Signature */ - CopySendData(cstate, BinarySignature, 11); - /* Flags field */ - tmp = 0; - CopySendInt32(cstate, tmp); - /* No header extension */ - tmp = 0; - CopySendInt32(cstate, tmp); - } - else - { - /* - * For non-binary copy, we need to convert null_print to file - * encoding, because it will be sent directly with CopySendString. - */ - if (cstate->need_transcoding) - cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print, - cstate->opts.null_print_len, - cstate->file_encoding); - - /* if a header has been requested send the line */ - if (cstate->opts.header_line) - { - bool hdr_delim = false; - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - char *colname; - - if (hdr_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - hdr_delim = true; - - colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname); - - if (cstate->opts.csv_mode) - CopyAttributeOutCSV(cstate, colname, false); - else - CopyAttributeOutText(cstate, colname); - } - - CopySendEndOfRow(cstate); - } - } + cstate->routine->CopyToStart(cstate, tupDesc); if (cstate->rel) { @@ -884,13 +1105,7 @@ DoCopyTo(CopyToState cstate) processed = ((DR_copy *) cstate->queryDesc->dest)->processed; } - if (cstate->opts.binary) - { - /* Generate trailer for a binary copy */ - CopySendInt16(cstate, -1); - /* Need to flush out the trailer */ 
- CopySendEndOfRow(cstate); - } + cstate->routine->CopyToEnd(cstate); MemoryContextDelete(cstate->rowcontext); @@ -903,74 +1118,18 @@ DoCopyTo(CopyToState cstate) /* * Emit one row during DoCopyTo(). */ -static void +static inline void CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot) { - FmgrInfo *out_functions = cstate->out_functions; MemoryContext oldcontext; MemoryContextReset(cstate->rowcontext); oldcontext = MemoryContextSwitchTo(cstate->rowcontext); - if (cstate->opts.binary) - { - /* Binary per-tuple header */ - CopySendInt16(cstate, list_length(cstate->attnumlist)); - } - /* Make sure the tuple is fully deconstructed */ slot_getallattrs(slot); - if (!cstate->opts.binary) - { - bool need_delim = false; - - foreach_int(attnum, cstate->attnumlist) - { - Datum value = slot->tts_values[attnum - 1]; - bool isnull = slot->tts_isnull[attnum - 1]; - char *string; - - if (need_delim) - CopySendChar(cstate, cstate->opts.delim[0]); - need_delim = true; - - if (isnull) - CopySendString(cstate, cstate->opts.null_print_client); - else - { - string = OutputFunctionCall(&out_functions[attnum - 1], - value); - if (cstate->opts.csv_mode) - CopyAttributeOutCSV(cstate, string, - cstate->opts.force_quote_flags[attnum - 1]); - else - CopyAttributeOutText(cstate, string); - } - } - } - else - { - foreach_int(attnum, cstate->attnumlist) - { - Datum value = slot->tts_values[attnum - 1]; - bool isnull = slot->tts_isnull[attnum - 1]; - bytea *outputbytes; - - if (isnull) - CopySendInt32(cstate, -1); - else - { - outputbytes = SendFunctionCall(&out_functions[attnum - 1], - value); - CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ); - CopySendData(cstate, VARDATA(outputbytes), - VARSIZE(outputbytes) - VARHDRSZ); - } - } - } - - CopySendEndOfRow(cstate); + cstate->routine->CopyToOneRow(cstate, slot); MemoryContextSwitchTo(oldcontext); } diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h new file mode 100644 index 00000000000..be29e3fbdef --- 
/dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,57 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *      API for COPY TO handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "commands/copy.h"
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/*
+ * API structure for a COPY TO format implementation. Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyToRoutine
+{
+    /*
+     * Set output function information. This callback is called once at the
+     * beginning of COPY TO.
+     *
+     * 'finfo' can be optionally filled to provide the catalog information of
+     * the output function.
+     *
+     * 'atttypid' is the OID of data type used by the relation's attribute.
+     */
+    void        (*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
+                                  FmgrInfo *finfo);
+
+    /*
+     * Start a COPY TO. This callback is called once at the beginning of COPY
+     * TO.
+     *
+     * 'tupDesc' is the tuple descriptor of the relation from where the data
+     * is read.
+     */
+    void        (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
+
+    /*
+     * Write one row stored in 'slot' to the destination.
+     */
+    void        (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
+
+    /* End a COPY TO. This callback is called once at the end of COPY TO */
+    void        (*CopyToEnd) (CopyToState cstate);
+} CopyToRoutine;
+
+#endif                            /* COPYAPI_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a2644a2e653..1cbb3628857 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -508,6 +508,7 @@ CopyMultiInsertInfo
 CopyOnErrorChoice
 CopySource
 CopyStmt
+CopyToRoutine
 CopyToState
 CopyToStateData
 Cost
-- 
2.47.1

From 6e23a4faa3af71c2ad9cba733c547cfa78de99c0 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 18 Nov 2024 16:32:43 -0800
Subject: [PATCH v30 2/9] Refactor COPY FROM to use format callback functions.

This commit introduces a new CopyFromRoutine struct, which is a set of
callback routines to read tuples in a specific format. It also makes
COPY FROM with the existing formats (text, CSV, and binary) utilize
these format callbacks.

This change is a preliminary step towards making the COPY FROM command
extensible in terms of input formats.

Similar to XXXX, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when receiving field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.

Author: Sutou Kouhei Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada Reviewed-by: Junwang Zhao Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com --- contrib/file_fdw/file_fdw.c | 1 - src/backend/commands/copyfrom.c | 190 +++++++-- src/backend/commands/copyfromparse.c | 504 +++++++++++++---------- src/include/commands/copy.h | 2 - src/include/commands/copyapi.h | 48 ++- src/include/commands/copyfrom_internal.h | 13 +- src/tools/pgindent/typedefs.list | 1 + 7 files changed, 492 insertions(+), 267 deletions(-) diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c index 678e754b2b9..323c43dca4a 100644 --- a/contrib/file_fdw/file_fdw.c +++ b/contrib/file_fdw/file_fdw.c @@ -21,7 +21,6 @@ #include "access/table.h" #include "catalog/pg_authid.h" #include "catalog/pg_foreign_table.h" -#include "commands/copy.h" #include "commands/copyfrom_internal.h" #include "commands/defrem.h" #include "commands/explain.h" diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 0cbd05f5602..917fa6605ef 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -106,6 +106,145 @@ typedef struct CopyMultiInsertInfo /* non-export function prototypes */ static void ClosePipeFromProgram(CopyFromState cstate); +/* + * Built-in format-specific routines. One-row callbacks are defined in + * copyfromparse.c + */ +static void CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo, + Oid *typioparam); +static void CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc); +static void CopyFromTextLikeEnd(CopyFromState cstate); +static void CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam); +static void CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc); +static void CopyFromBinaryEnd(CopyFromState cstate); + + +/* + * COPY FROM routines for built-in formats. 
+ * + * CSV and text formats share the same TextLike routines except for the + * one-row callback. + */ + +/* text format */ +static const CopyFromRoutine CopyFromRoutineText = { + .CopyFromInFunc = CopyFromTextLikeInFunc, + .CopyFromStart = CopyFromTextLikeStart, + .CopyFromOneRow = CopyFromTextOneRow, + .CopyFromEnd = CopyFromTextLikeEnd, +}; + +/* CSV format */ +static const CopyFromRoutine CopyFromRoutineCSV = { + .CopyFromInFunc = CopyFromTextLikeInFunc, + .CopyFromStart = CopyFromTextLikeStart, + .CopyFromOneRow = CopyFromCSVOneRow, + .CopyFromEnd = CopyFromTextLikeEnd, +}; + +/* binary format */ +static const CopyFromRoutine CopyFromRoutineBinary = { + .CopyFromInFunc = CopyFromBinaryInFunc, + .CopyFromStart = CopyFromBinaryStart, + .CopyFromOneRow = CopyFromBinaryOneRow, + .CopyFromEnd = CopyFromBinaryEnd, +}; + +/* Return a COPY FROM routine for the given options */ +static const CopyFromRoutine * +CopyFromGetRoutine(CopyFormatOptions opts) +{ + if (opts.csv_mode) + return &CopyFromRoutineCSV; + else if (opts.binary) + return &CopyFromRoutineBinary; + + /* default is text */ + return &CopyFromRoutineText; +} + +/* Implementation of the start callback for text and CSV formats */ +static void +CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc) +{ + AttrNumber attr_count; + + /* + * If encoding conversion is needed, we need another buffer to hold the + * converted input data. Otherwise, we can just point input_buf to the + * same buffer as raw_buf. + */ + if (cstate->need_transcoding) + { + cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); + cstate->input_buf_index = cstate->input_buf_len = 0; + } + else + cstate->input_buf = cstate->raw_buf; + cstate->input_reached_eof = false; + + initStringInfo(&cstate->line_buf); + + /* + * Create workspace for CopyReadAttributes results; used by CSV and text + * format. 
+ */ + attr_count = list_length(cstate->attnumlist); + cstate->max_fields = attr_count; + cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *)); +} + +/* + * Implementation of the infunc callback for text and CSV formats. Assign + * the input function data to the given *finfo. + */ +static void +CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo, + Oid *typioparam) +{ + Oid func_oid; + + getTypeInputInfo(atttypid, &func_oid, typioparam); + fmgr_info(func_oid, finfo); +} + +/* Implementation of the end callback for text and CSV formats */ +static void +CopyFromTextLikeEnd(CopyFromState cstate) +{ + /* nothing to do */ +} + +/* Implementation of the start callback for binary format */ +static void +CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc) +{ + /* Read and verify binary header */ + ReceiveCopyBinaryHeader(cstate); +} + +/* + * Implementation of the infunc callback for binary format. Assign + * the binary input function to the given *finfo. 
+ */ +static void +CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam) +{ + Oid func_oid; + + getTypeBinaryInputInfo(atttypid, &func_oid, typioparam); + fmgr_info(func_oid, finfo); +} + +/* Implementation of the end callback for binary format */ +static void +CopyFromBinaryEnd(CopyFromState cstate) +{ + /* nothing to do */ +} + /* * error context callback for COPY FROM * @@ -1396,7 +1535,6 @@ BeginCopyFrom(ParseState *pstate, num_defaults; FmgrInfo *in_functions; Oid *typioparams; - Oid in_func_oid; int *defmap; ExprState **defexprs; MemoryContext oldcontext; @@ -1428,6 +1566,9 @@ BeginCopyFrom(ParseState *pstate, /* Extract options from the statement node tree */ ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options); + /* Set the format routine */ + cstate->routine = CopyFromGetRoutine(cstate->opts); + /* Process the target relation */ cstate->rel = rel; @@ -1583,25 +1724,6 @@ BeginCopyFrom(ParseState *pstate, cstate->raw_buf_index = cstate->raw_buf_len = 0; cstate->raw_reached_eof = false; - if (!cstate->opts.binary) - { - /* - * If encoding conversion is needed, we need another buffer to hold - * the converted input data. Otherwise, we can just point input_buf - * to the same buffer as raw_buf. - */ - if (cstate->need_transcoding) - { - cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1); - cstate->input_buf_index = cstate->input_buf_len = 0; - } - else - cstate->input_buf = cstate->raw_buf; - cstate->input_reached_eof = false; - - initStringInfo(&cstate->line_buf); - } - initStringInfo(&cstate->attribute_buf); /* Assign range table and rteperminfos, we'll need them in CopyFrom. 
*/ @@ -1634,13 +1756,9 @@ BeginCopyFrom(ParseState *pstate, continue; /* Fetch the input function and typioparam info */ - if (cstate->opts.binary) - getTypeBinaryInputInfo(att->atttypid, - &in_func_oid, &typioparams[attnum - 1]); - else - getTypeInputInfo(att->atttypid, - &in_func_oid, &typioparams[attnum - 1]); - fmgr_info(in_func_oid, &in_functions[attnum - 1]); + cstate->routine->CopyFromInFunc(cstate, att->atttypid, + &in_functions[attnum - 1], + &typioparams[attnum - 1]); /* Get default info if available */ defexprs[attnum - 1] = NULL; @@ -1775,20 +1893,7 @@ BeginCopyFrom(ParseState *pstate, pgstat_progress_update_multi_param(3, progress_cols, progress_vals); - if (cstate->opts.binary) - { - /* Read and verify binary header */ - ReceiveCopyBinaryHeader(cstate); - } - - /* create workspace for CopyReadAttributes results */ - if (!cstate->opts.binary) - { - AttrNumber attr_count = list_length(cstate->attnumlist); - - cstate->max_fields = attr_count; - cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *)); - } + cstate->routine->CopyFromStart(cstate, tupDesc); MemoryContextSwitchTo(oldcontext); @@ -1801,6 +1906,9 @@ BeginCopyFrom(ParseState *pstate, void EndCopyFrom(CopyFromState cstate) { + /* Invoke the end callback */ + cstate->routine->CopyFromEnd(cstate); + /* No COPY FROM related resources except memory. 
*/ if (cstate->is_program) { diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index caccdc8563c..65f20d332ee 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -62,7 +62,6 @@ #include <unistd.h> #include <sys/stat.h> -#include "commands/copy.h" #include "commands/copyfrom_internal.h" #include "commands/progress.h" #include "executor/executor.h" @@ -140,8 +139,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0"; /* non-export function prototypes */ -static bool CopyReadLine(CopyFromState cstate); -static bool CopyReadLineText(CopyFromState cstate); +static bool CopyReadLine(CopyFromState cstate, bool is_csv); +static bool CopyReadLineText(CopyFromState cstate, bool is_csv); static int CopyReadAttributesText(CopyFromState cstate); static int CopyReadAttributesCSV(CopyFromState cstate); static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo, @@ -740,9 +739,11 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes) * in the relation. * * NOTE: force_not_null option are not applied to the returned fields. + * + * We use pg_attribute_always_inline to reduce function call overheads. */ -bool -NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) +static pg_attribute_always_inline bool +NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv) { int fldct; bool done; @@ -759,13 +760,17 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) tupDesc = RelationGetDescr(cstate->rel); cstate->cur_lineno++; - done = CopyReadLine(cstate); + done = CopyReadLine(cstate, is_csv); if (cstate->opts.header_line == COPY_HEADER_MATCH) { int fldnum; - if (cstate->opts.csv_mode) + /* + * is_csv will be optimized away by compiler, as argument is + * constant at caller. 
+ */ + if (is_csv) fldct = CopyReadAttributesCSV(cstate); else fldct = CopyReadAttributesText(cstate); @@ -809,7 +814,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) cstate->cur_lineno++; /* Actually read the line into memory here */ - done = CopyReadLine(cstate); + done = CopyReadLine(cstate, is_csv); /* * EOF at start of line means we're done. If we see EOF after some @@ -819,8 +824,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) if (done && cstate->line_buf.len == 0) return false; - /* Parse the line into de-escaped field values */ - if (cstate->opts.csv_mode) + /* + * Parse the line into de-escaped field values + * + * is_csv will be optimized away by compiler, as argument is constant at + * caller. + */ + if (is_csv) fldct = CopyReadAttributesCSV(cstate); else fldct = CopyReadAttributesText(cstate); @@ -830,6 +840,244 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields) return true; } +/* + * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow(). + * + * We use pg_attribute_always_inline to reduce function call overheads. 
+ */ +static pg_attribute_always_inline bool +CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls, bool is_csv) +{ + TupleDesc tupDesc; + AttrNumber attr_count; + FmgrInfo *in_functions = cstate->in_functions; + Oid *typioparams = cstate->typioparams; + ExprState **defexprs = cstate->defexprs; + char **field_strings; + ListCell *cur; + int fldct; + int fieldno; + char *string; + + tupDesc = RelationGetDescr(cstate->rel); + attr_count = list_length(cstate->attnumlist); + + /* read raw fields in the next line */ + if (!NextCopyFromRawFields(cstate, &field_strings, &fldct, is_csv)) + return false; + + /* check for overflowing fields */ + if (attr_count > 0 && fldct > attr_count) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("extra data after last expected column"))); + + fieldno = 0; + + /* Loop to read the user attributes on the line. */ + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + if (fieldno >= fldct) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("missing data for column \"%s\"", + NameStr(att->attname)))); + string = field_strings[fieldno++]; + + if (cstate->convert_select_flags && + !cstate->convert_select_flags[m]) + { + /* ignore input field, leaving column as NULL */ + continue; + } + + if (is_csv) + { + if (string == NULL && + cstate->opts.force_notnull_flags[m]) + { + /* + * FORCE_NOT_NULL option is set and column is NULL - convert + * it to the NULL string. + */ + string = cstate->opts.null_print; + } + else if (string != NULL && cstate->opts.force_null_flags[m] + && strcmp(string, cstate->opts.null_print) == 0) + { + /* + * FORCE_NULL option is set and column matches the NULL + * string. It must have been quoted, or otherwise the string + * would already have been set to NULL. Convert it to NULL as + * specified. 
+ */ + string = NULL; + } + } + + cstate->cur_attname = NameStr(att->attname); + cstate->cur_attval = string; + + if (string != NULL) + nulls[m] = false; + + if (cstate->defaults[m]) + { + /* + * The caller must supply econtext and have switched into the + * per-tuple memory context in it. + */ + Assert(econtext != NULL); + Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); + + values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); + } + + /* + * If ON_ERROR is specified with IGNORE, skip rows with soft errors + */ + else if (!InputFunctionCallSafe(&in_functions[m], + string, + typioparams[m], + att->atttypmod, + (Node *) cstate->escontext, + &values[m])) + { + Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); + + cstate->num_errors++; + + if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) + { + /* + * Since we emit line number and column info in the below + * notice message, we suppress error context information other + * than the relation name. + */ + Assert(!cstate->relname_only); + cstate->relname_only = true; + + if (cstate->cur_attval) + { + char *attval; + + attval = CopyLimitPrintoutLength(cstate->cur_attval); + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"", + (unsigned long long) cstate->cur_lineno, + cstate->cur_attname, + attval)); + pfree(attval); + } + else + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input", + (unsigned long long) cstate->cur_lineno, + cstate->cur_attname)); + + /* reset relname_only */ + cstate->relname_only = false; + } + + return true; + } + + cstate->cur_attname = NULL; + cstate->cur_attval = NULL; + } + + Assert(fieldno == attr_count); + + return true; +} + +/* Implementation of the per-row callback for text format */ +bool +CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, + bool *nulls) +{ + return CopyFromTextLikeOneRow(cstate, 
econtext, values, nulls, false); +} + +/* Implementation of the per-row callback for CSV format */ +bool +CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, + bool *nulls) +{ + return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true); +} + +/* Implementation of the per-row callback for binary format */ +bool +CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, + bool *nulls) +{ + TupleDesc tupDesc; + AttrNumber attr_count; + FmgrInfo *in_functions = cstate->in_functions; + Oid *typioparams = cstate->typioparams; + int16 fld_count; + ListCell *cur; + + tupDesc = RelationGetDescr(cstate->rel); + attr_count = list_length(cstate->attnumlist); + + cstate->cur_lineno++; + + if (!CopyGetInt16(cstate, &fld_count)) + { + /* EOF detected (end of file, or protocol-level EOF) */ + return false; + } + + if (fld_count == -1) + { + /* + * Received EOF marker. Wait for the protocol-level EOF, and complain + * if it doesn't come immediately. In COPY FROM STDIN, this ensures + * that we correctly handle CopyFail, if client chooses to send that + * now. When copying from file, we could ignore the rest of the file + * like in text mode, but we choose to be consistent with the COPY + * FROM STDIN case. 
+ */ + char dummy; + + if (CopyReadBinaryData(cstate, &dummy, 1) > 0) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("received copy data after EOF marker"))); + return false; + } + + if (fld_count != attr_count) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("row field count is %d, expected %d", + (int) fld_count, attr_count))); + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + cstate->cur_attname = NameStr(att->attname); + values[m] = CopyReadBinaryAttribute(cstate, + &in_functions[m], + typioparams[m], + att->atttypmod, + &nulls[m]); + cstate->cur_attname = NULL; + } + + return true; +} + /* * Read next tuple from file for COPY FROM. Return false if no more tuples. * @@ -847,216 +1095,22 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, { TupleDesc tupDesc; AttrNumber num_phys_attrs, - attr_count, num_defaults = cstate->num_defaults; - FmgrInfo *in_functions = cstate->in_functions; - Oid *typioparams = cstate->typioparams; int i; int *defmap = cstate->defmap; ExprState **defexprs = cstate->defexprs; tupDesc = RelationGetDescr(cstate->rel); num_phys_attrs = tupDesc->natts; - attr_count = list_length(cstate->attnumlist); /* Initialize all values for row to NULL */ MemSet(values, 0, num_phys_attrs * sizeof(Datum)); MemSet(nulls, true, num_phys_attrs * sizeof(bool)); MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool)); - if (!cstate->opts.binary) - { - char **field_strings; - ListCell *cur; - int fldct; - int fieldno; - char *string; - - /* read raw fields in the next line */ - if (!NextCopyFromRawFields(cstate, &field_strings, &fldct)) - return false; - - /* check for overflowing fields */ - if (attr_count > 0 && fldct > attr_count) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("extra data after last expected column"))); - - fieldno = 0; - - /* Loop to read the user attributes on 
the line. */ - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - int m = attnum - 1; - Form_pg_attribute att = TupleDescAttr(tupDesc, m); - - if (fieldno >= fldct) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("missing data for column \"%s\"", - NameStr(att->attname)))); - string = field_strings[fieldno++]; - - if (cstate->convert_select_flags && - !cstate->convert_select_flags[m]) - { - /* ignore input field, leaving column as NULL */ - continue; - } - - if (cstate->opts.csv_mode) - { - if (string == NULL && - cstate->opts.force_notnull_flags[m]) - { - /* - * FORCE_NOT_NULL option is set and column is NULL - - * convert it to the NULL string. - */ - string = cstate->opts.null_print; - } - else if (string != NULL && cstate->opts.force_null_flags[m] - && strcmp(string, cstate->opts.null_print) == 0) - { - /* - * FORCE_NULL option is set and column matches the NULL - * string. It must have been quoted, or otherwise the - * string would already have been set to NULL. Convert it - * to NULL as specified. - */ - string = NULL; - } - } - - cstate->cur_attname = NameStr(att->attname); - cstate->cur_attval = string; - - if (string != NULL) - nulls[m] = false; - - if (cstate->defaults[m]) - { - /* - * The caller must supply econtext and have switched into the - * per-tuple memory context in it. 
- */ - Assert(econtext != NULL); - Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory); - - values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]); - } - - /* - * If ON_ERROR is specified with IGNORE, skip rows with soft - * errors - */ - else if (!InputFunctionCallSafe(&in_functions[m], - string, - typioparams[m], - att->atttypmod, - (Node *) cstate->escontext, - &values[m])) - { - Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); - - cstate->num_errors++; - - if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) - { - /* - * Since we emit line number and column info in the below - * notice message, we suppress error context information - * other than the relation name. - */ - Assert(!cstate->relname_only); - cstate->relname_only = true; - - if (cstate->cur_attval) - { - char *attval; - - attval = CopyLimitPrintoutLength(cstate->cur_attval); - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"", - (unsigned long long) cstate->cur_lineno, - cstate->cur_attname, - attval)); - pfree(attval); - } - else - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": nullinput", - (unsigned long long) cstate->cur_lineno, - cstate->cur_attname)); - - /* reset relname_only */ - cstate->relname_only = false; - } - - return true; - } - - cstate->cur_attname = NULL; - cstate->cur_attval = NULL; - } - - Assert(fieldno == attr_count); - } - else - { - /* binary */ - int16 fld_count; - ListCell *cur; - - cstate->cur_lineno++; - - if (!CopyGetInt16(cstate, &fld_count)) - { - /* EOF detected (end of file, or protocol-level EOF) */ - return false; - } - - if (fld_count == -1) - { - /* - * Received EOF marker. Wait for the protocol-level EOF, and - * complain if it doesn't come immediately. In COPY FROM STDIN, - * this ensures that we correctly handle CopyFail, if client - * chooses to send that now. 
When copying from file, we could - * ignore the rest of the file like in text mode, but we choose to - * be consistent with the COPY FROM STDIN case. - */ - char dummy; - - if (CopyReadBinaryData(cstate, &dummy, 1) > 0) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("received copy data after EOF marker"))); - return false; - } - - if (fld_count != attr_count) - ereport(ERROR, - (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - errmsg("row field count is %d, expected %d", - (int) fld_count, attr_count))); - - foreach(cur, cstate->attnumlist) - { - int attnum = lfirst_int(cur); - int m = attnum - 1; - Form_pg_attribute att = TupleDescAttr(tupDesc, m); - - cstate->cur_attname = NameStr(att->attname); - values[m] = CopyReadBinaryAttribute(cstate, - &in_functions[m], - typioparams[m], - att->atttypmod, - &nulls[m]); - cstate->cur_attname = NULL; - } - } + /* Get one row from source */ + if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls)) + return false; /* * Now compute and insert any defaults available for the columns not @@ -1087,7 +1141,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext, * in the final value of line_buf. 
*/ static bool -CopyReadLine(CopyFromState cstate) +CopyReadLine(CopyFromState cstate, bool is_csv) { bool result; @@ -1095,7 +1149,7 @@ CopyReadLine(CopyFromState cstate) cstate->line_buf_valid = false; /* Parse data and transfer into line_buf */ - result = CopyReadLineText(cstate); + result = CopyReadLineText(cstate, is_csv); if (result) { @@ -1163,7 +1217,7 @@ CopyReadLine(CopyFromState cstate) * CopyReadLineText - inner loop of CopyReadLine for text mode */ static bool -CopyReadLineText(CopyFromState cstate) +CopyReadLineText(CopyFromState cstate, bool is_csv) { char *copy_input_buf; int input_buf_ptr; @@ -1178,7 +1232,11 @@ CopyReadLineText(CopyFromState cstate) char quotec = '\0'; char escapec = '\0'; - if (cstate->opts.csv_mode) + /* + * is_csv will be optimized away by compiler, as argument is constant at + * caller. + */ + if (is_csv) { quotec = cstate->opts.quote[0]; escapec = cstate->opts.escape[0]; @@ -1255,7 +1313,11 @@ CopyReadLineText(CopyFromState cstate) prev_raw_ptr = input_buf_ptr; c = copy_input_buf[input_buf_ptr++]; - if (cstate->opts.csv_mode) + /* + * is_csv will be optimized away by compiler, as argument is constant + * at caller. + */ + if (is_csv) { /* * If character is '\r', we may need to look ahead below. Force @@ -1294,7 +1356,7 @@ CopyReadLineText(CopyFromState cstate) } /* Process \r */ - if (c == '\r' && (!cstate->opts.csv_mode || !in_quote)) + if (c == '\r' && (!is_csv || !in_quote)) { /* Check for \r\n on first line, _and_ handle \r\n. */ if (cstate->eol_type == EOL_UNKNOWN || @@ -1322,10 +1384,10 @@ CopyReadLineText(CopyFromState cstate) if (cstate->eol_type == EOL_CRNL) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - !cstate->opts.csv_mode ? + !is_csv ? errmsg("literal carriage return found in data") : errmsg("unquoted carriage return found in data"), - !cstate->opts.csv_mode ? + !is_csv ? 
errhint("Use \"\\r\" to represent carriage return.") : errhint("Use quoted CSV field to represent carriage return."))); @@ -1339,10 +1401,10 @@ CopyReadLineText(CopyFromState cstate) else if (cstate->eol_type == EOL_NL) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - !cstate->opts.csv_mode ? + !is_csv ? errmsg("literal carriage return found in data") : errmsg("unquoted carriage return found in data"), - !cstate->opts.csv_mode ? + !is_csv ? errhint("Use \"\\r\" to represent carriage return.") : errhint("Use quoted CSV field to represent carriage return."))); /* If reach here, we have found the line terminator */ @@ -1350,15 +1412,15 @@ CopyReadLineText(CopyFromState cstate) } /* Process \n */ - if (c == '\n' && (!cstate->opts.csv_mode || !in_quote)) + if (c == '\n' && (!is_csv || !in_quote)) { if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL) ereport(ERROR, (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), - !cstate->opts.csv_mode ? + !is_csv ? errmsg("literal newline found in data") : errmsg("unquoted newline found in data"), - !cstate->opts.csv_mode ? + !is_csv ? errhint("Use \"\\n\" to represent newline.") : errhint("Use quoted CSV field to represent newline."))); cstate->eol_type = EOL_NL; /* in case not set yet */ @@ -1370,7 +1432,7 @@ CopyReadLineText(CopyFromState cstate) * Process backslash, except in CSV mode where backslash is a normal * character. 
*/ - if (c == '\\' && !cstate->opts.csv_mode) + if (c == '\\' && !is_csv) { char c2; diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 06dfdfef721..7bc044e2816 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where extern void EndCopyFrom(CopyFromState cstate); extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); -extern bool NextCopyFromRawFields(CopyFromState cstate, - char ***fields, int *nfields); extern void CopyFromErrorCallback(void *arg); extern char *CopyLimitPrintoutLength(const char *str); diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index be29e3fbdef..51e131e5e8a 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -1,7 +1,7 @@ /*------------------------------------------------------------------------- * * copyapi.h - * API for COPY TO handlers + * API for COPY TO/FROM handlers * * * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group @@ -54,4 +54,50 @@ typedef struct CopyToRoutine void (*CopyToEnd) (CopyToState cstate); } CopyToRoutine; +/* + * API structure for a COPY FROM format implementation. Note this must be + * allocated in a server-lifetime manner, typically as a static const struct. + */ +typedef struct CopyFromRoutine +{ + /* + * Set input function information. This callback is called once at the + * beginning of COPY FROM. + * + * 'finfo' can be optionally filled to provide the catalog information of + * the input function. + * + * 'typioparam' can be optionally filled to define the OID of the type to + * pass to the input function.'atttypid' is the OID of data type used by + * the relation's attribute. + */ + void (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam); + + /* + * Start a COPY FROM. 
This callback is called once at the beginning of + * COPY FROM. + * + * 'tupDesc' is the tuple descriptor of the relation where the data needs + * to be copied. This can be used for any initialization steps required + * by a format. + */ + void (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc); + + /* + * Read one row from the source and fill *values and *nulls. + * + * 'econtext' is used to evaluate default expression for each column that + * is either not read from the file or is using the DEFAULT option of COPY + * FROM. It is NULL if no default values are used. + * + * Returns false if there are no more tuples to read. + */ + bool (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); + + /* End a COPY FROM. This callback is called once at the end of COPY FROM */ + void (*CopyFromEnd) (CopyFromState cstate); +} CopyFromRoutine; + #endif /* COPYAPI_H */ diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index 1d8ac8f62e6..e1affe3dfa7 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -14,7 +14,7 @@ #ifndef COPYFROM_INTERNAL_H #define COPYFROM_INTERNAL_H -#include "commands/copy.h" +#include "commands/copyapi.h" #include "commands/trigger.h" #include "nodes/miscnodes.h" @@ -58,6 +58,9 @@ typedef enum CopyInsertMethod */ typedef struct CopyFromStateData { + /* format routine */ + const CopyFromRoutine *routine; + /* low-level state data */ CopySource copy_src; /* type of copy source */ FILE *copy_file; /* used if copy_src == COPY_FILE */ @@ -183,4 +186,12 @@ typedef struct CopyFromStateData extern void ReceiveCopyBegin(CopyFromState cstate); extern void ReceiveCopyBinaryHeader(CopyFromState cstate); +/* One-row callbacks for built-in formats defined in copyfromparse.c */ +extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); +extern bool 
CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); +extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls); + #endif /* COPYFROM_INTERNAL_H */ diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 1cbb3628857..afdafefeb9b 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -497,6 +497,7 @@ ConvertRowtypeExpr CookedConstraint CopyDest CopyFormatOptions +CopyFromRoutine CopyFromState CopyFromStateData CopyHeaderChoice -- 2.47.1 From 666dd4bad571674c9e8be296a911acc92e50e440 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 12:19:15 +0900 Subject: [PATCH v30 3/9] Add support for adding custom COPY TO format This uses the handler approach like tablesample. The approach creates an internal function that returns an internal struct. In this case, a COPY TO handler returns a CopyToRoutine. This also adds a test module for a custom COPY TO handler. 
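(Editorial illustration, not part of the patch.) The handler approach described in the commit message above can be sketched in plain C without any PostgreSQL headers. Every name below (MockNodeTag, MockCopyToRoutine, mock_resolve_format, the "my_format" entry) is a hypothetical stand-in invented for this sketch; the real code uses NodeTag, CopyToRoutine, LookupFuncName() and OidFunctionCall1() as shown in the diff that follows. The sketch only shows the control flow: built-in format names are resolved directly, any other name is looked up as a handler function, and the returned struct is checked by its type tag before use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for NodeTag; not the PostgreSQL definition. */
typedef enum MockNodeTag
{
	T_Mock_Invalid = 0,
	T_Mock_CopyToRoutine
} MockNodeTag;

/* Hypothetical stand-in for CopyToRoutine: a type tag plus callbacks. */
typedef struct MockCopyToRoutine
{
	MockNodeTag type;				/* lets the caller validate the struct */
	const char *(*one_row) (void);	/* stand-in for the per-row callback */
} MockCopyToRoutine;

static const char *text_one_row(void) { return "text"; }
static const char *custom_one_row(void) { return "custom"; }

/* Routines are static const, i.e. allocated for the process lifetime. */
static const MockCopyToRoutine builtin_text = {T_Mock_CopyToRoutine, text_one_row};
static const MockCopyToRoutine custom_routine = {T_Mock_CopyToRoutine, custom_one_row};

/* A handler takes is_from and returns an opaque pointer to a routine. */
typedef const void *(*MockHandler) (int is_from);

static const void *
mock_custom_handler(int is_from)
{
	(void) is_from;				/* this sketch covers COPY TO only */
	return &custom_routine;
}

/* Stand-in for the catalog lookup of a handler function by format name. */
typedef struct MockHandlerEntry
{
	const char *name;
	MockHandler handler;
} MockHandlerEntry;

static const MockHandlerEntry handlers[] = {
	{"my_format", mock_custom_handler},
};

/* Resolve a format name to a routine; NULL means "not recognized". */
static const MockCopyToRoutine *
mock_resolve_format(const char *format)
{
	size_t		i;

	/* built-in formats are resolved without calling any handler */
	if (strcmp(format, "text") == 0)
		return &builtin_text;

	/* custom formats: find the handler, call it, validate the result */
	for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++)
	{
		if (strcmp(handlers[i].name, format) == 0)
		{
			const MockCopyToRoutine *routine = handlers[i].handler(0);

			if (routine == NULL || routine->type != T_Mock_CopyToRoutine)
				return NULL;
			return routine;
		}
	}
	return NULL;
}
```

The type tag is what allows the caller to reject a function that happens to have the right name but returns something other than a routine struct; in the patch this is the IsA(routine, CopyToRoutine) check.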
--- src/backend/commands/copy.c | 97 ++++++++++++++++--- src/backend/commands/copyto.c | 20 ++-- src/backend/nodes/Makefile | 1 + src/backend/nodes/gen_node_support.pl | 2 + src/backend/utils/adt/pseudotypes.c | 1 + src/include/catalog/pg_proc.dat | 6 ++ src/include/catalog/pg_type.dat | 6 ++ src/include/commands/copy.h | 1 + src/include/commands/copyapi.h | 4 +- src/include/commands/copyto_internal.h | 21 ++++ src/include/nodes/meson.build | 1 + src/test/modules/Makefile | 1 + src/test/modules/meson.build | 1 + src/test/modules/test_copy_format/.gitignore | 4 + src/test/modules/test_copy_format/Makefile | 23 +++++ .../expected/test_copy_format.out | 17 ++++ src/test/modules/test_copy_format/meson.build | 33 +++++++ .../test_copy_format/sql/test_copy_format.sql | 5 + .../test_copy_format--1.0.sql | 8 ++ .../test_copy_format/test_copy_format.c | 63 ++++++++++++ .../test_copy_format/test_copy_format.control | 4 + 21 files changed, 295 insertions(+), 24 deletions(-) mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl create mode 100644 src/include/commands/copyto_internal.h create mode 100644 src/test/modules/test_copy_format/.gitignore create mode 100644 src/test/modules/test_copy_format/Makefile create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out create mode 100644 src/test/modules/test_copy_format/meson.build create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql create mode 100644 src/test/modules/test_copy_format/test_copy_format.c create mode 100644 src/test/modules/test_copy_format/test_copy_format.control diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index cfca9d9dc29..9500156b163 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -22,7 +22,7 @@ #include "access/table.h" #include "access/xact.h" #include "catalog/pg_authid.h" -#include "commands/copy.h" +#include 
"commands/copyapi.h" #include "commands/defrem.h" #include "executor/executor.h" #include "mb/pg_wchar.h" @@ -32,6 +32,7 @@ #include "parser/parse_coerce.h" #include "parser/parse_collate.h" #include "parser/parse_expr.h" +#include "parser/parse_func.h" #include "parser/parse_relation.h" #include "utils/acl.h" #include "utils/builtins.h" @@ -476,6 +477,79 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate) return COPY_LOG_VERBOSITY_DEFAULT; /* keep compiler quiet */ } +/* + * Process the "format" option. + * + * This function checks whether the option value is a built-in format such as + * "text" and "csv" or not. If the option value isn't a built-in format, this + * function finds a COPY format handler that returns a CopyToRoutine (for + * is_from == false). If no COPY format handler is found, this function + * reports an error. + */ +static void +ProcessCopyOptionFormat(ParseState *pstate, + CopyFormatOptions *opts_out, + bool is_from, + DefElem *defel) +{ + char *format; + bool isBuiltin; + Oid funcargtypes[1]; + Oid handlerOid = InvalidOid; + Datum datum; + Node *routine; + + format = defGetString(defel); + + isBuiltin = true; + opts_out->csv_mode = false; + opts_out->binary = false; + /* built-in formats */ + if (strcmp(format, "text") == 0) + /* "csv_mode == false && binary == false" means "text" */ ; + else if (strcmp(format, "csv") == 0) + opts_out->csv_mode = true; + else if (strcmp(format, "binary") == 0) + opts_out->binary = true; + else + isBuiltin = false; + if (isBuiltin) + { + if (!is_from) + opts_out->routine = (Node *) CopyToGetBuiltinRoutine(opts_out); + return; + } + + /* custom format */ + if (!is_from) + { + funcargtypes[0] = INTERNALOID; + handlerOid = LookupFuncName(list_make1(makeString(format)), 1, + funcargtypes, true); + } + if (!OidIsValid(handlerOid)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY format \"%s\" not recognized", format), + parser_errposition(pstate, defel->location))); + + 
datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from)); + routine = (Node *) DatumGetPointer(datum); + if (routine == NULL || !IsA(routine, CopyToRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%s(%u) did not return a " + "CopyToRoutine struct", + format, handlerOid), + parser_errposition( + pstate, defel->location))); + + opts_out->routine = routine; +} + /* * Process the statement option list for COPY. * @@ -519,22 +593,10 @@ ProcessCopyOptions(ParseState *pstate, if (strcmp(defel->defname, "format") == 0) { - char *fmt = defGetString(defel); - if (format_specified) errorConflictingDefElem(defel, pstate); format_specified = true; - if (strcmp(fmt, "text") == 0) - /* default format */ ; - else if (strcmp(fmt, "csv") == 0) - opts_out->csv_mode = true; - else if (strcmp(fmt, "binary") == 0) - opts_out->binary = true; - else - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("COPY format \"%s\" not recognized", fmt), - parser_errposition(pstate, defel->location))); + ProcessCopyOptionFormat(pstate, opts_out, is_from, defel); } else if (strcmp(defel->defname, "freeze") == 0) { @@ -685,6 +747,13 @@ ProcessCopyOptions(ParseState *pstate, parser_errposition(pstate, defel->location))); } + /* If format option isn't specified, we use a built-in routine. 
*/ + if (!format_specified) + { + if (!is_from) + opts_out->routine = (Node *) CopyToGetBuiltinRoutine(opts_out); + } + /* * Check for incompatible options (must do these three before inserting * defaults) diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 26c67ddc351..f7f44b368b7 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -150,6 +150,7 @@ static void CopySendInt16(CopyToState cstate, int16 val); /* text format */ static const CopyToRoutine CopyToRoutineText = { + .type = T_CopyToRoutine, .CopyToStart = CopyToTextLikeStart, .CopyToOutFunc = CopyToTextLikeOutFunc, .CopyToOneRow = CopyToTextOneRow, @@ -158,6 +159,7 @@ static const CopyToRoutine CopyToRoutineText = { /* CSV format */ static const CopyToRoutine CopyToRoutineCSV = { + .type = T_CopyToRoutine, .CopyToStart = CopyToTextLikeStart, .CopyToOutFunc = CopyToTextLikeOutFunc, .CopyToOneRow = CopyToCSVOneRow, @@ -166,23 +168,23 @@ static const CopyToRoutine CopyToRoutineCSV = { /* binary format */ static const CopyToRoutine CopyToRoutineBinary = { + .type = T_CopyToRoutine, .CopyToStart = CopyToBinaryStart, .CopyToOutFunc = CopyToBinaryOutFunc, .CopyToOneRow = CopyToBinaryOneRow, .CopyToEnd = CopyToBinaryEnd, }; -/* Return a COPY TO routine for the given options */ -static const CopyToRoutine * -CopyToGetRoutine(CopyFormatOptions opts) +/* Return a built-in COPY TO routine for the given options */ +const CopyToRoutine * +CopyToGetBuiltinRoutine(CopyFormatOptions *opts) { - if (opts.csv_mode) + if (opts->csv_mode) return &CopyToRoutineCSV; - else if (opts.binary) + else if (opts->binary) return &CopyToRoutineBinary; - - /* default is text */ - return &CopyToRoutineText; + else + return &CopyToRoutineText; } /* Implementation of the start callback for text and CSV formats */ @@ -703,7 +705,7 @@ BeginCopyTo(ParseState *pstate, ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options); /* Set format routine */ - cstate->routine = 
CopyToGetRoutine(cstate->opts); + cstate->routine = (const CopyToRoutine *) cstate->opts.routine; /* Process the source/target relation or query */ if (rel) diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile index 66bbad8e6e0..173ee11811c 100644 --- a/src/backend/nodes/Makefile +++ b/src/backend/nodes/Makefile @@ -49,6 +49,7 @@ node_headers = \ access/sdir.h \ access/tableam.h \ access/tsmapi.h \ + commands/copyapi.h \ commands/event_trigger.h \ commands/trigger.h \ executor/tuptable.h \ diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl old mode 100644 new mode 100755 index 7c012c27f88..5d53d32c4a7 --- a/src/backend/nodes/gen_node_support.pl +++ b/src/backend/nodes/gen_node_support.pl @@ -61,6 +61,7 @@ my @all_input_files = qw( access/sdir.h access/tableam.h access/tsmapi.h + commands/copyapi.h commands/event_trigger.h commands/trigger.h executor/tuptable.h @@ -85,6 +86,7 @@ my @nodetag_only_files = qw( access/sdir.h access/tableam.h access/tsmapi.h + commands/copyapi.h commands/event_trigger.h commands/trigger.h executor/tuptable.h diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c index 317a1f2b282..f2ebc21ca56 100644 --- a/src/backend/utils/adt/pseudotypes.c +++ b/src/backend/utils/adt/pseudotypes.c @@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler); +PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(internal); PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement); PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray); diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 5b8c2ad2a54..b231e7a041e 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -7803,6 +7803,12 @@ { oid => '3312', descr => 'I/O', proname => 'tsm_handler_out', prorettype => 'cstring', proargtypes => 'tsm_handler', 
prosrc => 'tsm_handler_out' }, +{ oid => '8753', descr => 'I/O', + proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler', + proargtypes => 'cstring', prosrc => 'copy_handler_in' }, +{ oid => '8754', descr => 'I/O', + proname => 'copy_handler_out', prorettype => 'cstring', + proargtypes => 'copy_handler', prosrc => 'copy_handler_out' }, { oid => '267', descr => 'I/O', proname => 'table_am_handler_in', proisstrict => 'f', prorettype => 'table_am_handler', proargtypes => 'cstring', diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat index 6dca77e0a22..340e0cd0a8d 100644 --- a/src/include/catalog/pg_type.dat +++ b/src/include/catalog/pg_type.dat @@ -633,6 +633,12 @@ typcategory => 'P', typinput => 'tsm_handler_in', typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-', typalign => 'i' }, +{ oid => '8752', + descr => 'pseudo-type for the result of a copy to method function', + typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p', + typcategory => 'P', typinput => 'copy_handler_in', + typoutput => 'copy_handler_out', typreceive => '-', typsend => '-', + typalign => 'i' }, { oid => '269', descr => 'pseudo-type for the result of a table AM handler function', typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p', diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 7bc044e2816..2a90b39b6f6 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -87,6 +87,7 @@ typedef struct CopyFormatOptions CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */ int64 reject_limit; /* maximum tolerable number of errors */ List *convert_select; /* list of column names (can be NIL) */ + Node *routine; /* CopyToRoutine */ } CopyFormatOptions; /* These are private in commands/copy[from|to].c */ diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 51e131e5e8a..12e4b1d47a7 100644 --- 
a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -14,7 +14,7 @@ #ifndef COPYAPI_H #define COPYAPI_H -#include "commands/copy.h" +#include "commands/copyto_internal.h" #include "executor/tuptable.h" #include "nodes/execnodes.h" @@ -24,6 +24,8 @@ */ typedef struct CopyToRoutine { + NodeTag type; + /* * Set output function information. This callback is called once at the * beginning of COPY TO. diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h new file mode 100644 index 00000000000..f95d8da8e3e --- /dev/null +++ b/src/include/commands/copyto_internal.h @@ -0,0 +1,21 @@ +/*------------------------------------------------------------------------- + * + * copyto_internal.h + * Internal definitions for COPY TO command. + * + * + * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/commands/copyto_internal.h + * + *------------------------------------------------------------------------- + */ +#ifndef COPYTO_INTERNAL_H +#define COPYTO_INTERNAL_H + +#include "commands/copy.h" + +const struct CopyToRoutine *CopyToGetBuiltinRoutine(CopyFormatOptions *opts); + +#endif /* COPYTO_INTERNAL_H */ diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build index f3dd5461fef..09f7443195f 100644 --- a/src/include/nodes/meson.build +++ b/src/include/nodes/meson.build @@ -11,6 +11,7 @@ node_support_input_i = [ 'access/sdir.h', 'access/tableam.h', 'access/tsmapi.h', + 'commands/copyapi.h', 'commands/event_trigger.h', 'commands/trigger.h', 'executor/tuptable.h', diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile index c0d3cf0e14b..33e3a49a4fb 100644 --- a/src/test/modules/Makefile +++ b/src/test/modules/Makefile @@ -15,6 +15,7 @@ SUBDIRS = \ spgist_name_ops \ test_bloomfilter \ test_copy_callbacks \ + test_copy_format \ test_custom_rmgrs \ test_ddl_deparse \ test_dsa \ diff --git 
a/src/test/modules/meson.build b/src/test/modules/meson.build index 4f544a042d4..bf25658793d 100644 --- a/src/test/modules/meson.build +++ b/src/test/modules/meson.build @@ -14,6 +14,7 @@ subdir('spgist_name_ops') subdir('ssl_passphrase_callback') subdir('test_bloomfilter') subdir('test_copy_callbacks') +subdir('test_copy_format') subdir('test_custom_rmgrs') subdir('test_ddl_deparse') subdir('test_dsa') diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore new file mode 100644 index 00000000000..5dcb3ff9723 --- /dev/null +++ b/src/test/modules/test_copy_format/.gitignore @@ -0,0 +1,4 @@ +# Generated subdirectories +/log/ +/results/ +/tmp_check/ diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile new file mode 100644 index 00000000000..8497f91624d --- /dev/null +++ b/src/test/modules/test_copy_format/Makefile @@ -0,0 +1,23 @@ +# src/test/modules/test_copy_format/Makefile + +MODULE_big = test_copy_format +OBJS = \ + $(WIN32RES) \ + test_copy_format.o +PGFILEDESC = "test_copy_format - test custom COPY FORMAT" + +EXTENSION = test_copy_format +DATA = test_copy_format--1.0.sql + +REGRESS = test_copy_format + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = src/test/modules/test_copy_format +top_builddir = ../../../.. 
+include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out new file mode 100644 index 00000000000..adfe7d1572a --- /dev/null +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -0,0 +1,17 @@ +CREATE EXTENSION test_copy_format; +CREATE TABLE public.test (a smallint, b integer, c bigint); +INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); +ERROR: COPY format "test_copy_format" not recognized +LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')... + ^ +COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); +NOTICE: test_copy_format: is_from=false +NOTICE: CopyToOutFunc: atttypid=21 +NOTICE: CopyToOutFunc: atttypid=23 +NOTICE: CopyToOutFunc: atttypid=20 +NOTICE: CopyToStart: natts=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToEnd diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build new file mode 100644 index 00000000000..4cefe7b709a --- /dev/null +++ b/src/test/modules/test_copy_format/meson.build @@ -0,0 +1,33 @@ +# Copyright (c) 2024, PostgreSQL Global Development Group + +test_copy_format_sources = files( + 'test_copy_format.c', +) + +if host_system == 'windows' + test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [ + '--NAME', 'test_copy_format', + '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',]) +endif + +test_copy_format = shared_module('test_copy_format', + test_copy_format_sources, + kwargs: pg_test_mod_args, +) +test_install_libs += test_copy_format + +test_install_data += files( + 'test_copy_format.control', + 'test_copy_format--1.0.sql', +) + +tests += { + 'name': 'test_copy_format', + 
'sd': meson.current_source_dir(), + 'bd': meson.current_build_dir(), + 'regress': { + 'sql': [ + 'test_copy_format', + ], + }, +} diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql new file mode 100644 index 00000000000..810b3d8cedc --- /dev/null +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -0,0 +1,5 @@ +CREATE EXTENSION test_copy_format; +CREATE TABLE public.test (a smallint, b integer, c bigint); +INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); +COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql new file mode 100644 index 00000000000..d24ea03ce99 --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql @@ -0,0 +1,8 @@ +/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */ + +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit + +CREATE FUNCTION test_copy_format(internal) + RETURNS copy_handler + AS 'MODULE_PATHNAME' LANGUAGE C; diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c new file mode 100644 index 00000000000..e064f40473b --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -0,0 +1,63 @@ +/*-------------------------------------------------------------------------- + * + * test_copy_format.c + * Code for testing custom COPY format. 
+ * + * Portions Copyright (c) 2024, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/test/modules/test_copy_format/test_copy_format.c + * + * ------------------------------------------------------------------------- + */ + +#include "postgres.h" + +#include "commands/copyapi.h" +#include "commands/defrem.h" + +PG_MODULE_MAGIC; + +static void +CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) +{ + ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%d", atttypid))); +} + +static void +CopyToStart(CopyToState cstate, TupleDesc tupDesc) +{ + ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts))); +} + +static void +CopyToOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", slot->tts_nvalid))); +} + +static void +CopyToEnd(CopyToState cstate) +{ + ereport(NOTICE, (errmsg("CopyToEnd"))); +} + +static const CopyToRoutine CopyToRoutineTestCopyFormat = { + .type = T_CopyToRoutine, + .CopyToOutFunc = CopyToOutFunc, + .CopyToStart = CopyToStart, + .CopyToOneRow = CopyToOneRow, + .CopyToEnd = CopyToEnd, +}; + +PG_FUNCTION_INFO_V1(test_copy_format); +Datum +test_copy_format(PG_FUNCTION_ARGS) +{ + bool is_from = PG_GETARG_BOOL(0); + + ereport(NOTICE, + (errmsg("test_copy_format: is_from=%s", is_from ? 
"true" : "false"))); + + PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat); +} diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control new file mode 100644 index 00000000000..f05a6362358 --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format.control @@ -0,0 +1,4 @@ +comment = 'Test code for custom COPY format' +default_version = '1.0' +module_pathname = '$libdir/test_copy_format' +relocatable = true -- 2.47.1 From 1b991a14a53e26e9a46ae4bbe054cedffb84b1b3 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 13:58:33 +0900 Subject: [PATCH v30 4/9] Export CopyToStateData as private data It's for custom COPY TO format handlers implemented as extension. This just moves codes. This doesn't change codes except CopyDest enum values. CopyDest/CopyFrom enum values such as COPY_FILE are conflicted each other. So COPY_DEST_ prefix instead of COPY_ prefix is used for CopyDest enum values. For example, COPY_FILE in CopyDest is renamed to COPY_DEST_FILE. Note that this isn't enough to implement custom COPY TO format handlers as extension. We'll do the followings in a subsequent commit: 1. Add an opaque space for custom COPY TO format handler 2. 
Export CopySendEndOfRow() to flush buffer --- src/backend/commands/copyto.c | 77 +++----------------------- src/include/commands/copy.h | 2 +- src/include/commands/copyapi.h | 2 - src/include/commands/copyto_internal.h | 64 +++++++++++++++++++++ 4 files changed, 73 insertions(+), 72 deletions(-) diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index f7f44b368b7..91fa46ddf6f 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -36,67 +36,6 @@ #include "utils/rel.h" #include "utils/snapmgr.h" -/* - * Represents the different dest cases we need to worry about at - * the bottom level - */ -typedef enum CopyDest -{ - COPY_FILE, /* to file (or a piped program) */ - COPY_FRONTEND, /* to frontend */ - COPY_CALLBACK, /* to callback function */ -} CopyDest; - -/* - * This struct contains all the state variables used throughout a COPY TO - * operation. - * - * Multi-byte encodings: all supported client-side encodings encode multi-byte - * characters by having the first byte's high bit set. Subsequent bytes of the - * character can have the high bit not set. When scanning data in such an - * encoding to look for a match to a single-byte (ie ASCII) character, we must - * use the full pg_encoding_mblen() machinery to skip over multibyte - * characters, else we might find a false match to a trailing byte. In - * supported server encodings, there is no possibility of a false match, and - * it's faster to make useless comparisons to trailing bytes than it is to - * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true - * when we have to do it the hard way. 
- */ -typedef struct CopyToStateData -{ - /* format-specific routines */ - const CopyToRoutine *routine; - - /* low-level state data */ - CopyDest copy_dest; /* type of copy source/destination */ - FILE *copy_file; /* used if copy_dest == COPY_FILE */ - StringInfo fe_msgbuf; /* used for all dests during COPY TO */ - - int file_encoding; /* file or remote side's character encoding */ - bool need_transcoding; /* file encoding diff from server? */ - bool encoding_embeds_ascii; /* ASCII can be non-first byte? */ - - /* parameters from the COPY command */ - Relation rel; /* relation to copy to */ - QueryDesc *queryDesc; /* executable query to copy from */ - List *attnumlist; /* integer list of attnums to copy */ - char *filename; /* filename, or NULL for STDOUT */ - bool is_program; /* is 'filename' a program to popen? */ - copy_data_dest_cb data_dest_cb; /* function for writing data */ - - CopyFormatOptions opts; - Node *whereClause; /* WHERE condition (or NULL) */ - - /* - * Working state - */ - MemoryContext copycontext; /* per-copy execution context */ - - FmgrInfo *out_functions; /* lookup info for output functions */ - MemoryContext rowcontext; /* per-row evaluation context */ - uint64 bytes_processed; /* number of bytes processed so far */ -} CopyToStateData; - /* DestReceiver for COPY (query) TO */ typedef struct { @@ -406,7 +345,7 @@ SendCopyBegin(CopyToState cstate) for (i = 0; i < natts; i++) pq_sendint16(&buf, format); /* per-column formats */ pq_endmessage(&buf); - cstate->copy_dest = COPY_FRONTEND; + cstate->copy_dest = COPY_DEST_FRONTEND; } static void @@ -453,7 +392,7 @@ CopySendEndOfRow(CopyToState cstate) switch (cstate->copy_dest) { - case COPY_FILE: + case COPY_DEST_FILE: if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, cstate->copy_file) != 1 || ferror(cstate->copy_file)) @@ -487,11 +426,11 @@ CopySendEndOfRow(CopyToState cstate) errmsg("could not write to COPY file: %m"))); } break; - case COPY_FRONTEND: + case COPY_DEST_FRONTEND: /* Dump the 
accumulated row as one CopyData message */ (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len); break; - case COPY_CALLBACK: + case COPY_DEST_CALLBACK: cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len); break; } @@ -512,7 +451,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate) { switch (cstate->copy_dest) { - case COPY_FILE: + case COPY_DEST_FILE: /* Default line termination depends on platform */ #ifndef WIN32 CopySendChar(cstate, '\n'); @@ -520,7 +459,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate) CopySendString(cstate, "\r\n"); #endif break; - case COPY_FRONTEND: + case COPY_DEST_FRONTEND: /* The FE/BE protocol uses \n as newline for all platforms */ CopySendChar(cstate, '\n'); break; @@ -904,12 +843,12 @@ BeginCopyTo(ParseState *pstate, /* See Multibyte encoding comment above */ cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding); - cstate->copy_dest = COPY_FILE; /* default */ + cstate->copy_dest = COPY_DEST_FILE; /* default */ if (data_dest_cb) { progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK; - cstate->copy_dest = COPY_CALLBACK; + cstate->copy_dest = COPY_DEST_CALLBACK; cstate->data_dest_cb = data_dest_cb; } else if (pipe) diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 2a90b39b6f6..ef3dc02c56a 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -90,7 +90,7 @@ typedef struct CopyFormatOptions Node *routine; /* CopyToRoutine */ } CopyFormatOptions; -/* These are private in commands/copy[from|to].c */ +/* These are private in commands/copy[from|to]_internal.h */ typedef struct CopyFromStateData *CopyFromState; typedef struct CopyToStateData *CopyToState; diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 12e4b1d47a7..5d071b378d6 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -15,8 +15,6 @@ #define COPYAPI_H #include "commands/copyto_internal.h" -#include "executor/tuptable.h" -#include 
"nodes/execnodes.h" /* * API structure for a COPY TO format implementation. Note this must be diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h index f95d8da8e3e..2df53dda8a0 100644 --- a/src/include/commands/copyto_internal.h +++ b/src/include/commands/copyto_internal.h @@ -15,6 +15,70 @@ #define COPYTO_INTERNAL_H #include "commands/copy.h" +#include "executor/execdesc.h" +#include "executor/tuptable.h" +#include "nodes/execnodes.h" + +/* + * Represents the different dest cases we need to worry about at + * the bottom level + */ +typedef enum CopyDest +{ + COPY_DEST_FILE, /* to file (or a piped program) */ + COPY_DEST_FRONTEND, /* to frontend */ + COPY_DEST_CALLBACK, /* to callback function */ +} CopyDest; + +/* + * This struct contains all the state variables used throughout a COPY TO + * operation. + * + * Multi-byte encodings: all supported client-side encodings encode multi-byte + * characters by having the first byte's high bit set. Subsequent bytes of the + * character can have the high bit not set. When scanning data in such an + * encoding to look for a match to a single-byte (ie ASCII) character, we must + * use the full pg_encoding_mblen() machinery to skip over multibyte + * characters, else we might find a false match to a trailing byte. In + * supported server encodings, there is no possibility of a false match, and + * it's faster to make useless comparisons to trailing bytes than it is to + * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true + * when we have to do it the hard way. 
+ */ +typedef struct CopyToStateData +{ + /* format-specific routines */ + const struct CopyToRoutine *routine; + + /* low-level state data */ + CopyDest copy_dest; /* type of copy source/destination */ + FILE *copy_file; /* used if copy_dest == COPY_FILE */ + StringInfo fe_msgbuf; /* used for all dests during COPY TO */ + + int file_encoding; /* file or remote side's character encoding */ + bool need_transcoding; /* file encoding diff from server? */ + bool encoding_embeds_ascii; /* ASCII can be non-first byte? */ + + /* parameters from the COPY command */ + Relation rel; /* relation to copy to */ + QueryDesc *queryDesc; /* executable query to copy from */ + List *attnumlist; /* integer list of attnums to copy */ + char *filename; /* filename, or NULL for STDOUT */ + bool is_program; /* is 'filename' a program to popen? */ + copy_data_dest_cb data_dest_cb; /* function for writing data */ + + CopyFormatOptions opts; + Node *whereClause; /* WHERE condition (or NULL) */ + + /* + * Working state + */ + MemoryContext copycontext; /* per-copy execution context */ + + FmgrInfo *out_functions; /* lookup info for output functions */ + MemoryContext rowcontext; /* per-row evaluation context */ + uint64 bytes_processed; /* number of bytes processed so far */ +} CopyToStateData; const struct CopyToRoutine *CopyToGetBuiltinRoutine(CopyFormatOptions *opts); -- 2.47.1 From 597ce8280f57884978ee427486509be2660affbc Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 14:01:18 +0900 Subject: [PATCH v30 5/9] Add support for implementing custom COPY TO format as extension * Add CopyToStateData::opaque that can be used to keep data for custom COPY TO format implementation * Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf as CopyToStateFlush() --- src/backend/commands/copyto.c | 12 ++++++++++++ src/include/commands/copyapi.h | 2 ++ src/include/commands/copyto_internal.h | 3 +++ 3 files changed, 17 insertions(+) diff --git 
a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 91fa46ddf6f..da281f32950 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -442,6 +442,18 @@ CopySendEndOfRow(CopyToState cstate) resetStringInfo(fe_msgbuf); } +/* + * Export CopySendEndOfRow() for extensions. We want to keep + * CopySendEndOfRow() as a static function for + * optimization. CopySendEndOfRow() calls in this file may be optimized by a + * compiler. + */ +void +CopyToStateFlush(CopyToState cstate) +{ + CopySendEndOfRow(cstate); +} + /* * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the * the line termination and do common appropriate things for the end of row. diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 5d071b378d6..f8167af4c79 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -54,6 +54,8 @@ typedef struct CopyToRoutine void (*CopyToEnd) (CopyToState cstate); } CopyToRoutine; +extern void CopyToStateFlush(CopyToState cstate); + /* * API structure for a COPY FROM format implementation. Note this must be * allocated in a server-lifetime manner, typically as a static const struct. 
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h index 2df53dda8a0..4b82372691e 100644 --- a/src/include/commands/copyto_internal.h +++ b/src/include/commands/copyto_internal.h @@ -78,6 +78,9 @@ typedef struct CopyToStateData FmgrInfo *out_functions; /* lookup info for output functions */ MemoryContext rowcontext; /* per-row evaluation context */ uint64 bytes_processed; /* number of bytes processed so far */ + + /* For custom format implementation */ + void *opaque; /* private space */ } CopyToStateData; const struct CopyToRoutine *CopyToGetBuiltinRoutine(CopyFormatOptions *opts); -- 2.47.1 From 167c7173e2657d0a88237c5ce3b91739a8ce19fe Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 14:11:55 +0900 Subject: [PATCH v30 6/9] Add support for adding custom COPY FROM format This uses the same handler for COPY TO and COPY FROM but a different routine for each: CopyToRoutine for COPY TO and CopyFromRoutine for COPY FROM. PostgreSQL calls a COPY TO/FROM handler with an "is_from" argument. It's true for COPY FROM and false for COPY TO: copy_handler(true) returns CopyFromRoutine copy_handler(false) returns CopyToRoutine This also adds a test module for a custom COPY FROM handler.
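A minimal, self-contained sketch of the dispatch this commit message describes, not the patch's own code: one handler receives is_from and returns a tagged routine, and the caller then checks that the tag matches the direction, much like the IsA() checks in ProcessCopyOptionFormat(). MockTag, MockRoutine, mock_copy_handler() and mock_routine_matches() are hypothetical stand-ins so that the example compiles without PostgreSQL headers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for NodeTag; not the PostgreSQL definition. */
typedef enum MockTag
{
    MOCK_T_CopyToRoutine,
    MOCK_T_CopyFromRoutine
} MockTag;

/* Hypothetical stand-in for a routine struct that starts with a tag field. */
typedef struct MockRoutine
{
    MockTag     type;
} MockRoutine;

/* Static const instances, as the API comments ask for server-lifetime data. */
static const MockRoutine mock_to_routine = {.type = MOCK_T_CopyToRoutine};
static const MockRoutine mock_from_routine = {.type = MOCK_T_CopyFromRoutine};

/* One handler serves both directions: is_from selects the routine. */
static const MockRoutine *
mock_copy_handler(bool is_from)
{
    return is_from ? &mock_from_routine : &mock_to_routine;
}

/* Caller-side check: reject NULL or a routine tagged for the other direction. */
static bool
mock_routine_matches(const MockRoutine *routine, bool is_from)
{
    if (routine == NULL)
        return false;
    return routine->type ==
        (is_from ? MOCK_T_CopyFromRoutine : MOCK_T_CopyToRoutine);
}
```

Tagging the struct is what lets the caller detect a handler that returns the wrong kind of routine before any callback is invoked.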
--- src/backend/commands/copy.c | 60 ++++++++++++------- src/backend/commands/copyfrom.c | 23 +++---- src/backend/commands/copyfromparse.c | 2 +- src/include/catalog/pg_type.dat | 2 +- src/include/commands/copy.h | 2 +- src/include/commands/copyapi.h | 3 + src/include/commands/copyfrom_internal.h | 6 +- .../expected/test_copy_format.out | 10 +++- .../test_copy_format/sql/test_copy_format.sql | 1 + .../test_copy_format/test_copy_format.c | 39 +++++++++++- 10 files changed, 107 insertions(+), 41 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 9500156b163..10f80ef3654 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -483,8 +483,8 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate) * This function checks whether the option value is a built-in format such as * "text" and "csv" or not. If the option value isn't a built-in format, this * function finds a COPY format handler that returns a CopyToRoutine (for - * is_from == false). If no COPY format handler is found, this function - * reports an error. + * is_from == false) or CopyFromRountine (for is_from == true). If no COPY + * format handler is found, this function reports an error. 
*/ static void ProcessCopyOptionFormat(ParseState *pstate, @@ -515,18 +515,17 @@ ProcessCopyOptionFormat(ParseState *pstate, isBuiltin = false; if (isBuiltin) { - if (!is_from) + if (is_from) + opts_out->routine = (Node *) CopyFromGetBuiltinRoutine(opts_out); + else opts_out->routine = (Node *) CopyToGetBuiltinRoutine(opts_out); return; } /* custom format */ - if (!is_from) - { - funcargtypes[0] = INTERNALOID; - handlerOid = LookupFuncName(list_make1(makeString(format)), 1, - funcargtypes, true); - } + funcargtypes[0] = INTERNALOID; + handlerOid = LookupFuncName(list_make1(makeString(format)), 1, + funcargtypes, true); if (!OidIsValid(handlerOid)) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), @@ -535,17 +534,34 @@ ProcessCopyOptionFormat(ParseState *pstate, datum = OidFunctionCall1(handlerOid, BoolGetDatum(is_from)); routine = (Node *) DatumGetPointer(datum); - if (routine == NULL || !IsA(routine, CopyToRoutine)) - ereport( - ERROR, - (errcode( - ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("COPY handler function " - "%s(%u) did not return a " - "CopyToRoutine struct", - format, handlerOid), - parser_errposition( - pstate, defel->location))); + if (is_from) + { + if (routine == NULL || !IsA(routine, CopyFromRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%s(%u) did not return a " + "CopyFromRoutine struct", + format, handlerOid), + parser_errposition( + pstate, defel->location))); + } + else + { + if (routine == NULL || !IsA(routine, CopyToRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%s(%u) did not return a " + "CopyToRoutine struct", + format, handlerOid), + parser_errposition( + pstate, defel->location))); + } opts_out->routine = routine; } @@ -750,7 +766,9 @@ ProcessCopyOptions(ParseState *pstate, /* If format option isn't specified, we use a built-in routine. 
*/ if (!format_specified) { - if (!is_from) + if (is_from) + opts_out->routine = (Node *) CopyFromGetBuiltinRoutine(opts_out); + else opts_out->routine = (Node *) CopyToGetBuiltinRoutine(opts_out); } diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 917fa6605ef..23027a664ec 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -28,8 +28,7 @@ #include "access/tableam.h" #include "access/xact.h" #include "catalog/namespace.h" -#include "commands/copy.h" -#include "commands/copyfrom_internal.h" +#include "commands/copyapi.h" #include "commands/progress.h" #include "commands/trigger.h" #include "executor/execPartition.h" @@ -129,6 +128,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate); /* text format */ static const CopyFromRoutine CopyFromRoutineText = { + .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromTextLikeInFunc, .CopyFromStart = CopyFromTextLikeStart, .CopyFromOneRow = CopyFromTextOneRow, @@ -137,6 +137,7 @@ static const CopyFromRoutine CopyFromRoutineText = { /* CSV format */ static const CopyFromRoutine CopyFromRoutineCSV = { + .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromTextLikeInFunc, .CopyFromStart = CopyFromTextLikeStart, .CopyFromOneRow = CopyFromCSVOneRow, @@ -145,23 +146,23 @@ static const CopyFromRoutine CopyFromRoutineCSV = { /* binary format */ static const CopyFromRoutine CopyFromRoutineBinary = { + .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromBinaryInFunc, .CopyFromStart = CopyFromBinaryStart, .CopyFromOneRow = CopyFromBinaryOneRow, .CopyFromEnd = CopyFromBinaryEnd, }; -/* Return a COPY FROM routine for the given options */ -static const CopyFromRoutine * -CopyFromGetRoutine(CopyFormatOptions opts) +/* Return a built-in COPY FROM routine for the given options */ +const CopyFromRoutine * +CopyFromGetBuiltinRoutine(CopyFormatOptions *opts) { - if (opts.csv_mode) + if (opts->csv_mode) return &CopyFromRoutineCSV; - else if (opts.binary) + else if 
(opts->binary) return &CopyFromRoutineBinary; - - /* default is text */ - return &CopyFromRoutineText; + else + return &CopyFromRoutineText; } /* Implementation of the start callback for text and CSV formats */ @@ -1567,7 +1568,7 @@ BeginCopyFrom(ParseState *pstate, ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options); /* Set the format routine */ - cstate->routine = CopyFromGetRoutine(cstate->opts); + cstate->routine = (const CopyFromRoutine *) cstate->opts.routine; /* Process the target relation */ cstate->rel = rel; diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 65f20d332ee..4e6683eb9da 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -62,7 +62,7 @@ #include <unistd.h> #include <sys/stat.h> -#include "commands/copyfrom_internal.h" +#include "commands/copyapi.h" #include "commands/progress.h" #include "executor/executor.h" #include "libpq/libpq.h" diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat index 340e0cd0a8d..63b7d65f982 100644 --- a/src/include/catalog/pg_type.dat +++ b/src/include/catalog/pg_type.dat @@ -634,7 +634,7 @@ typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-', typalign => 'i' }, { oid => '8752', - descr => 'pseudo-type for the result of a copy to method function', + descr => 'pseudo-type for the result of a copy to/from method function', typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p', typcategory => 'P', typinput => 'copy_handler_in', typoutput => 'copy_handler_out', typreceive => '-', typsend => '-', diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index ef3dc02c56a..586d6c0fe2e 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -87,7 +87,7 @@ typedef struct CopyFormatOptions CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */ int64 reject_limit; /* maximum tolerable number of errors */ List 
*convert_select; /* list of column names (can be NIL) */ - Node *routine; /* CopyToRoutine */ + Node *routine; /* CopyToRoutine or CopyFromRoutine */ } CopyFormatOptions; /* These are private in commands/copy[from|to]_internal.h */ diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index f8167af4c79..bf933069fea 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -15,6 +15,7 @@ #define COPYAPI_H #include "commands/copyto_internal.h" +#include "commands/copyfrom_internal.h" /* * API structure for a COPY TO format implementation. Note this must be @@ -62,6 +63,8 @@ extern void CopyToStateFlush(CopyToState cstate); */ typedef struct CopyFromRoutine { + NodeTag type; + /* * Set input function information. This callback is called once at the * beginning of COPY FROM. diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index e1affe3dfa7..9b3b8336b67 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -14,7 +14,7 @@ #ifndef COPYFROM_INTERNAL_H #define COPYFROM_INTERNAL_H -#include "commands/copyapi.h" +#include "commands/copy.h" #include "commands/trigger.h" #include "nodes/miscnodes.h" @@ -59,7 +59,7 @@ typedef enum CopyInsertMethod typedef struct CopyFromStateData { /* format routine */ - const CopyFromRoutine *routine; + const struct CopyFromRoutine *routine; /* low-level state data */ CopySource copy_src; /* type of copy source */ @@ -194,4 +194,6 @@ extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls); +const struct CopyFromRoutine *CopyFromGetBuiltinRoutine(CopyFormatOptions *opts); + #endif /* COPYFROM_INTERNAL_H */ diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out index adfe7d1572a..016893e7026 
100644 --- a/src/test/modules/test_copy_format/expected/test_copy_format.out +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -2,9 +2,13 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); -ERROR: COPY format "test_copy_format" not recognized -LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')... - ^ +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromEnd COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); NOTICE: test_copy_format: is_from=false NOTICE: CopyToOutFunc: atttypid=21 diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql index 810b3d8cedc..0dfdfa00080 100644 --- a/src/test/modules/test_copy_format/sql/test_copy_format.sql +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -2,4 +2,5 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); +\. 
COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c index e064f40473b..f6b105659ab 100644 --- a/src/test/modules/test_copy_format/test_copy_format.c +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -18,6 +18,40 @@ PG_MODULE_MAGIC; +static void +CopyFromInFunc(CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam) +{ + ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%d", atttypid))); +} + +static void +CopyFromStart(CopyFromState cstate, TupleDesc tupDesc) +{ + ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts))); +} + +static bool +CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +{ + ereport(NOTICE, (errmsg("CopyFromOneRow"))); + return false; +} + +static void +CopyFromEnd(CopyFromState cstate) +{ + ereport(NOTICE, (errmsg("CopyFromEnd"))); +} + +static const CopyFromRoutine CopyFromRoutineTestCopyFormat = { + .type = T_CopyFromRoutine, + .CopyFromInFunc = CopyFromInFunc, + .CopyFromStart = CopyFromStart, + .CopyFromOneRow = CopyFromOneRow, + .CopyFromEnd = CopyFromEnd, +}; + static void CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) { @@ -59,5 +93,8 @@ test_copy_format(PG_FUNCTION_ARGS) ereport(NOTICE, (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false"))); - PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat); + if (is_from) + PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat); + else + PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat); } -- 2.47.1 From 4668d3707000b833cb4ef9800f3fc7e6a69c8c45 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 14:19:34 +0900 Subject: [PATCH v30 7/9] Use COPY_SOURCE_ prefix for CopySource enum values This is for consistency with CopyDest. 
--- src/backend/commands/copyfrom.c | 4 ++-- src/backend/commands/copyfromparse.c | 10 +++++----- src/include/commands/copyfrom_internal.h | 6 +++--- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 23027a664ec..3f6b0031d94 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1703,7 +1703,7 @@ BeginCopyFrom(ParseState *pstate, pg_encoding_to_char(GetDatabaseEncoding())))); } - cstate->copy_src = COPY_FILE; /* default */ + cstate->copy_src = COPY_SOURCE_FILE; /* default */ cstate->whereClause = whereClause; @@ -1831,7 +1831,7 @@ BeginCopyFrom(ParseState *pstate, if (data_source_cb) { progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK; - cstate->copy_src = COPY_CALLBACK; + cstate->copy_src = COPY_SOURCE_CALLBACK; cstate->data_source_cb = data_source_cb; } else if (pipe) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 4e6683eb9da..f7982bf692f 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -170,7 +170,7 @@ ReceiveCopyBegin(CopyFromState cstate) for (i = 0; i < natts; i++) pq_sendint16(&buf, format); /* per-column formats */ pq_endmessage(&buf); - cstate->copy_src = COPY_FRONTEND; + cstate->copy_src = COPY_SOURCE_FRONTEND; cstate->fe_msgbuf = makeStringInfo(); /* We *must* flush here to ensure FE knows it can send. 
*/ pq_flush(); @@ -238,7 +238,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) switch (cstate->copy_src) { - case COPY_FILE: + case COPY_SOURCE_FILE: bytesread = fread(databuf, 1, maxread, cstate->copy_file); if (ferror(cstate->copy_file)) ereport(ERROR, @@ -247,7 +247,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) if (bytesread == 0) cstate->raw_reached_eof = true; break; - case COPY_FRONTEND: + case COPY_SOURCE_FRONTEND: while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof) { int avail; @@ -330,7 +330,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) bytesread += avail; } break; - case COPY_CALLBACK: + case COPY_SOURCE_CALLBACK: bytesread = cstate->data_source_cb(databuf, minread, maxread); break; } @@ -1158,7 +1158,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv) * after \. up to the protocol end of copy data. (XXX maybe better * not to treat \. as special?) */ - if (cstate->copy_src == COPY_FRONTEND) + if (cstate->copy_src == COPY_SOURCE_FRONTEND) { int inbytes; diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index 9b3b8336b67..3743b11faa4 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -24,9 +24,9 @@ */ typedef enum CopySource { - COPY_FILE, /* from file (or a piped program) */ - COPY_FRONTEND, /* from frontend */ - COPY_CALLBACK, /* from callback function */ + COPY_SOURCE_FILE, /* from file (or a piped program) */ + COPY_SOURCE_FRONTEND, /* from frontend */ + COPY_SOURCE_CALLBACK, /* from callback function */ } CopySource; /* -- 2.47.1 From d1eddd43103df14867e45e36d4d16aa91b244a4e Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 14:21:39 +0900 Subject: [PATCH v30 8/9] Add support for implementing custom COPY FROM format as extension * Add CopyFromStateData::opaque that can be used to keep data for custom 
COPY From format implementation * Export CopyGetData() to get the next data as CopyFromStateGetData() --- src/backend/commands/copyfromparse.c | 11 +++++++++++ src/include/commands/copyapi.h | 2 ++ src/include/commands/copyfrom_internal.h | 3 +++ 3 files changed, 16 insertions(+) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index f7982bf692f..650b6b2382b 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -729,6 +729,17 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes) return copied_bytes; } +/* + * Export CopyGetData() for extensions. We want to keep CopyGetData() as a + * static function for optimization. CopyGetData() calls in this file may be + * optimized by a compiler. + */ +int +CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread) +{ + return CopyGetData(cstate, dest, minread, maxread); +} + /* * Read raw fields in the next line for COPY FROM in text or csv mode. * Return false if no more lines. 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index bf933069fea..d1a1dbeb178 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -105,4 +105,6 @@ typedef struct CopyFromRoutine void (*CopyFromEnd) (CopyFromState cstate); } CopyFromRoutine; +extern int CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread); + #endif /* COPYAPI_H */ diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index 3743b11faa4..a65bbbc962e 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -181,6 +181,9 @@ typedef struct CopyFromStateData #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) uint64 bytes_processed; /* number of bytes processed so far */ + + /* For custom format implementation */ + void *opaque; /* private space */ } CopyFromStateData; extern void ReceiveCopyBegin(CopyFromState cstate); -- 2.47.1 From efd8de92afa74f0712f2fb540ffe6550e3d815e9 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Wed, 27 Nov 2024 16:23:55 +0900 Subject: [PATCH v30 9/9] Add CopyFromSkipErrorRow() for custom COPY format extension Extensions must call CopyFromSkipErrorRow() when CopyFromOneRow callback reports an error by errsave(). CopyFromSkipErrorRow() handles "ON_ERROR stop" and "LOG_VERBOSITY verbose" cases. 
--- src/backend/commands/copyfromparse.c | 82 +++++++++++-------- src/include/commands/copyapi.h | 2 + .../expected/test_copy_format.out | 47 +++++++++++ .../test_copy_format/sql/test_copy_format.sql | 24 ++++++ .../test_copy_format/test_copy_format.c | 82 ++++++++++++++++++- 5 files changed, 199 insertions(+), 38 deletions(-) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 650b6b2382b..b016f43a711 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -851,6 +851,51 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool i return true; } +/* + * Call this when you report an error by errsave() in your CopyFromOneRow + * callback. This handles "ON_ERROR stop" and "LOG_VERBOSITY verbose" cases + * for you. + */ +void +CopyFromSkipErrorRow(CopyFromState cstate) +{ + Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); + + cstate->num_errors++; + + if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) + { + /* + * Since we emit line number and column info in the below notice + * message, we suppress error context information other than the + * relation name. + */ + Assert(!cstate->relname_only); + cstate->relname_only = true; + + if (cstate->cur_attval) + { + char *attval; + + attval = CopyLimitPrintoutLength(cstate->cur_attval); + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"", + (unsigned long long) cstate->cur_lineno, + cstate->cur_attname, + attval)); + pfree(attval); + } + else + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input", + (unsigned long long) cstate->cur_lineno, + cstate->cur_attname)); + + /* reset relname_only */ + cstate->relname_only = false; + } +} + /* * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow(). 
* @@ -959,42 +1004,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, (Node *) cstate->escontext, &values[m])) { - Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); - - cstate->num_errors++; - - if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) - { - /* - * Since we emit line number and column info in the below - * notice message, we suppress error context information other - * than the relation name. - */ - Assert(!cstate->relname_only); - cstate->relname_only = true; - - if (cstate->cur_attval) - { - char *attval; - - attval = CopyLimitPrintoutLength(cstate->cur_attval); - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"", - (unsigned long long) cstate->cur_lineno, - cstate->cur_attname, - attval)); - pfree(attval); - } - else - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input", - (unsigned long long) cstate->cur_lineno, - cstate->cur_attname)); - - /* reset relname_only */ - cstate->relname_only = false; - } - + CopyFromSkipErrorRow(cstate); return true; } diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index d1a1dbeb178..389f887b2c1 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -107,4 +107,6 @@ typedef struct CopyFromRoutine extern int CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread); +extern void CopyFromSkipErrorRow(CopyFromState cstate); + #endif /* COPYAPI_H */ diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out index 016893e7026..b9a6baa85c0 100644 --- a/src/test/modules/test_copy_format/expected/test_copy_format.out +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -1,6 +1,8 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO 
public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +-- 987 is accepted. +-- 654 is a hard error because ON_ERROR is stop by default. COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); NOTICE: test_copy_format: is_from=true NOTICE: CopyFromInFunc: atttypid=21 @@ -8,7 +10,50 @@ NOTICE: CopyFromInFunc: atttypid=23 NOTICE: CopyFromInFunc: atttypid=20 NOTICE: CopyFromStart: natts=3 NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +ERROR: invalid value: "6" +CONTEXT: COPY test, line 2, column a: "6" +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: 1 row was skipped due to data type incompatibility NOTICE: CopyFromEnd +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: skipping row due to data type incompatibility at line 2 for column "a": "6" +NOTICE: CopyFromOneRow +NOTICE: 1 row was skipped due to data type incompatibility +NOTICE: CopyFromEnd +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +-- 321 is a hard error. 
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +ERROR: too much lines: 3 +CONTEXT: COPY test, line 3 COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); NOTICE: test_copy_format: is_from=false NOTICE: CopyToOutFunc: atttypid=21 @@ -18,4 +63,6 @@ NOTICE: CopyToStart: natts=3 NOTICE: CopyToOneRow: tts_nvalid=3 NOTICE: CopyToOneRow: tts_nvalid=3 NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 NOTICE: CopyToEnd diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql index 0dfdfa00080..86db71bce7f 100644 --- a/src/test/modules/test_copy_format/sql/test_copy_format.sql +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -1,6 +1,30 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +-- 987 is accepted. +-- 654 is a hard error because ON_ERROR is stop by default. COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); +987 +654 +\. +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +987 +654 +\. +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose); +987 +654 +\. +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +-- 321 is a hard error. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +987 +654 +321 \. 
COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c index f6b105659ab..d72d5c33c1b 100644 --- a/src/test/modules/test_copy_format/test_copy_format.c +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -32,10 +32,88 @@ CopyFromStart(CopyFromState cstate, TupleDesc tupDesc) } static bool -CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls) { + int n_attributes = list_length(cstate->attnumlist); + char *line; + int line_size = n_attributes + 1; /* +1 is for new line */ + int read_bytes; + ereport(NOTICE, (errmsg("CopyFromOneRow"))); - return false; + + cstate->cur_lineno++; + line = palloc(line_size); + read_bytes = CopyFromStateGetData(cstate, line, line_size, line_size); + if (read_bytes == 0) + return false; + if (read_bytes != line_size) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("one line must be %d bytes: %d", + line_size, read_bytes))); + + if (cstate->cur_lineno == 1) + { + /* Success */ + TupleDesc tupDesc = RelationGetDescr(cstate->rel); + ListCell *cur; + int i = 0; + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + if (att->atttypid == INT2OID) + { + values[i] = Int16GetDatum(line[i] - '0'); + } + else if (att->atttypid == INT4OID) + { + values[i] = Int32GetDatum(line[i] - '0'); + } + else if (att->atttypid == INT8OID) + { + values[i] = Int64GetDatum(line[i] - '0'); + } + nulls[i] = false; + i++; + } + } + else if (cstate->cur_lineno == 2) + { + /* Soft error */ + TupleDesc tupDesc = RelationGetDescr(cstate->rel); + int attnum = lfirst_int(list_head(cstate->attnumlist)); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + char value[2]; + + 
cstate->cur_attname = NameStr(att->attname); + value[0] = line[0]; + value[1] = '\0'; + cstate->cur_attval = value; + errsave((Node *) cstate->escontext, + ( + errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid value: \"%c\"", line[0]))); + CopyFromSkipErrorRow(cstate); + cstate->cur_attname = NULL; + cstate->cur_attval = NULL; + return true; + } + else + { + /* Hard error */ + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("too much lines: %llu", + (unsigned long long) cstate->cur_lineno))); + } + + return true; } static void -- 2.47.1
On Fri, Jan 31, 2025 at 3:10 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoBpWFU4k-_bwrTq0AkFSAdwQqhAsSW188STmu9HxLJ0nQ@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 31 Jan 2025 14:25:34 -0800,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > I think that CopyToState and CopyFromState are not APIs but the
> > execution states. I'm not against exposing CopyToState and
> > CopyFromState. What I'd like to avoid is that we end up adding
> > everything (including new fields we add in the future) related to copy
> > operation to copyapi.h, which leads to copyapi.h being included in
> > files that are not related to the custom format API. fdwapi.h and
> > tsmapi.h, as examples, have only a struct with a bunch of callbacks;
> > the execution state data such as SampleScanState is not defined there.
>
> Thanks for sharing examples. But it seems that
> fdwapi.h/tsmapi.h (ForeignScanState/SampleScanState) are not
> good examples. It seems that PostgreSQL uses
> nodes/execnodes.h for all *ScanState. It seems that the
> separation is not related to *api.h usage.
I didn't mean that these examples apply perfectly to the copyapi.h case.
Again, what I'd like to avoid is that we end up adding everything
(including new fields we add in the future) related to copy operation
to copyapi.h. For example, with v28 that moves both CopyFromState and
CopyToState to copyapi.h, file_fdw.c includes unrelated CopyToState
struct via copyfrom_internal.h -> copyapi.h. In addition to that, both
copyfrom.c and copyfrom_internal.h did the same, which made me think
copyfrom_internal.h mostly no longer plays its role. I'm very welcome
to other ideas too if they could achieve the same goal.
>
> > My understanding is that we don't strictly prohibit _internal.h from
> > being included in out of core files. For example, file_fdw.c includes
> > copyfrom_internal.h in order to access some fields of CopyFromState.
> >
> > If the name with _internal.h is the problem, we can rename them to
> > copyfrom.h and copyto.h. It makes sense to me that the code that needs
> > to access the internal of the copy execution state include _internal.h
> > header, though.
>
> Thanks for sharing the file_fdw.c example. I'm OK with
> _internal.h suffix because PostgreSQL doesn't prohibit
> _internal.h usage by extensions as you mentioned.
>
> >> > While we get the format routines for custom formats in
> >> > ProcessCopyOptionFormat(), we do that for built-in formats in
> >> > BeginCopyTo(), which seems odd to me. I think we can have
> >> > CopyToGetRoutine() responsible for getting CopyToRoutine for built-in
> >> > formats as well as custom format. The same is true for
> >> > CopyFromRoutine.
> >>
> >> I like the current design because we don't need to export
> >> CopyToGetBuiltinRoutine() (we can use static for
> >> CopyToGetBuiltinRoutine()) but I applied your
> >> suggestion. Because it's not a strong opinion.
> >
> > I meant that ProcessCopyOptionFormat() doesn't necessarily get the
> > routine. An idea is that in ProcessCopyOptionFormat() we just get the
> > OID of the handler function, and then set up the format routine in
> > BeginCopyTo(). I've attached a patch for this idea (applied on top of
> > 0009).
>
> Oh, sorry. I misunderstood your suggestion. I understand
> what you suggested by the patch. Thanks.
>
> If we use the approach, we can't show error position when a
> custom COPY format handler function returns invalid routine
> because DefElem for the "format" option isn't available in
> BeginCopyTo(). Is it acceptable? If it's acceptable, let's
> use the approach.
I think we can live with it. All errors happening while processing the
copy options don't necessarily show the error position.
> Oh, sorry. I forgot to follow function name change in
> 0009. I attach the v30 patch set that fixes it in 0009.
Thank you for updating the patch! I'll review these patches.
Regards,
-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi,
In <CAD21AoA3KMddnjxY1hxth3f4f1wo8a8i2icgK6GEKqXNR_e6jA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 31 Jan 2025 16:34:52 -0800,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Again, what I'd like to avoid is that we end up adding everything
> (including new fields we add in the future) related to copy operation
> to copyapi.h. For example, with v28 that moves both CopyFromState and
> CopyToState to copyapi.h, file_fdw.c includes unrelated CopyToState
> struct via copyfrom_internal.h -> copyapi.h. In addition to that, both
> copyfrom.c and copyfrom_internal.h did the same, which made me think
> copyfrom_internal.h mostly no longer plays its role. I'm very welcome
> to other ideas too if they could achieve the same goal.
For that purpose, copyapi.h should not include
copy{to,from}_internal.h. If it does, copyto.c ends up
pulling in CopyFromState and copyfrom*.c end up pulling in
CopyToState.
What do you think about the following change? Note that
with it, extensions must include copy{to,from}_internal.h
explicitly in addition to copyapi.h.
-----
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 10f80ef3654..a2dc2d04407 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -23,6 +23,8 @@
 #include "access/xact.h"
 #include "catalog/pg_authid.h"
 #include "commands/copyapi.h"
+#include "commands/copyto_internal.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 3f6b0031d94..7bcf1c6544b 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -29,6 +29,7 @@
 #include "access/xact.h"
 #include "catalog/namespace.h"
 #include "commands/copyapi.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/progress.h"
 #include "commands/trigger.h"
 #include "executor/execPartition.h"
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index b016f43a711..7296745d6d2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -63,6 +63,7 @@
 #include <sys/stat.h>
 
 #include "commands/copyapi.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
 #include "libpq/libpq.h"
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index da281f32950..a69771ea6da 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -20,6 +20,7 @@
 
 #include "access/tableam.h"
 #include "commands/copyapi.h"
+#include "commands/copyto_internal.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 389f887b2c1..dfab62372a7 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -14,8 +14,7 @@
 #ifndef COPYAPI_H
 #define COPYAPI_H
 
-#include "commands/copyto_internal.h"
-#include "commands/copyfrom_internal.h"
+#include "commands/copy.h"
 
 /*
  * API structure for a COPY TO format implementation. Note this must be
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index d72d5c33c1b..c05d65557a9 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -14,6 +14,8 @@
 #include "postgres.h"
 
 #include "commands/copyapi.h"
+#include "commands/copyfrom_internal.h"
+#include "commands/copyto_internal.h"
 #include "commands/defrem.h"
 
 PG_MODULE_MAGIC;
-----
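To illustrate why this split can work, here is a minimal standalone
sketch (not PostgreSQL code). It assumes the API header only needs a
pointer typedef to an incomplete struct, so the callback struct can be
declared without the full execution state, while code that touches the
fields includes the internal definition explicitly. All Demo* and
demo_* names are hypothetical stand-ins for CopyFromState,
CopyFromRoutine and CopyFromStateData; the three commented sections
model copyapi.h, copyfrom_internal.h and an extension source file,
collapsed into one file.

```c
#include <stdbool.h>
#include <stddef.h>

/* --- models copyapi.h: only incomplete (forward-declared) types --- */
typedef struct DemoFromStateData *DemoFromState;  /* stand-in for CopyFromState */
typedef struct DemoToStateData *DemoToState;      /* never defined in this file */

typedef struct DemoFromRoutine
{
    void (*start) (DemoFromState state);
    /* Returns false when there are no more rows. */
    bool (*one_row) (DemoFromState state, int *value);
} DemoFromRoutine;

/* --- models copyfrom_internal.h: the full execution state --- */
struct DemoFromStateData
{
    const DemoFromRoutine *routine;
    int   cur_lineno;
    void *opaque;   /* private space for a format implementation */
};

/* --- models an extension: sees both parts, so it can touch the fields --- */
static void
demo_start(DemoFromState state)
{
    state->cur_lineno = 0;
}

static bool
demo_one_row(DemoFromState state, int *value)
{
    if (state->cur_lineno >= 3)
        return false;           /* pretend the input has three rows */
    state->cur_lineno++;
    *value = state->cur_lineno * 10;
    return true;
}

static const DemoFromRoutine demo_routine = {
    .start = demo_start,
    .one_row = demo_one_row,
};

/* Drive the callbacks until one_row reports end of input; return the sum. */
int
demo_sum_rows(void)
{
    struct DemoFromStateData state = {.routine = &demo_routine, .opaque = NULL};
    int value;
    int sum = 0;

    state.routine->start(&state);
    while (state.routine->one_row(&state, &value))
        sum += value;
    return sum;
}
```

Note that DemoToStateData is never defined above and the file still
compiles. That is the property the change relies on: a file that only
deals with the FROM side does not need to see the TO side's state.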
>> If we use the approach, we can't show error position when a
>> custom COPY format handler function returns invalid routine
>> because DefElem for the "format" option isn't available in
>> BeginCopyTo(). Is it acceptable? If it's acceptable, let's
>> use the approach.
> 
> I think we can live with it. All errors happening while processing the
> copy options don't necessarily show the error position.
OK. I attach the v31 patch set that uses this
approach. Mainly, 0003 and 0006 were changed. The v31 patch
set also includes the above
copyapi.h/copy{to,from}_internal.h related changes.
If we had a way to get a function name from its Oid, we
could improve the error message by including the function
name (format name) when a custom format handler function
returns something other than a Copy{To,From}Routine.
Thanks,
-- 
kou
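For readers who want the idea behind 0001 without reading the whole
diff, here is a minimal standalone sketch (not the patch itself). It
assumes a simplified model: the routine is chosen once per COPY, and
the text and CSV per-row callbacks are thin wrappers around one inline
workhorse that receives is_csv as a constant, so the compiler can
specialize each wrapper. All Demo* and demo_* names are hypothetical,
plain "static inline" stands in for pg_attribute_always_inline, and
the CSV quoting is deliberately naive (it always quotes and never
escapes).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct DemoWriter
{
    char   buf[256];
    size_t len;
} DemoWriter;

/* Callback table, in the spirit of CopyToRoutine. */
typedef struct DemoToRoutine
{
    void (*one_row) (DemoWriter *w, const char *const *fields, int nfields);
} DemoToRoutine;

/* Append a string; silently drops it if it would not fit (demo only). */
static void
demo_append(DemoWriter *w, const char *s)
{
    size_t n = strlen(s);

    if (w->len + n >= sizeof(w->buf))
        return;
    memcpy(w->buf + w->len, s, n);
    w->len += n;
    w->buf[w->len] = '\0';
}

/* Shared workhorse; is_csv is a compile-time constant at each call site. */
static inline void
demo_textlike_one_row(DemoWriter *w, const char *const *fields,
                      int nfields, bool is_csv)
{
    for (int i = 0; i < nfields; i++)
    {
        if (i > 0)
            demo_append(w, is_csv ? "," : "\t");
        if (is_csv)
            demo_append(w, "\"");
        demo_append(w, fields[i]);
        if (is_csv)
            demo_append(w, "\"");
    }
    demo_append(w, "\n");
}

static void
demo_text_one_row(DemoWriter *w, const char *const *fields, int nfields)
{
    demo_textlike_one_row(w, fields, nfields, false);
}

static void
demo_csv_one_row(DemoWriter *w, const char *const *fields, int nfields)
{
    demo_textlike_one_row(w, fields, nfields, true);
}

static const DemoToRoutine demo_routine_text = {.one_row = demo_text_one_row};
static const DemoToRoutine demo_routine_csv = {.one_row = demo_csv_one_row};

/* The format decision is made here, once, not for every row. */
static const DemoToRoutine *
demo_get_routine(bool csv_mode)
{
    return csv_mode ? &demo_routine_csv : &demo_routine_text;
}

/* Render two fixed rows; the result lives in a static buffer. */
const char *
demo_render(bool csv_mode)
{
    static DemoWriter w;
    static const char *const row1[] = {"1", "a"};
    static const char *const row2[] = {"2", "b"};
    const DemoToRoutine *routine = demo_get_routine(csv_mode);

    w.len = 0;
    w.buf[0] = '\0';
    routine->one_row(&w, row1, 2);
    routine->one_row(&w, row2, 2);
    return w.buf;
}
```

The per-row loop only goes through routine->one_row, so adding another
format in this model means adding another table entry rather than
another branch in the loop.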
From 7c9a6d7be003f5a63d12e4c3c3a30231c726c794 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sat, 28 Sep 2024 23:24:49 +0900
Subject: [PATCH v31 1/9] Refactor COPY TO to use format callback functions.
This commit introduces a new CopyToRoutine struct, which is a set of
callback routines to copy tuples in a specific format. It also makes
the existing formats (text, CSV, and binary) utilize these format
callbacks.
This change is a preliminary step towards making the COPY TO command
extensible in terms of output formats.
Additionally, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.
Author: Sutou Kouhei
Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada
Reviewed-by: Junwang Zhao
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
---
 src/backend/commands/copyto.c    | 441 +++++++++++++++++++++----------
 src/include/commands/copyapi.h   |  55 ++++
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 356 insertions(+), 141 deletions(-)
 create mode 100644 src/include/commands/copyapi.h
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 99cb23cb347..26c67ddc351 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,7 +19,7 @@
 #include <sys/stat.h>
 
 #include "access/tableam.h"
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -64,6 +64,9 @@ typedef enum CopyDest
  */
 typedef struct CopyToStateData
 {
+    /* format-specific routines */
+    const CopyToRoutine *routine;
+
     /* low-level state data */
     CopyDest    copy_dest;        /* type of copy source/destination */
     FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
@@ -114,6 +117,19 @@ static void CopyAttributeOutText(CopyToState cstate, const char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
                                 bool use_quote);
 
+/* built-in format-specific routines */
+static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
+static void CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
+static void CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
+                                 bool is_csv);
+static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
+static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
+static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToBinaryEnd(CopyToState cstate);
+
 /* Low-level communications functions */
 static void SendCopyBegin(CopyToState cstate);
 static void SendCopyEnd(CopyToState cstate);
@@ -121,9 +137,254 @@ static void CopySendData(CopyToState cstate, const void *databuf, int datasize);
 static void CopySendString(CopyToState cstate, const char *str);
 static void CopySendChar(CopyToState cstate, char c);
 static void CopySendEndOfRow(CopyToState cstate);
+static void CopySendTextLikeEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
+/*
+ * COPY TO routines for built-in formats.
+ *
+ * CSV and text formats share the same TextLike routines except for the
+ * one-row callback.
+ */
+
+/* text format */
+static const CopyToRoutine CopyToRoutineText = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToTextOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+/* CSV format */
+static const CopyToRoutine CopyToRoutineCSV = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToCSVOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+/* binary format */
+static const CopyToRoutine CopyToRoutineBinary = {
+    .CopyToStart = CopyToBinaryStart,
+    .CopyToOutFunc = CopyToBinaryOutFunc,
+    .CopyToOneRow = CopyToBinaryOneRow,
+    .CopyToEnd = CopyToBinaryEnd,
+};
+
+/* Return a COPY TO routine for the given options */
+static const CopyToRoutine *
+CopyToGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyToRoutineCSV;
+    else if (opts.binary)
+        return &CopyToRoutineBinary;
+
+    /* default is text */
+    return &CopyToRoutineText;
+}
+
+/* Implementation of the start callback for text and CSV formats */
+static void
+CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    /*
+     * For non-binary copy, we need to convert null_print to file encoding,
+     * because it will be sent directly with CopySendString.
+     */
+    if (cstate->need_transcoding)
+        cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+                                                          cstate->opts.null_print_len,
+                                                          cstate->file_encoding);
+
+    /* if a header has been requested send the line */
+    if (cstate->opts.header_line)
+    {
+        ListCell   *cur;
+        bool        hdr_delim = false;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            char       *colname;
+
+            if (hdr_delim)
+                CopySendChar(cstate, cstate->opts.delim[0]);
+            hdr_delim = true;
+
+            colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, colname, false);
+            else
+                CopyAttributeOutText(cstate, colname);
+        }
+
+        CopySendTextLikeEndOfRow(cstate);
+    }
+}
+
+/*
+ * Implementation of the outfunc callback for text and CSV formats. Assign
+ * the output function data to the given *finfo.
+ */
+static void
+CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the per-row callback for text format */
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, false);
+}
+
+/* Implementation of the per-row callback for CSV format */
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, true);
+}
+
+/*
+ * Workhorse for CopyToTextOneRow() and CopyToCSVOneRow().
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
+ */
+static pg_attribute_always_inline void
+CopyToTextLikeOneRow(CopyToState cstate,
+                     TupleTableSlot *slot,
+                     bool is_csv)
+{
+    bool        need_delim = false;
+    FmgrInfo   *out_functions = cstate->out_functions;
+
+    foreach_int(attnum, cstate->attnumlist)
+    {
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (need_delim)
+            CopySendChar(cstate, cstate->opts.delim[0]);
+        need_delim = true;
+
+        if (isnull)
+        {
+            CopySendString(cstate, cstate->opts.null_print_client);
+        }
+        else
+        {
+            char       *string;
+
+            string = OutputFunctionCall(&out_functions[attnum - 1],
+                                        value);
+
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
+                CopyAttributeOutCSV(cstate, string,
+                                    cstate->opts.force_quote_flags[attnum - 1]);
+            else
+                CopyAttributeOutText(cstate, string);
+        }
+    }
+
+    CopySendTextLikeEndOfRow(cstate);
+}
+
+/* Implementation of the end callback for text and CSV formats */
+static void
+CopyToTextLikeEnd(CopyToState cstate)
+{
+    /* Nothing to do here */
+}
+
+/*
+ * Implementation of the start callback for binary format. Send a header
+ * for a binary copy.
+ */
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    int32        tmp;
+
+    /* Signature */
+    CopySendData(cstate, BinarySignature, 11);
+    /* Flags field */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+    /* No header extension */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+}
+
+/*
+ * Implementation of the outfunc callback for binary format. Assign
+ * the binary output function to the given *finfo.
+ */
+static void
+CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeBinaryOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the per-row callback for binary format */
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    FmgrInfo   *out_functions = cstate->out_functions;
+
+    /* Binary per-tuple header */
+    CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+    foreach_int(attnum, cstate->attnumlist)
+    {
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (isnull)
+        {
+            CopySendInt32(cstate, -1);
+        }
+        else
+        {
+            bytea       *outputbytes;
+
+            outputbytes = SendFunctionCall(&out_functions[attnum - 1],
+                                           value);
+            CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+            CopySendData(cstate, VARDATA(outputbytes),
+                         VARSIZE(outputbytes) - VARHDRSZ);
+        }
+    }
+
+    CopySendEndOfRow(cstate);
+}
+
+/* Implementation of the end callback for binary format */
+static void
+CopyToBinaryEnd(CopyToState cstate)
+{
+    /* Generate trailer for a binary copy */
+    CopySendInt16(cstate, -1);
+    /* Need to flush out the trailer */
+    CopySendEndOfRow(cstate);
+}
 
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
@@ -191,16 +452,6 @@ CopySendEndOfRow(CopyToState cstate)
     switch (cstate->copy_dest)
     {
         case COPY_FILE:
-            if (!cstate->opts.binary)
-            {
-                /* Default line termination depends on platform */
-#ifndef WIN32
-                CopySendChar(cstate, '\n');
-#else
-                CopySendString(cstate, "\r\n");
-#endif
-            }
-
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -235,10 +486,6 @@ CopySendEndOfRow(CopyToState cstate)
             }
             break;
         case COPY_FRONTEND:
-            /* The FE/BE protocol uses \n as newline for all platforms */
-            if (!cstate->opts.binary)
-                CopySendChar(cstate, '\n');
-
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
@@ -254,6 +501,35 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * line termination and then does the common end-of-row processing.
+ */
+static inline void
+CopySendTextLikeEndOfRow(CopyToState cstate)
+{
+    switch (cstate->copy_dest)
+    {
+        case COPY_FILE:
+            /* Default line termination depends on platform */
+#ifndef WIN32
+            CopySendChar(cstate, '\n');
+#else
+            CopySendString(cstate, "\r\n");
+#endif
+            break;
+        case COPY_FRONTEND:
+            /* The FE/BE protocol uses \n as newline for all platforms */
+            CopySendChar(cstate, '\n');
+            break;
+        default:
+            break;
+    }
+
+    /* Now take the actions related to the end of a row */
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * These functions do apply some data conversion
  */
@@ -426,6 +702,9 @@ BeginCopyTo(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
 
+    /* Set format routine */
+    cstate->routine = CopyToGetRoutine(cstate->opts);
+
     /* Process the source/target relation or query */
     if (rel)
     {
@@ -771,19 +1050,10 @@ DoCopyTo(CopyToState cstate)
     foreach(cur, cstate->attnumlist)
     {
         int            attnum = lfirst_int(cur);
-        Oid            out_func_oid;
-        bool        isvarlena;
         Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-        if (cstate->opts.binary)
-            getTypeBinaryOutputInfo(attr->atttypid,
-                                    &out_func_oid,
-                                    &isvarlena);
-        else
-            getTypeOutputInfo(attr->atttypid,
-                              &out_func_oid,
-                              &isvarlena);
-        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+        cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
+                                       &cstate->out_functions[attnum - 1]);
     }
 
     /*
@@ -796,56 +1066,7 @@ DoCopyTo(CopyToState cstate)
                                                "COPY TO",
                                                ALLOCSET_DEFAULT_SIZES);
 
-    if (cstate->opts.binary)
-    {
-        /* Generate header for a binary copy */
-        int32        tmp;
-
-        /* Signature */
-        CopySendData(cstate, BinarySignature, 11);
-        /* Flags field */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-        /* No header extension */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-    }
-    else
-    {
-        /*
-         * For non-binary copy, we need to convert null_print to file
-         * encoding, because it will be sent directly with CopySendString.
-         */
-        if (cstate->need_transcoding)
-            cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
-                                                              cstate->opts.null_print_len,
-                                                              cstate->file_encoding);
-
-        /* if a header has been requested send the line */
-        if (cstate->opts.header_line)
-        {
-            bool        hdr_delim = false;
-
-            foreach(cur, cstate->attnumlist)
-            {
-                int            attnum = lfirst_int(cur);
-                char       *colname;
-
-                if (hdr_delim)
-                    CopySendChar(cstate, cstate->opts.delim[0]);
-                hdr_delim = true;
-
-                colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, colname, false);
-                else
-                    CopyAttributeOutText(cstate, colname);
-            }
-
-            CopySendEndOfRow(cstate);
-        }
-    }
+    cstate->routine->CopyToStart(cstate, tupDesc);
 
     if (cstate->rel)
     {
@@ -884,13 +1105,7 @@ DoCopyTo(CopyToState cstate)
         processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
     }
 
-    if (cstate->opts.binary)
-    {
-        /* Generate trailer for a binary copy */
-        CopySendInt16(cstate, -1);
-        /* Need to flush out the trailer */
-        CopySendEndOfRow(cstate);
-    }
+    cstate->routine->CopyToEnd(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -903,74 +1118,18 @@ DoCopyTo(CopyToState cstate)
 /*
  * Emit one row during DoCopyTo().
  */
-static void
+static inline void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-    FmgrInfo   *out_functions = cstate->out_functions;
     MemoryContext oldcontext;
 
     MemoryContextReset(cstate->rowcontext);
     oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-    if (cstate->opts.binary)
-    {
-        /* Binary per-tuple header */
-        CopySendInt16(cstate, list_length(cstate->attnumlist));
-    }
-
     /* Make sure the tuple is fully deconstructed */
     slot_getallattrs(slot);
 
-    if (!cstate->opts.binary)
-    {
-        bool        need_delim = false;
-
-        foreach_int(attnum, cstate->attnumlist)
-        {
-            Datum        value = slot->tts_values[attnum - 1];
-            bool        isnull = slot->tts_isnull[attnum - 1];
-            char       *string;
-
-            if (need_delim)
-                CopySendChar(cstate, cstate->opts.delim[0]);
-            need_delim = true;
-
-            if (isnull)
-                CopySendString(cstate, cstate->opts.null_print_client);
-            else
-            {
-                string = OutputFunctionCall(&out_functions[attnum - 1],
-                                            value);
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, string,
-                                        cstate->opts.force_quote_flags[attnum - 1]);
-                else
-                    CopyAttributeOutText(cstate, string);
-            }
-        }
-    }
-    else
-    {
-        foreach_int(attnum, cstate->attnumlist)
-        {
-            Datum        value = slot->tts_values[attnum - 1];
-            bool        isnull = slot->tts_isnull[attnum - 1];
-            bytea       *outputbytes;
-
-            if (isnull)
-                CopySendInt32(cstate, -1);
-            else
-            {
-                outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-                                               value);
-                CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-                CopySendData(cstate, VARDATA(outputbytes),
-                             VARSIZE(outputbytes) - VARHDRSZ);
-            }
-        }
-    }
-
-    CopySendEndOfRow(cstate);
+    cstate->routine->CopyToOneRow(cstate, slot);
 
     MemoryContextSwitchTo(oldcontext);
 }
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 00000000000..de5dae9cc38
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,55 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *      API for COPY TO handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "commands/copy.h"
+
+/*
+ * API structure for a COPY TO format implementation. Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyToRoutine
+{
+    /*
+     * Set output function information. This callback is called once at the
+     * beginning of COPY TO.
+     *
+     * 'finfo' can be optionally filled to provide the catalog information of
+     * the output function.
+     *
+     * 'atttypid' is the OID of data type used by the relation's attribute.
+     */
+    void        (*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
+                                  FmgrInfo *finfo);
+
+    /*
+     * Start a COPY TO. This callback is called once at the beginning of COPY
+     * TO.
+     *
+     * 'tupDesc' is the tuple descriptor of the relation from where the data
+     * is read.
+     */
+    void        (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
+
+    /*
+     * Write one row stored in 'slot' to the destination.
+     */
+    void        (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
+
+    /* End a COPY TO. This callback is called once at the end of COPY TO */
+    void        (*CopyToEnd) (CopyToState cstate);
+} CopyToRoutine;
+
+#endif                            /* COPYAPI_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a2644a2e653..1cbb3628857 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -508,6 +508,7 @@ CopyMultiInsertInfo
 CopyOnErrorChoice
 CopySource
 CopyStmt
+CopyToRoutine
 CopyToState
 CopyToStateData
 Cost
-- 
2.47.1
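As an aside for readers of the thread: the dispatch pattern that the patch above introduces can be tried outside PostgreSQL. The following is a minimal stand-alone sketch that uses nothing from the PostgreSQL tree. DemoState, DemoRoutine and every demo_* function are hypothetical stand-ins (not CopyToState, CopyToRoutine or the CopySend*() helpers), and unlike the real CopyAttributeOutCSV() the CSV branch here always quotes and never escapes. It only shows the shape: the routine is selected once, and text and CSV share one inlined workhorse whose is_csv argument is a constant at each call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CopyToState: just an output buffer. */
typedef struct DemoState
{
    char        buf[256];
    size_t      len;
} DemoState;

/* Hypothetical, reduced analogue of CopyToRoutine: one callback per phase. */
typedef struct DemoRoutine
{
    void        (*start) (DemoState *st);
    void        (*one_row) (DemoState *st, const char *const *fields, int nfields);
    void        (*end) (DemoState *st);
} DemoRoutine;

/* Append a NUL-terminated string to the buffer (stand-in for CopySendString). */
static void
demo_send(DemoState *st, const char *s)
{
    size_t      n = strlen(s);

    assert(st->len + n < sizeof(st->buf));
    memcpy(st->buf + st->len, s, n + 1);    /* copies the trailing NUL too */
    st->len += n;
}

/*
 * Shared workhorse. is_csv is a constant at both call sites, so once this is
 * inlined the compiler can drop the branch. Plain "inline" is only a hint;
 * the patch uses pg_attribute_always_inline to force it.
 */
static inline void
demo_textlike_one_row(DemoState *st, const char *const *fields, int nfields,
                      bool is_csv)
{
    for (int i = 0; i < nfields; i++)
    {
        if (i > 0)
            demo_send(st, is_csv ? "," : "\t");

        if (is_csv)
        {
            /* Simplification: always quote, never escape. */
            demo_send(st, "\"");
            demo_send(st, fields[i]);
            demo_send(st, "\"");
        }
        else
            demo_send(st, fields[i]);
    }
    demo_send(st, "\n");
}

static void
demo_text_one_row(DemoState *st, const char *const *fields, int nfields)
{
    demo_textlike_one_row(st, fields, nfields, false);
}

static void
demo_csv_one_row(DemoState *st, const char *const *fields, int nfields)
{
    demo_textlike_one_row(st, fields, nfields, true);
}

/* Start/end callbacks with nothing to do in this sketch. */
static void
demo_noop(DemoState *st)
{
    (void) st;
}

static const DemoRoutine demo_routine_text = {
    .start = demo_noop,
    .one_row = demo_text_one_row,
    .end = demo_noop,
};

static const DemoRoutine demo_routine_csv = {
    .start = demo_noop,
    .one_row = demo_csv_one_row,
    .end = demo_noop,
};

/* Choose the routine once, before any row is processed. */
static const DemoRoutine *
demo_get_routine(bool csv_mode)
{
    return csv_mode ? &demo_routine_csv : &demo_routine_text;
}

/* Driver: after the lookup there is no per-row format check left. */
static void
demo_copy_to(DemoState *st, bool csv_mode, const char *const *fields, int nfields)
{
    const DemoRoutine *routine = demo_get_routine(csv_mode);

    st->len = 0;
    st->buf[0] = '\0';
    routine->start(st);
    routine->one_row(st, fields, nfields);
    routine->end(st);
}
```

Calling demo_copy_to() with csv_mode = false on the fields "1" and "abc" leaves the two values separated by a tab and followed by a newline in st.buf. With csv_mode = true, the same row comes out as "1","abc" followed by a newline.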
From 89f7a7b007ab5958dce18987ad397f6b62f5aed1 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 18 Nov 2024 16:32:43 -0800
Subject: [PATCH v31 2/9] Refactor COPY FROM to use format callback functions.
This commit introduces a new CopyFromRoutine struct, which is a set of
callback routines to read tuples in a specific format. It also makes
COPY FROM with the existing formats (text, CSV, and binary) utilize
these format callbacks.
This change is a preliminary step towards making the COPY FROM command
extensible in terms of input formats.
Similar to XXXX, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when parsing field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.
Author: Sutou Kouhei
Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada
Reviewed-by: Junwang Zhao
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
---
 contrib/file_fdw/file_fdw.c              |   1 -
 src/backend/commands/copyfrom.c          | 192 +++++++--
 src/backend/commands/copyfromparse.c     | 505 +++++++++++++----------
 src/include/commands/copy.h              |   2 -
 src/include/commands/copyapi.h           |  48 ++-
 src/include/commands/copyfrom_internal.h |  11 +
 src/tools/pgindent/typedefs.list         |   1 +
 7 files changed, 493 insertions(+), 267 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 678e754b2b9..323c43dca4a 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -21,7 +21,6 @@
 #include "access/table.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_foreign_table.h"
-#include "commands/copy.h"
 #include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 0cbd05f5602..8b09df0581f 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -28,7 +28,7 @@
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "catalog/namespace.h"
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/copyfrom_internal.h"
 #include "commands/progress.h"
 #include "commands/trigger.h"
@@ -106,6 +106,145 @@ typedef struct CopyMultiInsertInfo
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
 
+/*
+ * Built-in format-specific routines. One-row callbacks are defined in
+ * copyfromparse.c
+ */
+static void CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo,
+                                   Oid *typioparam);
+static void CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc);
+static void CopyFromTextLikeEnd(CopyFromState cstate);
+static void CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+                                 FmgrInfo *finfo, Oid *typioparam);
+static void CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc);
+static void CopyFromBinaryEnd(CopyFromState cstate);
+
+
+/*
+ * COPY FROM routines for built-in formats.
+ *
+ * CSV and text formats share the same TextLike routines except for the
+ * one-row callback.
+ */
+
+/* text format */
+static const CopyFromRoutine CopyFromRoutineText = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromTextOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+/* CSV format */
+static const CopyFromRoutine CopyFromRoutineCSV = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromCSVOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+/* binary format */
+static const CopyFromRoutine CopyFromRoutineBinary = {
+    .CopyFromInFunc = CopyFromBinaryInFunc,
+    .CopyFromStart = CopyFromBinaryStart,
+    .CopyFromOneRow = CopyFromBinaryOneRow,
+    .CopyFromEnd = CopyFromBinaryEnd,
+};
+
+/* Return a COPY FROM routine for the given options */
+static const CopyFromRoutine *
+CopyFromGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyFromRoutineCSV;
+    else if (opts.binary)
+        return &CopyFromRoutineBinary;
+
+    /* default is text */
+    return &CopyFromRoutineText;
+}
+
+/* Implementation of the start callback for text and CSV formats */
+static void
+CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    AttrNumber    attr_count;
+
+    /*
+     * If encoding conversion is needed, we need another buffer to hold the
+     * converted input data.  Otherwise, we can just point input_buf to the
+     * same buffer as raw_buf.
+     */
+    if (cstate->need_transcoding)
+    {
+        cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
+        cstate->input_buf_index = cstate->input_buf_len = 0;
+    }
+    else
+        cstate->input_buf = cstate->raw_buf;
+    cstate->input_reached_eof = false;
+
+    initStringInfo(&cstate->line_buf);
+
+    /*
+     * Create workspace for CopyReadAttributes results; used by CSV and text
+     * format.
+     */
+    attr_count = list_length(cstate->attnumlist);
+    cstate->max_fields = attr_count;
+    cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+}
+
+/*
+ * Implementation of the infunc callback for text and CSV formats. Assign
+ * the input function data to the given *finfo.
+ */
+static void
+CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo,
+                       Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the end callback for text and CSV formats */
+static void
+CopyFromTextLikeEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
+/* Implementation of the start callback for binary format */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    /* Read and verify binary header */
+    ReceiveCopyBinaryHeader(cstate);
+}
+
+/*
+ * Implementation of the infunc callback for binary format. Assign
+ * the binary input function to the given *finfo.
+ */
+static void
+CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+                     FmgrInfo *finfo, Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeBinaryInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the end callback for binary format */
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
 /*
  * error context callback for COPY FROM
  *
@@ -1396,7 +1535,6 @@ BeginCopyFrom(ParseState *pstate,
                 num_defaults;
     FmgrInfo   *in_functions;
     Oid           *typioparams;
-    Oid            in_func_oid;
     int           *defmap;
     ExprState **defexprs;
     MemoryContext oldcontext;
@@ -1428,6 +1566,9 @@ BeginCopyFrom(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
+    /* Set the format routine */
+    cstate->routine = CopyFromGetRoutine(cstate->opts);
+
     /* Process the target relation */
     cstate->rel = rel;
 
@@ -1583,25 +1724,6 @@ BeginCopyFrom(ParseState *pstate,
     cstate->raw_buf_index = cstate->raw_buf_len = 0;
     cstate->raw_reached_eof = false;
 
-    if (!cstate->opts.binary)
-    {
-        /*
-         * If encoding conversion is needed, we need another buffer to hold
-         * the converted input data.  Otherwise, we can just point input_buf
-         * to the same buffer as raw_buf.
-         */
-        if (cstate->need_transcoding)
-        {
-            cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
-            cstate->input_buf_index = cstate->input_buf_len = 0;
-        }
-        else
-            cstate->input_buf = cstate->raw_buf;
-        cstate->input_reached_eof = false;
-
-        initStringInfo(&cstate->line_buf);
-    }
-
     initStringInfo(&cstate->attribute_buf);
 
     /* Assign range table and rteperminfos, we'll need them in CopyFrom. */
@@ -1634,13 +1756,9 @@ BeginCopyFrom(ParseState *pstate,
             continue;
 
         /* Fetch the input function and typioparam info */
-        if (cstate->opts.binary)
-            getTypeBinaryInputInfo(att->atttypid,
-                                   &in_func_oid, &typioparams[attnum - 1]);
-        else
-            getTypeInputInfo(att->atttypid,
-                             &in_func_oid, &typioparams[attnum - 1]);
-        fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+        cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+                                        &in_functions[attnum - 1],
+                                        &typioparams[attnum - 1]);
 
         /* Get default info if available */
         defexprs[attnum - 1] = NULL;
@@ -1775,20 +1893,7 @@ BeginCopyFrom(ParseState *pstate,
 
     pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-    if (cstate->opts.binary)
-    {
-        /* Read and verify binary header */
-        ReceiveCopyBinaryHeader(cstate);
-    }
-
-    /* create workspace for CopyReadAttributes results */
-    if (!cstate->opts.binary)
-    {
-        AttrNumber    attr_count = list_length(cstate->attnumlist);
-
-        cstate->max_fields = attr_count;
-        cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-    }
+    cstate->routine->CopyFromStart(cstate, tupDesc);
 
     MemoryContextSwitchTo(oldcontext);
 
@@ -1801,6 +1906,9 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+    /* Invoke the end callback */
+    cstate->routine->CopyFromEnd(cstate);
+
     /* No COPY FROM related resources except memory. */
     if (cstate->is_program)
     {
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index caccdc8563c..c1872acbbf6 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -62,7 +62,7 @@
 #include <unistd.h>
 #include <sys/stat.h>
 
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/copyfrom_internal.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
@@ -140,8 +140,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLine(CopyFromState cstate, bool is_csv);
+static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static int    CopyReadAttributesText(CopyFromState cstate);
 static int    CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -740,9 +740,11 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  * in the relation.
  *
  * NOTE: force_not_null option are not applied to the returned fields.
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static pg_attribute_always_inline bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
 {
     int            fldct;
     bool        done;
@@ -759,13 +761,17 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
         tupDesc = RelationGetDescr(cstate->rel);
 
         cstate->cur_lineno++;
-        done = CopyReadLine(cstate);
+        done = CopyReadLine(cstate, is_csv);
 
         if (cstate->opts.header_line == COPY_HEADER_MATCH)
         {
             int            fldnum;
 
-            if (cstate->opts.csv_mode)
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
                 fldct = CopyReadAttributesCSV(cstate);
             else
                 fldct = CopyReadAttributesText(cstate);
@@ -809,7 +815,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     cstate->cur_lineno++;
 
     /* Actually read the line into memory here */
-    done = CopyReadLine(cstate);
+    done = CopyReadLine(cstate, is_csv);
 
     /*
      * EOF at start of line means we're done.  If we see EOF after some
@@ -819,8 +825,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     if (done && cstate->line_buf.len == 0)
         return false;
 
-    /* Parse the line into de-escaped field values */
-    if (cstate->opts.csv_mode)
+    /*
+     * Parse the line into de-escaped field values
+     *
+     * is_csv will be optimized away by compiler, as argument is constant at
+     * caller.
+     */
+    if (is_csv)
         fldct = CopyReadAttributesCSV(cstate);
     else
         fldct = CopyReadAttributesText(cstate);
@@ -830,6 +841,244 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     return true;
 }
 
+/*
+ * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
+ */
+static pg_attribute_always_inline bool
+CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
+                       Datum *values, bool *nulls, bool is_csv)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    ExprState **defexprs = cstate->defexprs;
+    char      **field_strings;
+    ListCell   *cur;
+    int            fldct;
+    int            fieldno;
+    char       *string;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    /* read raw fields in the next line */
+    if (!NextCopyFromRawFields(cstate, &field_strings, &fldct, is_csv))
+        return false;
+
+    /* check for overflowing fields */
+    if (attr_count > 0 && fldct > attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("extra data after last expected column")));
+
+    fieldno = 0;
+
+    /* Loop to read the user attributes on the line. */
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        if (fieldno >= fldct)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("missing data for column \"%s\"",
+                            NameStr(att->attname))));
+        string = field_strings[fieldno++];
+
+        if (cstate->convert_select_flags &&
+            !cstate->convert_select_flags[m])
+        {
+            /* ignore input field, leaving column as NULL */
+            continue;
+        }
+
+        if (is_csv)
+        {
+            if (string == NULL &&
+                cstate->opts.force_notnull_flags[m])
+            {
+                /*
+                 * FORCE_NOT_NULL option is set and column is NULL - convert
+                 * it to the NULL string.
+                 */
+                string = cstate->opts.null_print;
+            }
+            else if (string != NULL && cstate->opts.force_null_flags[m]
+                     && strcmp(string, cstate->opts.null_print) == 0)
+            {
+                /*
+                 * FORCE_NULL option is set and column matches the NULL
+                 * string. It must have been quoted, or otherwise the string
+                 * would already have been set to NULL. Convert it to NULL as
+                 * specified.
+                 */
+                string = NULL;
+            }
+        }
+
+        cstate->cur_attname = NameStr(att->attname);
+        cstate->cur_attval = string;
+
+        if (string != NULL)
+            nulls[m] = false;
+
+        if (cstate->defaults[m])
+        {
+            /*
+             * The caller must supply econtext and have switched into the
+             * per-tuple memory context in it.
+             */
+            Assert(econtext != NULL);
+            Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
+
+            values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
+        }
+
+        /*
+         * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+         */
+        else if (!InputFunctionCallSafe(&in_functions[m],
+                                        string,
+                                        typioparams[m],
+                                        att->atttypmod,
+                                        (Node *) cstate->escontext,
+                                        &values[m]))
+        {
+            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+            cstate->num_errors++;
+
+            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+            {
+                /*
+                 * Since we emit line number and column info in the below
+                 * notice message, we suppress error context information other
+                 * than the relation name.
+                 */
+                Assert(!cstate->relname_only);
+                cstate->relname_only = true;
+
+                if (cstate->cur_attval)
+                {
+                    char       *attval;
+
+                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname,
+                                   attval));
+                    pfree(attval);
+                }
+                else
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname));
+
+                /* reset relname_only */
+                cstate->relname_only = false;
+            }
+
+            return true;
+        }
+
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+    }
+
+    Assert(fieldno == attr_count);
+
+    return true;
+}
+
+/* Implementation of the per-row callback for text format */
+bool
+CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+                   bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false);
+}
+
+/* Implementation of the per-row callback for CSV format */
+bool
+CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+                  bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
+}
+
+/* Implementation of the per-row callback for binary format */
+bool
+CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+                     bool *nulls)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    int16        fld_count;
+    ListCell   *cur;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    cstate->cur_lineno++;
+
+    if (!CopyGetInt16(cstate, &fld_count))
+    {
+        /* EOF detected (end of file, or protocol-level EOF) */
+        return false;
+    }
+
+    if (fld_count == -1)
+    {
+        /*
+         * Received EOF marker.  Wait for the protocol-level EOF, and complain
+         * if it doesn't come immediately.  In COPY FROM STDIN, this ensures
+         * that we correctly handle CopyFail, if client chooses to send that
+         * now.  When copying from file, we could ignore the rest of the file
+         * like in text mode, but we choose to be consistent with the COPY
+         * FROM STDIN case.
+         */
+        char        dummy;
+
+        if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("received copy data after EOF marker")));
+        return false;
+    }
+
+    if (fld_count != attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("row field count is %d, expected %d",
+                        (int) fld_count, attr_count)));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        cstate->cur_attname = NameStr(att->attname);
+        values[m] = CopyReadBinaryAttribute(cstate,
+                                            &in_functions[m],
+                                            typioparams[m],
+                                            att->atttypmod,
+                                            &nulls[m]);
+        cstate->cur_attname = NULL;
+    }
+
+    return true;
+}
+
 /*
  * Read next tuple from file for COPY FROM. Return false if no more tuples.
  *
@@ -847,216 +1096,22 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 {
     TupleDesc    tupDesc;
     AttrNumber    num_phys_attrs,
-                attr_count,
                 num_defaults = cstate->num_defaults;
-    FmgrInfo   *in_functions = cstate->in_functions;
-    Oid           *typioparams = cstate->typioparams;
     int            i;
     int           *defmap = cstate->defmap;
     ExprState **defexprs = cstate->defexprs;
 
     tupDesc = RelationGetDescr(cstate->rel);
     num_phys_attrs = tupDesc->natts;
-    attr_count = list_length(cstate->attnumlist);
 
     /* Initialize all values for row to NULL */
     MemSet(values, 0, num_phys_attrs * sizeof(Datum));
     MemSet(nulls, true, num_phys_attrs * sizeof(bool));
     MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-    if (!cstate->opts.binary)
-    {
-        char      **field_strings;
-        ListCell   *cur;
-        int            fldct;
-        int            fieldno;
-        char       *string;
-
-        /* read raw fields in the next line */
-        if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
-            return false;
-
-        /* check for overflowing fields */
-        if (attr_count > 0 && fldct > attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("extra data after last expected column")));
-
-        fieldno = 0;
-
-        /* Loop to read the user attributes on the line. */
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            if (fieldno >= fldct)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("missing data for column \"%s\"",
-                                NameStr(att->attname))));
-            string = field_strings[fieldno++];
-
-            if (cstate->convert_select_flags &&
-                !cstate->convert_select_flags[m])
-            {
-                /* ignore input field, leaving column as NULL */
-                continue;
-            }
-
-            if (cstate->opts.csv_mode)
-            {
-                if (string == NULL &&
-                    cstate->opts.force_notnull_flags[m])
-                {
-                    /*
-                     * FORCE_NOT_NULL option is set and column is NULL -
-                     * convert it to the NULL string.
-                     */
-                    string = cstate->opts.null_print;
-                }
-                else if (string != NULL && cstate->opts.force_null_flags[m]
-                         && strcmp(string, cstate->opts.null_print) == 0)
-                {
-                    /*
-                     * FORCE_NULL option is set and column matches the NULL
-                     * string. It must have been quoted, or otherwise the
-                     * string would already have been set to NULL. Convert it
-                     * to NULL as specified.
-                     */
-                    string = NULL;
-                }
-            }
-
-            cstate->cur_attname = NameStr(att->attname);
-            cstate->cur_attval = string;
-
-            if (string != NULL)
-                nulls[m] = false;
-
-            if (cstate->defaults[m])
-            {
-                /*
-                 * The caller must supply econtext and have switched into the
-                 * per-tuple memory context in it.
-                 */
-                Assert(econtext != NULL);
-                Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
-
-                values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
-            }
-
-            /*
-             * If ON_ERROR is specified with IGNORE, skip rows with soft
-             * errors
-             */
-            else if (!InputFunctionCallSafe(&in_functions[m],
-                                            string,
-                                            typioparams[m],
-                                            att->atttypmod,
-                                            (Node *) cstate->escontext,
-                                            &values[m]))
-            {
-                Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-                cstate->num_errors++;
-
-                if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-                {
-                    /*
-                     * Since we emit line number and column info in the below
-                     * notice message, we suppress error context information
-                     * other than the relation name.
-                     */
-                    Assert(!cstate->relname_only);
-                    cstate->relname_only = true;
-
-                    if (cstate->cur_attval)
-                    {
-                        char       *attval;
-
-                        attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname,
-                                       attval));
-                        pfree(attval);
-                    }
-                    else
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname));
-
-                    /* reset relname_only */
-                    cstate->relname_only = false;
-                }
-
-                return true;
-            }
-
-            cstate->cur_attname = NULL;
-            cstate->cur_attval = NULL;
-        }
-
-        Assert(fieldno == attr_count);
-    }
-    else
-    {
-        /* binary */
-        int16        fld_count;
-        ListCell   *cur;
-
-        cstate->cur_lineno++;
-
-        if (!CopyGetInt16(cstate, &fld_count))
-        {
-            /* EOF detected (end of file, or protocol-level EOF) */
-            return false;
-        }
-
-        if (fld_count == -1)
-        {
-            /*
-             * Received EOF marker.  Wait for the protocol-level EOF, and
-             * complain if it doesn't come immediately.  In COPY FROM STDIN,
-             * this ensures that we correctly handle CopyFail, if client
-             * chooses to send that now.  When copying from file, we could
-             * ignore the rest of the file like in text mode, but we choose to
-             * be consistent with the COPY FROM STDIN case.
-             */
-            char        dummy;
-
-            if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("received copy data after EOF marker")));
-            return false;
-        }
-
-        if (fld_count != attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("row field count is %d, expected %d",
-                            (int) fld_count, attr_count)));
-
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            cstate->cur_attname = NameStr(att->attname);
-            values[m] = CopyReadBinaryAttribute(cstate,
-                                                &in_functions[m],
-                                                typioparams[m],
-                                                att->atttypmod,
-                                                &nulls[m]);
-            cstate->cur_attname = NULL;
-        }
-    }
+    /* Get one row from source */
+    if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+        return false;
 
     /*
      * Now compute and insert any defaults available for the columns not
@@ -1087,7 +1142,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * in the final value of line_buf.
  */
 static bool
-CopyReadLine(CopyFromState cstate)
+CopyReadLine(CopyFromState cstate, bool is_csv)
 {
     bool        result;
 
@@ -1095,7 +1150,7 @@ CopyReadLine(CopyFromState cstate)
     cstate->line_buf_valid = false;
 
     /* Parse data and transfer into line_buf */
-    result = CopyReadLineText(cstate);
+    result = CopyReadLineText(cstate, is_csv);
 
     if (result)
     {
@@ -1163,7 +1218,7 @@ CopyReadLine(CopyFromState cstate)
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
 static bool
-CopyReadLineText(CopyFromState cstate)
+CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
     char       *copy_input_buf;
     int            input_buf_ptr;
@@ -1178,7 +1233,11 @@ CopyReadLineText(CopyFromState cstate)
     char        quotec = '\0';
     char        escapec = '\0';
 
-    if (cstate->opts.csv_mode)
+    /*
+     * is_csv will be optimized away by the compiler, as the argument is a
+     * constant at the caller.
+     */
+    if (is_csv)
     {
         quotec = cstate->opts.quote[0];
         escapec = cstate->opts.escape[0];
@@ -1255,7 +1314,11 @@ CopyReadLineText(CopyFromState cstate)
         prev_raw_ptr = input_buf_ptr;
         c = copy_input_buf[input_buf_ptr++];
 
-        if (cstate->opts.csv_mode)
+        /*
+         * is_csv will be optimized away by the compiler, as the argument is a
+         * constant at the caller.
+         */
+        if (is_csv)
         {
             /*
              * If character is '\r', we may need to look ahead below.  Force
@@ -1294,7 +1357,7 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \r */
-        if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\r' && (!is_csv || !in_quote))
         {
             /* Check for \r\n on first line, _and_ handle \r\n. */
             if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1322,10 +1385,10 @@ CopyReadLineText(CopyFromState cstate)
                     if (cstate->eol_type == EOL_CRNL)
                         ereport(ERROR,
                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errmsg("literal carriage return found in data") :
                                  errmsg("unquoted carriage return found in data"),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errhint("Use \"\\r\" to represent carriage return.") :
                                  errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1339,10 +1402,10 @@ CopyReadLineText(CopyFromState cstate)
             else if (cstate->eol_type == EOL_NL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal carriage return found in data") :
                          errmsg("unquoted carriage return found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\r\" to represent carriage return.") :
                          errhint("Use quoted CSV field to represent carriage return.")));
             /* If reach here, we have found the line terminator */
@@ -1350,15 +1413,15 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \n */
-        if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\n' && (!is_csv || !in_quote))
         {
             if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal newline found in data") :
                          errmsg("unquoted newline found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\n\" to represent newline.") :
                          errhint("Use quoted CSV field to represent newline.")));
             cstate->eol_type = EOL_NL;    /* in case not set yet */
@@ -1370,7 +1433,7 @@ CopyReadLineText(CopyFromState cstate)
          * Process backslash, except in CSV mode where backslash is a normal
          * character.
          */
-        if (c == '\\' && !cstate->opts.csv_mode)
+        if (c == '\\' && !is_csv)
         {
             char        c2;
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..7bc044e2816 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
 extern void EndCopyFrom(CopyFromState cstate);
 extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
                          Datum *values, bool *nulls);
-extern bool NextCopyFromRawFields(CopyFromState cstate,
-                                  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 extern char *CopyLimitPrintoutLength(const char *str);
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index de5dae9cc38..39e5a096da5 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * copyapi.h
- *      API for COPY TO handlers
+ *      API for COPY TO/FROM handlers
  *
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
@@ -52,4 +52,50 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+/*
+ * API structure for a COPY FROM format implementation.  Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyFromRoutine
+{
+    /*
+     * Set input function information. This callback is called once at the
+     * beginning of COPY FROM.
+     *
+     * 'finfo' can be optionally filled to provide the catalog information of
+     * the input function.
+     *
+     * 'typioparam' can be optionally filled to define the OID of the type to
+     * pass to the input function.  'atttypid' is the OID of the data type
+     * used by the relation's attribute.
+     */
+    void        (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
+                                   FmgrInfo *finfo, Oid *typioparam);
+
+    /*
+     * Start a COPY FROM. This callback is called once at the beginning of
+     * COPY FROM.
+     *
+     * 'tupDesc' is the tuple descriptor of the relation where the data needs
+     * to be copied.  This can be used for any initialization steps required
+     * by a format.
+     */
+    void        (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
+
+    /*
+     * Read one row from the source and fill *values and *nulls.
+     *
+     * 'econtext' is used to evaluate default expression for each column that
+     * is either not read from the file or is using the DEFAULT option of COPY
+     * FROM.  It is NULL if no default values are used.
+     *
+     * Returns false if there are no more tuples to read.
+     */
+    bool        (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
+                                   Datum *values, bool *nulls);
+
+    /* End a COPY FROM. This callback is called once at the end of COPY FROM */
+    void        (*CopyFromEnd) (CopyFromState cstate);
+} CopyFromRoutine;
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 1d8ac8f62e6..c8b22af22d8 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -58,6 +58,9 @@ typedef enum CopyInsertMethod
  */
 typedef struct CopyFromStateData
 {
+    /* format routine */
+    const struct CopyFromRoutine *routine;
+
     /* low-level state data */
     CopySource    copy_src;        /* type of copy source */
     FILE       *copy_file;        /* used if copy_src == COPY_FILE */
@@ -183,4 +186,12 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
+/* One-row callbacks for built-in formats defined in copyfromparse.c */
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+                               Datum *values, bool *nulls);
+extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
+                              Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+                                 Datum *values, bool *nulls);
+
 #endif                            /* COPYFROM_INTERNAL_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1cbb3628857..afdafefeb9b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -497,6 +497,7 @@ ConvertRowtypeExpr
 CookedConstraint
 CopyDest
 CopyFormatOptions
+CopyFromRoutine
 CopyFromState
 CopyFromStateData
 CopyHeaderChoice
-- 
2.47.1
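Note on the patch above: it replaces the "if (!cstate->opts.binary) ... else ..."
branch in NextCopyFrom() with a call through cstate->routine->CopyFromOneRow,
and it has the text and CSV callbacks share one body that takes a constant
is_csv argument. The following is a minimal standalone sketch of that pattern,
not PostgreSQL code. Every Demo*/demo_* name is hypothetical, and the only logic
modeled is the CSV-only FORCE_NOT_NULL / FORCE_NULL handling from
CopyFromTextLikeOneRow().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-column options; stand-ins for fields of cstate->opts. */
typedef struct DemoColumnOpts
{
    bool        force_notnull;  /* models force_notnull_flags[m] */
    bool        force_null;     /* models force_null_flags[m] */
    const char *null_print;     /* models opts.null_print */
} DemoColumnOpts;

/* Hypothetical callback table, shaped like CopyFromRoutine but far smaller. */
typedef struct DemoFromRoutine
{
    const char *(*one_field) (const char *string, const DemoColumnOpts *opts);
} DemoFromRoutine;

/*
 * Shared body.  is_csv is a constant at both call sites below, so an
 * optimizing compiler can drop the dead branch once this is inlined.
 */
static inline const char *
demo_text_like_one_field(const char *string, const DemoColumnOpts *opts,
                         bool is_csv)
{
    if (is_csv)
    {
        if (string == NULL && opts->force_notnull)
            string = opts->null_print;  /* NULL becomes the NULL string */
        else if (string != NULL && opts->force_null &&
                 strcmp(string, opts->null_print) == 0)
            string = NULL;      /* quoted NULL string becomes NULL */
    }
    return string;
}

/* Thin per-format wrappers, like CopyFromTextOneRow/CopyFromCSVOneRow. */
static const char *
demo_text_one_field(const char *string, const DemoColumnOpts *opts)
{
    return demo_text_like_one_field(string, opts, false);
}

static const char *
demo_csv_one_field(const char *string, const DemoColumnOpts *opts)
{
    return demo_text_like_one_field(string, opts, true);
}

static const DemoFromRoutine demo_routine_text = {.one_field = demo_text_one_field};
static const DemoFromRoutine demo_routine_csv = {.one_field = demo_csv_one_field};

/* Pick the callback table once, up front. */
const DemoFromRoutine *
demo_get_routine(bool is_csv)
{
    return is_csv ? &demo_routine_csv : &demo_routine_text;
}

/* The per-field loop only ever calls through the table. */
size_t
demo_count_nulls(const DemoFromRoutine *routine, const char *const *fields,
                 size_t nfields, const DemoColumnOpts *opts)
{
    size_t      nnulls = 0;

    assert(routine != NULL && routine->one_field != NULL);
    for (size_t i = 0; i < nfields; i++)
    {
        if (routine->one_field(fields[i], opts) == NULL)
            nnulls++;
    }
    return nnulls;
}
```

In this sketch the loop never tests the format itself. The format is chosen
once by demo_get_routine(), in the same way the patch sets cstate->routine once
and calls through it for every row.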
From c4a358d58553bec6e9efd5fc497b0c8e71781f7b Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 12:19:15 +0900
Subject: [PATCH v31 3/9] Add support for adding custom COPY TO format
This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a COPY TO handler returns a CopyToRoutine.
This also adds a test module for a custom COPY TO handler.
---
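Note on the handler approach described above: a format name is resolved to a
handler function, the handler returns a pointer to a struct that starts with a
node tag, and the caller checks that tag before treating the result as a
CopyToRoutine. The following is a rough standalone sketch of that flow, not the
PostgreSQL implementation. All Demo*/demo_* names are hypothetical, the static
table stands in for the catalog lookup done with LookupFuncName(), and a NULL
return stands in for the ereport(ERROR, ...) calls in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node tags; stand-ins for NodeTag and T_CopyToRoutine. */
typedef enum DemoNodeTag
{
    DEMO_T_Invalid = 0,
    DEMO_T_CopyToRoutine,
    DEMO_T_Other
} DemoNodeTag;

typedef struct DemoNode
{
    DemoNodeTag type;
} DemoNode;

/* The tag header comes first so that a DemoNode pointer can inspect it. */
typedef struct DemoCopyToRoutine
{
    DemoNode    node;
    int         (*one_row) (int value);
} DemoCopyToRoutine;

/* A handler takes is_from and returns some tagged struct. */
typedef const DemoNode *(*DemoHandler) (bool is_from);

typedef struct DemoHandlerEntry
{
    const char *name;
    DemoHandler handler;
} DemoHandlerEntry;

static int
demo_double_row(int value)
{
    return value * 2;
}

static const DemoCopyToRoutine demo_good_routine = {
    .node = {DEMO_T_CopyToRoutine},
    .one_row = demo_double_row,
};

static const DemoNode demo_bad_node = {DEMO_T_Other};

static const DemoNode *
demo_good_handler(bool is_from)
{
    (void) is_from;
    return &demo_good_routine.node;
}

/* A handler that returns the wrong kind of struct. */
static const DemoNode *
demo_bad_handler(bool is_from)
{
    (void) is_from;
    return &demo_bad_node;
}

/* Stand-in for the function catalog: format name -> handler. */
static const DemoHandlerEntry demo_registry[] = {
    {"good_format", demo_good_handler},
    {"bad_format", demo_bad_handler},
};

/* Resolve a format name; NULL means "not recognized" or "wrong struct". */
const DemoCopyToRoutine *
demo_lookup_routine(const char *format)
{
    assert(format != NULL);
    for (size_t i = 0; i < sizeof(demo_registry) / sizeof(demo_registry[0]); i++)
    {
        if (strcmp(demo_registry[i].name, format) == 0)
        {
            const DemoNode *node = demo_registry[i].handler(false);

            /* Validate the tag before casting, like the IsA() check. */
            if (node == NULL || node->type != DEMO_T_CopyToRoutine)
                return NULL;
            return (const DemoCopyToRoutine *) node;
        }
    }
    return NULL;
}
```

The tag check is what makes the cast safe here: any function with a matching
name and signature could be picked up as a handler, so the caller cannot assume
the returned pointer is really a routine struct.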
 src/backend/commands/copy.c                   | 70 +++++++++++++++----
 src/backend/commands/copyto.c                 | 36 +++++++---
 src/backend/nodes/Makefile                    |  1 +
 src/backend/nodes/gen_node_support.pl         |  2 +
 src/backend/utils/adt/pseudotypes.c           |  1 +
 src/include/catalog/pg_proc.dat               |  6 ++
 src/include/catalog/pg_type.dat               |  6 ++
 src/include/commands/copy.h                   |  1 +
 src/include/commands/copyapi.h                |  2 +
 src/include/nodes/meson.build                 |  1 +
 src/test/modules/Makefile                     |  1 +
 src/test/modules/meson.build                  |  1 +
 src/test/modules/test_copy_format/.gitignore  |  4 ++
 src/test/modules/test_copy_format/Makefile    | 23 ++++++
 .../expected/test_copy_format.out             | 17 +++++
 src/test/modules/test_copy_format/meson.build | 33 +++++++++
 .../test_copy_format/sql/test_copy_format.sql |  5 ++
 .../test_copy_format--1.0.sql                 |  8 +++
 .../test_copy_format/test_copy_format.c       | 63 +++++++++++++++++
 .../test_copy_format/test_copy_format.control |  4 ++
 20 files changed, 264 insertions(+), 21 deletions(-)
 mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..77a35831d05 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,6 +32,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -476,6 +477,61 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
     return COPY_LOG_VERBOSITY_DEFAULT;    /* keep compiler quiet */
 }
 
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false). If no COPY format handler is found, this function
+ * reports an error.
+ */
+static void
+ProcessCopyOptionFormat(ParseState *pstate,
+                        CopyFormatOptions *opts_out,
+                        bool is_from,
+                        DefElem *defel)
+{
+    char       *format;
+    Oid            funcargtypes[1];
+    Oid            handlerOid = InvalidOid;
+
+    format = defGetString(defel);
+
+    opts_out->csv_mode = false;
+    opts_out->binary = false;
+    /* built-in formats */
+    if (strcmp(format, "text") == 0)
+    {
+        /* "csv_mode == false && binary == false" means "text" */
+        return;
+    }
+    else if (strcmp(format, "csv") == 0)
+    {
+        opts_out->csv_mode = true;
+        return;
+    }
+    else if (strcmp(format, "binary") == 0)
+    {
+        opts_out->binary = true;
+        return;
+    }
+
+    /* custom format */
+    if (!is_from)
+    {
+        funcargtypes[0] = INTERNALOID;
+        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                    funcargtypes, true);
+    }
+    if (!OidIsValid(handlerOid))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY format \"%s\" not recognized", format),
+                 parser_errposition(pstate, defel->location)));
+    opts_out->handler = handlerOid;
+}
+
 /*
  * Process the statement option list for COPY.
  *
@@ -519,22 +575,10 @@ ProcessCopyOptions(ParseState *pstate,
 
         if (strcmp(defel->defname, "format") == 0)
         {
-            char       *fmt = defGetString(defel);
-
             if (format_specified)
                 errorConflictingDefElem(defel, pstate);
             format_specified = true;
-            if (strcmp(fmt, "text") == 0)
-                 /* default format */ ;
-            else if (strcmp(fmt, "csv") == 0)
-                opts_out->csv_mode = true;
-            else if (strcmp(fmt, "binary") == 0)
-                opts_out->binary = true;
-            else
-                ereport(ERROR,
-                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                         errmsg("COPY format \"%s\" not recognized", fmt),
-                         parser_errposition(pstate, defel->location)));
+            ProcessCopyOptionFormat(pstate, opts_out, is_from, defel);
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 26c67ddc351..18af2aaa2f9 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -150,6 +150,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
 
 /* text format */
 static const CopyToRoutine CopyToRoutineText = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToTextOneRow,
@@ -158,6 +159,7 @@ static const CopyToRoutine CopyToRoutineText = {
 
 /* CSV format */
 static const CopyToRoutine CopyToRoutineCSV = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToCSVOneRow,
@@ -166,6 +168,7 @@ static const CopyToRoutine CopyToRoutineCSV = {
 
 /* binary format */
 static const CopyToRoutine CopyToRoutineBinary = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToBinaryStart,
     .CopyToOutFunc = CopyToBinaryOutFunc,
     .CopyToOneRow = CopyToBinaryOneRow,
@@ -174,15 +177,32 @@ static const CopyToRoutine CopyToRoutineBinary = {
 
 /* Return a COPY TO routine for the given options */
 static const CopyToRoutine *
-CopyToGetRoutine(CopyFormatOptions opts)
+CopyToGetRoutine(CopyFormatOptions *opts)
 {
-    if (opts.csv_mode)
-        return &CopyToRoutineCSV;
-    else if (opts.binary)
-        return &CopyToRoutineBinary;
+    if (OidIsValid(opts->handler))
+    {
+        Datum        datum;
+        Node       *routine;
 
-    /* default is text */
-    return &CopyToRoutineText;
+        datum = OidFunctionCall1(opts->handler, BoolGetDatum(false));
+        routine = (Node *) DatumGetPointer(datum);
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%u did not return "
+                            "CopyToRoutine struct",
+                            opts->handler)));
+        return castNode(CopyToRoutine, routine);
+    }
+    else if (opts->csv_mode)
+        return &CopyToRoutineCSV;
+    else if (opts->binary)
+        return &CopyToRoutineBinary;
+    else
+        return &CopyToRoutineText;
 }
 
 /* Implementation of the start callback for text and CSV formats */
@@ -703,7 +723,7 @@ BeginCopyTo(ParseState *pstate,
     ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
 
     /* Set format routine */
-    cstate->routine = CopyToGetRoutine(cstate->opts);
+    cstate->routine = CopyToGetRoutine(&cstate->opts);
 
     /* Process the source/target relation or query */
     if (rel)
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 66bbad8e6e0..173ee11811c 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -49,6 +49,7 @@ node_headers = \
     access/sdir.h \
     access/tableam.h \
     access/tsmapi.h \
+    commands/copyapi.h \
     commands/event_trigger.h \
     commands/trigger.h \
     executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
old mode 100644
new mode 100755
index 7c012c27f88..5d53d32c4a7
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -61,6 +61,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -85,6 +86,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index 317a1f2b282..f2ebc21ca56 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5b8c2ad2a54..b231e7a041e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7803,6 +7803,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 6dca77e0a22..340e0cd0a8d 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -633,6 +633,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a copy to method function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   descr => 'pseudo-type for the result of a table AM handler function',
   typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 7bc044e2816..285f2c8fc4f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
+    Oid            handler;        /* handler function for custom format routine */
 } CopyFormatOptions;
 
 /* These are private in commands/copy[from|to].c */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 39e5a096da5..c125dc3e209 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -22,6 +22,8 @@
  */
 typedef struct CopyToRoutine
 {
+    NodeTag        type;
+
     /*
      * Set output function information. This callback is called once at the
      * beginning of COPY TO.
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index f3dd5461fef..09f7443195f 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -11,6 +11,7 @@ node_support_input_i = [
   'access/sdir.h',
   'access/tableam.h',
   'access/tsmapi.h',
+  'commands/copyapi.h',
   'commands/event_trigger.h',
   'commands/trigger.h',
   'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index c0d3cf0e14b..33e3a49a4fb 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
           spgist_name_ops \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 4f544a042d4..bf25658793d 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -14,6 +14,7 @@ subdir('spgist_name_ops')
 subdir('ssl_passphrase_callback')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 00000000000..8497f91624d
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 00000000000..adfe7d1572a
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,17 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+ERROR:  COPY format "test_copy_format" not recognized
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')...
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: atttypid=21
+NOTICE:  CopyToOutFunc: atttypid=23
+NOTICE:  CopyToOutFunc: atttypid=20
+NOTICE:  CopyToStart: natts=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 00000000000..4cefe7b709a
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+  'test_copy_format.control',
+  'test_copy_format--1.0.sql',
+)
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'test_copy_format',
+    ],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 00000000000..810b3d8cedc
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,5 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 00000000000..d24ea03ce99
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,8 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+CREATE FUNCTION test_copy_format(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..e064f40473b
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,63 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+
+PG_MODULE_MAGIC;
+
+static void
+CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts)));
+}
+
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%d", slot->tts_nvalid)));
+}
+
+static void
+CopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .type = T_CopyToRoutine,
+    .CopyToOutFunc = CopyToOutFunc,
+    .CopyToStart = CopyToStart,
+    .CopyToOneRow = CopyToOneRow,
+    .CopyToEnd = CopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    ereport(NOTICE,
+            (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 00000000000..f05a6362358
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.47.1
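
The handler dispatch added to CopyToGetRoutine() above can be modelled outside the server. The following is only a sketch under stated assumptions: every `Demo*`/`demo_*` name is hypothetical, the enum stands in for NodeTag, and returning NULL stands in for the ereport(ERROR) path. It illustrates why the patch puts a tag as the first struct member: the caller can check what an untyped handler result really is before casting it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for NodeTag values such as T_CopyToRoutine. */
typedef enum DemoTag
{
    DEMO_T_INVALID = 0,
    DEMO_T_COPY_TO_ROUTINE,
    DEMO_T_COPY_FROM_ROUTINE
} DemoTag;

/* Like CopyToRoutine in the patch: the tag is the first member. */
typedef struct DemoCopyToRoutine
{
    DemoTag     type;
    int         (*one_row) (int natts);
} DemoCopyToRoutine;

static int
demo_one_row(int natts)
{
    return natts;
}

static const DemoCopyToRoutine demo_good_routine = {
    .type = DEMO_T_COPY_TO_ROUTINE,
    .one_row = demo_one_row,
};

/* A struct carrying the wrong tag, as a misbehaving handler might return. */
static const DemoCopyToRoutine demo_mistagged_routine = {
    .type = DEMO_T_COPY_FROM_ROUTINE,
    .one_row = demo_one_row,
};

/* Hypothetical handler signature: an untyped pointer comes back. */
typedef const void *(*DemoHandler) (bool is_from);

static const void *
demo_good_handler(bool is_from)
{
    (void) is_from;
    return &demo_good_routine;
}

static const void *
demo_bad_handler(bool is_from)
{
    (void) is_from;
    return &demo_mistagged_routine;
}

static const void *
demo_null_handler(bool is_from)
{
    (void) is_from;
    return NULL;
}

/*
 * Call the handler for COPY TO and validate the result before casting.
 * Returns NULL where the patch would raise an error instead.
 */
static const DemoCopyToRoutine *
demo_get_copy_to_routine(DemoHandler handler)
{
    const void *result;
    const DemoTag *tag;

    assert(handler != NULL);
    result = handler(false);
    /* A pointer to a struct also points to its first member. */
    tag = result;
    if (tag == NULL || *tag != DEMO_T_COPY_TO_ROUTINE)
        return NULL;
    return (const DemoCopyToRoutine *) result;
}
```

In this model a handler that returns NULL or a struct with the wrong tag is rejected, which mirrors the `routine == NULL || !IsA(routine, CopyToRoutine)` check in the patch.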
From a1b7b711aeec2bc52bc50e2e9182182200459ec5 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 13:58:33 +0900
Subject: [PATCH v31 4/9] Export CopyToStateData as private data
This is for custom COPY TO format handlers implemented as extensions.
This only moves code. Nothing changes except the CopyDest enum values:
CopyDest and CopySource (its COPY FROM counterpart) both define enum
values such as COPY_FILE, so they conflict with each other. The
CopyDest values therefore use a COPY_DEST_ prefix instead of COPY_.
For example, COPY_FILE in CopyDest is renamed to COPY_DEST_FILE.
Note that this alone isn't enough to implement custom COPY TO format
handlers as extensions. A subsequent commit will do the following:
1. Add an opaque space for custom COPY TO format handlers
2. Export CopySendEndOfRow() to flush the buffer
---
 src/backend/commands/copyto.c          | 78 +++---------------------
 src/include/commands/copy.h            |  2 +-
 src/include/commands/copyto_internal.h | 83 ++++++++++++++++++++++++++
 3 files changed, 93 insertions(+), 70 deletions(-)
 create mode 100644 src/include/commands/copyto_internal.h
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 18af2aaa2f9..16d3b389e97 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -20,6 +20,7 @@
 
 #include "access/tableam.h"
 #include "commands/copyapi.h"
+#include "commands/copyto_internal.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -36,67 +37,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                    /* to file (or a piped program) */
-    COPY_FRONTEND,                /* to frontend */
-    COPY_CALLBACK,                /* to callback function */
-} CopyDest;
-
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-    /* format-specific routines */
-    const CopyToRoutine *routine;
-
-    /* low-level state data */
-    CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
-
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
-    MemoryContext rowcontext;    /* per-row evaluation context */
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -424,7 +364,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -471,7 +411,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -505,11 +445,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -530,7 +470,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -538,7 +478,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -922,12 +862,12 @@ BeginCopyTo(ParseState *pstate,
     /* See Multibyte encoding comment above */
     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 285f2c8fc4f..be97b07b559 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -90,7 +90,7 @@ typedef struct CopyFormatOptions
     Oid            handler;        /* handler function for custom format routine */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
+/* These are private in commands/copy[from|to]_internal.h */
 typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
new file mode 100644
index 00000000000..1b58b36c0a3
--- /dev/null
+++ b/src/include/commands/copyto_internal.h
@@ -0,0 +1,83 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyto_internal.h
+ *      Internal definitions for COPY TO command.
+ *
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyto_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYTO_INTERNAL_H
+#define COPYTO_INTERNAL_H
+
+#include "commands/copy.h"
+#include "executor/execdesc.h"
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+    COPY_DEST_FILE,                /* to file (or a piped program) */
+    COPY_DEST_FRONTEND,            /* to frontend */
+    COPY_DEST_CALLBACK,            /* to callback function */
+} CopyDest;
+
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+    /* format-specific routines */
+    const struct CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;        /* type of copy source/destination */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
+    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
+
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+    MemoryContext rowcontext;    /* per-row evaluation context */
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
+#endif                            /* COPYTO_INTERNAL_H */
-- 
2.47.1
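
The reason for the COPY_DEST_ prefix can be shown with a small standalone sketch. The `Ex*`/`EX_*`/`ex_*` names below are hypothetical and only model the two enums: in C, enumerators share one namespace per scope, so two enums cannot both declare the same enumerator name once their definitions are visible in the same translation unit. Giving one side a distinct prefix lets both headers be included together.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the COPY FROM side, which keeps the short names. */
typedef enum ExCopySource
{
    EX_COPY_FILE,
    EX_COPY_FRONTEND,
    EX_COPY_CALLBACK
} ExCopySource;

/*
 * Hypothetical model of the COPY TO side. Declaring EX_COPY_FILE again
 * here would be a redeclaration error, so each value gets a DEST_ prefix.
 */
typedef enum ExCopyDest
{
    EX_COPY_DEST_FILE,
    EX_COPY_DEST_FRONTEND,
    EX_COPY_DEST_CALLBACK
} ExCopyDest;

/* Map a destination to a label, switching on the prefixed names. */
static const char *
ex_copy_dest_name(ExCopyDest dest)
{
    switch (dest)
    {
        case EX_COPY_DEST_FILE:
            return "file";
        case EX_COPY_DEST_FRONTEND:
            return "frontend";
        case EX_COPY_DEST_CALLBACK:
            return "callback";
    }
    return "unknown";
}
```

The numeric values stay the same after such a rename, so only the spelling at each use site changes, which matches the mechanical nature of the hunks in this patch.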
From 37d2ac9d84288852ad049db1145104d52f065b14 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:01:18 +0900
Subject: [PATCH v31 5/9] Add support for implementing custom COPY TO format as
 extension
* Add CopyToStateData::opaque, which a custom COPY TO format
  implementation can use to keep its own data
* Export CopySendEndOfRow(), which flushes the data in
  CopyToStateData::fe_msgbuf, as CopyToStateFlush()
---
 src/backend/commands/copyto.c          | 12 ++++++++++++
 src/include/commands/copyapi.h         |  2 ++
 src/include/commands/copyto_internal.h |  3 +++
 3 files changed, 17 insertions(+)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 16d3b389e97..20d49d73e38 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -461,6 +461,18 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * Export CopySendEndOfRow() for extensions as a thin wrapper. We keep
+ * CopySendEndOfRow() itself as a static function so that the compiler
+ * remains free to optimize (for example, inline) the calls to it in
+ * this file.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
  * the line termination and do common appropriate things for the end of row.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index c125dc3e209..d0da9e07a0d 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -54,6 +54,8 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+extern void CopyToStateFlush(CopyToState cstate);
+
 /*
  * API structure for a COPY FROM format implementation.     Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index 1b58b36c0a3..ce1c33a4004 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -78,6 +78,9 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;    /* lookup info for output functions */
     MemoryContext rowcontext;    /* per-row evaluation context */
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyToStateData;
 
 #endif                            /* COPYTO_INTERNAL_H */
-- 
2.47.1
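
How a format implementation might combine the opaque pointer with the flush call can be sketched in standalone C. This is a model, not PostgreSQL code: `MockCopyToState`, `MockFormatPrivate` and the `mock_*` functions are hypothetical, a fixed char array stands in for fe_msgbuf, and `mock_flush()` only counts bytes where CopyToStateFlush() would write them out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, much-reduced model of CopyToStateData. */
typedef struct MockCopyToState
{
    char        buf[64];        /* stands in for fe_msgbuf */
    size_t      len;            /* bytes currently buffered */
    size_t      bytes_flushed;  /* total bytes handed to the destination */
    void       *opaque;         /* private space for the format */
} MockCopyToState;

/* Stand-in for CopyToStateFlush(): emit the buffer, then reset it. */
static void
mock_flush(MockCopyToState *st)
{
    st->bytes_flushed += st->len;
    st->len = 0;
}

/* State that only the format implementation knows about. */
typedef struct MockFormatPrivate
{
    unsigned long rows;
} MockFormatPrivate;

/* Start callback: allocate private state and park it in opaque. */
static int
mock_start(MockCopyToState *st)
{
    MockFormatPrivate *priv = calloc(1, sizeof(*priv));

    if (priv == NULL)
        return -1;
    st->opaque = priv;
    return 0;
}

/* Row callback: format one row into the buffer, then flush it. */
static void
mock_one_row(MockCopyToState *st, int value)
{
    MockFormatPrivate *priv = st->opaque;
    int         n = snprintf(st->buf, sizeof(st->buf), "%lu:%d\n",
                             priv->rows, value);

    /* Clamp to what actually fits in the buffer. */
    if (n < 0)
        n = 0;
    else if ((size_t) n >= sizeof(st->buf))
        n = (int) sizeof(st->buf) - 1;
    st->len = (size_t) n;
    priv->rows++;
    mock_flush(st);
}

/* End callback: report the row count and release the private state. */
static unsigned long
mock_end(MockCopyToState *st)
{
    MockFormatPrivate *priv = st->opaque;
    unsigned long rows = priv->rows;

    free(priv);
    st->opaque = NULL;
    return rows;
}
```

The core code never looks inside `opaque`; it only carries the pointer between callbacks, which is why a `void *` is sufficient for this purpose.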
From bb6976fc15bd576b40d5820a9e6a0f49bfbd759d Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:11:55 +0900
Subject: [PATCH v31 6/9] Add support for adding custom COPY FROM format
This uses the same handler for COPY TO and COPY FROM, but a different
routine for each: CopyToRoutine for COPY TO and CopyFromRoutine for
COPY FROM. PostgreSQL calls the COPY TO/FROM handler with an "is_from"
argument, which is true for COPY FROM and false for COPY TO:
    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine
This also adds a custom COPY FROM handler to the test module.
---
 src/backend/commands/copy.c                   | 13 +++----
 src/backend/commands/copyfrom.c               | 36 +++++++++++++----
 src/include/catalog/pg_type.dat               |  2 +-
 src/include/commands/copyapi.h                |  2 +
 .../expected/test_copy_format.out             | 10 +++--
 .../test_copy_format/sql/test_copy_format.sql |  1 +
 .../test_copy_format/test_copy_format.c       | 39 ++++++++++++++++++-
 7 files changed, 82 insertions(+), 21 deletions(-)
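
The single-handler design can be sketched as follows. This is a standalone model with hypothetical names (`Toy*`, `toy_*`), not the PostgreSQL API: one handler function receives is_from and returns either the COPY FROM routine or the COPY TO routine, and the caller reads the leading tag to learn which one it received.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical tags standing in for T_CopyToRoutine / T_CopyFromRoutine. */
typedef enum ToyKind
{
    TOY_KIND_TO = 1,
    TOY_KIND_FROM
} ToyKind;

/* Both routine structs begin with the tag, like "NodeTag type". */
typedef struct ToyToRoutine
{
    ToyKind     kind;
    const char *direction;
} ToyToRoutine;

typedef struct ToyFromRoutine
{
    ToyKind     kind;
    const char *direction;
} ToyFromRoutine;

static const ToyToRoutine toy_to_routine = {
    .kind = TOY_KIND_TO,
    .direction = "COPY TO",
};

static const ToyFromRoutine toy_from_routine = {
    .kind = TOY_KIND_FROM,
    .direction = "COPY FROM",
};

/* One handler for both directions, selected by is_from. */
static const void *
toy_copy_handler(bool is_from)
{
    if (is_from)
        return &toy_from_routine;
    return &toy_to_routine;
}

/* Read the leading tag of whatever the handler returned. */
static ToyKind
toy_kind_of(const void *routine)
{
    return *(const ToyKind *) routine;
}
```

With one function per format name, `FORMAT 'name'` can resolve to the same handler for both commands, and each command then verifies that it got the routine type it expects.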
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 77a35831d05..05cc5d1232a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -483,8 +483,8 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
  * This function checks whether the option value is a built-in format such as
  * "text" and "csv" or not. If the option value isn't a built-in format, this
  * function finds a COPY format handler that returns a CopyToRoutine (for
- * is_from == false). If no COPY format handler is found, this function
- * reports an error.
+ * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY
+ * format handler is found, this function reports an error.
  */
 static void
 ProcessCopyOptionFormat(ParseState *pstate,
@@ -518,12 +518,9 @@ ProcessCopyOptionFormat(ParseState *pstate,
     }
 
     /* custom format */
-    if (!is_from)
-    {
-        funcargtypes[0] = INTERNALOID;
-        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
-                                    funcargtypes, true);
-    }
+    funcargtypes[0] = INTERNALOID;
+    handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                funcargtypes, true);
     if (!OidIsValid(handlerOid))
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 8b09df0581f..37647949bfc 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -129,6 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate);
 
 /* text format */
 static const CopyFromRoutine CopyFromRoutineText = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromTextOneRow,
@@ -137,6 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = {
 
 /* CSV format */
 static const CopyFromRoutine CopyFromRoutineCSV = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromCSVOneRow,
@@ -145,6 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = {
 
 /* binary format */
 static const CopyFromRoutine CopyFromRoutineBinary = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromBinaryInFunc,
     .CopyFromStart = CopyFromBinaryStart,
     .CopyFromOneRow = CopyFromBinaryOneRow,
@@ -153,15 +156,32 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 
 /* Return a COPY FROM routine for the given options */
 static const CopyFromRoutine *
-CopyFromGetRoutine(CopyFormatOptions opts)
+CopyFromGetRoutine(CopyFormatOptions *opts)
 {
-    if (opts.csv_mode)
-        return &CopyFromRoutineCSV;
-    else if (opts.binary)
-        return &CopyFromRoutineBinary;
+    if (OidIsValid(opts->handler))
+    {
+        Datum        datum;
+        Node       *routine;
 
-    /* default is text */
-    return &CopyFromRoutineText;
+        datum = OidFunctionCall1(opts->handler, BoolGetDatum(true));
+        routine = (Node *) DatumGetPointer(datum);
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%u did not return "
+                            "CopyFromRoutine struct",
+                            opts->handler)));
+        return castNode(CopyFromRoutine, routine);
+    }
+    else if (opts->csv_mode)
+        return &CopyFromRoutineCSV;
+    else if (opts->binary)
+        return &CopyFromRoutineBinary;
+    else
+        return &CopyFromRoutineText;
 }
 
 /* Implementation of the start callback for text and CSV formats */
@@ -1567,7 +1587,7 @@ BeginCopyFrom(ParseState *pstate,
     ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
     /* Set the format routine */
-    cstate->routine = CopyFromGetRoutine(cstate->opts);
+    cstate->routine = CopyFromGetRoutine(&cstate->opts);
 
     /* Process the target relation */
     cstate->rel = rel;
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 340e0cd0a8d..63b7d65f982 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -634,7 +634,7 @@
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
 { oid => '8752',
-  descr => 'pseudo-type for the result of a copy to method function',
+  descr => 'pseudo-type for the result of a copy to/from method function',
   typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
   typcategory => 'P', typinput => 'copy_handler_in',
   typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index d0da9e07a0d..103eb21767d 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -62,6 +62,8 @@ extern void CopyToStateFlush(CopyToState cstate);
  */
 typedef struct CopyFromRoutine
 {
+    NodeTag        type;
+
     /*
      * Set input function information. This callback is called once at the
      * beginning of COPY FROM.
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index adfe7d1572a..016893e7026 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -2,9 +2,13 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
-ERROR:  COPY format "test_copy_format" not recognized
-LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')...
-                                          ^
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=false
 NOTICE:  CopyToOutFunc: atttypid=21
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 810b3d8cedc..0dfdfa00080 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -2,4 +2,5 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+\.
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index e064f40473b..f6b105659ab 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -18,6 +18,40 @@
 
 PG_MODULE_MAGIC;
 
+static void
+CopyFromInFunc(CopyFromState cstate, Oid atttypid,
+               FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%d", atttypid)));
+}
+
+static void
+CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts)));
+}
+
+static bool
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+CopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .type = T_CopyFromRoutine,
+    .CopyFromInFunc = CopyFromInFunc,
+    .CopyFromStart = CopyFromStart,
+    .CopyFromOneRow = CopyFromOneRow,
+    .CopyFromEnd = CopyFromEnd,
+};
+
 static void
 CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
 {
@@ -59,5 +93,8 @@ test_copy_format(PG_FUNCTION_ARGS)
     ereport(NOTICE,
             (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
 
-    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
 }
-- 
2.47.1
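
The handler pattern used by this patch (a type tag at the head of the routine struct, and a single handler that returns a different routine table depending on `is_from`) can be modeled outside PostgreSQL. The following is a minimal standalone sketch, not the patch's code: `RoutineTag`, `DemoFromRoutine`, `DemoToRoutine`, `demo_handler`, `demo_is_from_routine` and `demo_as_from_routine` are hypothetical names, and a plain enum stands in for the real `NodeTag` machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's NodeTag. */
typedef enum RoutineTag
{
    TAG_INVALID = 0,
    TAG_FROM_ROUTINE,
    TAG_TO_ROUTINE
} RoutineTag;

/* The tag is the first member, so it can be read through any routine pointer. */
typedef struct DemoFromRoutine
{
    RoutineTag  type;
    bool        (*one_row) (int *value);    /* returns false at end of input */
} DemoFromRoutine;

typedef struct DemoToRoutine
{
    RoutineTag  type;
    void        (*one_row) (int value);
} DemoToRoutine;

static bool
demo_from_one_row(int *value)
{
    *value = 0;
    return false;               /* this toy format never produces a row */
}

static void
demo_to_one_row(int value)
{
    (void) value;               /* this toy format discards every row */
}

static const DemoFromRoutine demo_from_routine = {
    .type = TAG_FROM_ROUTINE,
    .one_row = demo_from_one_row,
};

static const DemoToRoutine demo_to_routine = {
    .type = TAG_TO_ROUTINE,
    .one_row = demo_to_one_row,
};

/* One handler serves both directions, selecting the table by is_from. */
const void *
demo_handler(bool is_from)
{
    if (is_from)
        return &demo_from_routine;
    return &demo_to_routine;
}

/* Caller-side check: does the returned table match the COPY FROM direction? */
bool
demo_is_from_routine(const void *routine)
{
    return routine != NULL &&
        *(const RoutineTag *) routine == TAG_FROM_ROUTINE;
}

/* Checked downcast, in the spirit of a tagged-node cast. */
const DemoFromRoutine *
demo_as_from_routine(const void *routine)
{
    assert(demo_is_from_routine(routine));
    return routine;
}
```

With the tag in place, the caller can reject a handler that returns a COPY TO table for a COPY FROM command instead of calling through mismatched function pointers.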
From 8ba32b6d9892011d651ead5fef317c57255f5bfd Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:19:34 +0900
Subject: [PATCH v31 7/9] Use COPY_SOURCE_ prefix for CopySource enum values
This is for consistency with CopyDest.
---
 src/backend/commands/copyfrom.c          |  4 ++--
 src/backend/commands/copyfromparse.c     | 10 +++++-----
 src/include/commands/copyfrom_internal.h |  6 +++---
 3 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 37647949bfc..29e2a7d13d4 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1722,7 +1722,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;    /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;    /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1850,7 +1850,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index c1872acbbf6..75b49629f08 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -171,7 +171,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows it can send. */
     pq_flush();
@@ -239,7 +239,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
     switch (cstate->copy_src)
     {
-        case COPY_FILE:
+        case COPY_SOURCE_FILE:
             bytesread = fread(databuf, 1, maxread, cstate->copy_file);
             if (ferror(cstate->copy_file))
                 ereport(ERROR,
@@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
             if (bytesread == 0)
                 cstate->raw_reached_eof = true;
             break;
-        case COPY_FRONTEND:
+        case COPY_SOURCE_FRONTEND:
             while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
             {
                 int            avail;
@@ -331,7 +331,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
                 bytesread += avail;
             }
             break;
-        case COPY_CALLBACK:
+        case COPY_SOURCE_CALLBACK:
             bytesread = cstate->data_source_cb(databuf, minread, maxread);
             break;
     }
@@ -1159,7 +1159,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
          * after \. up to the protocol end of copy data.  (XXX maybe better
          * not to treat \. as special?)
          */
-        if (cstate->copy_src == COPY_FRONTEND)
+        if (cstate->copy_src == COPY_SOURCE_FRONTEND)
         {
             int            inbytes;
 
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..3a306e3286e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -24,9 +24,9 @@
  */
 typedef enum CopySource
 {
-    COPY_FILE,                    /* from file (or a piped program) */
-    COPY_FRONTEND,                /* from frontend */
-    COPY_CALLBACK,                /* from callback function */
+    COPY_SOURCE_FILE,            /* from file (or a piped program) */
+    COPY_SOURCE_FRONTEND,        /* from frontend */
+    COPY_SOURCE_CALLBACK,        /* from callback function */
 } CopySource;
 
 /*
-- 
2.47.1
From 6f0bf00c95964077ce070313f58876e87665842e Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:21:39 +0900
Subject: [PATCH v31 8/9] Add support for implementing custom COPY FROM format
 as extension
* Add CopyFromStateData::opaque that can be used to keep data for
  custom COPY From format implementation
* Export CopyGetData() to get the next data as
  CopyFromStateGetData()
---
 src/backend/commands/copyfromparse.c     | 11 +++++++++++
 src/include/commands/copyapi.h           |  2 ++
 src/include/commands/copyfrom_internal.h |  3 +++
 3 files changed, 16 insertions(+)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 75b49629f08..01f2e7a8824 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -730,6 +730,17 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
     return copied_bytes;
 }
 
+/*
+ * Export CopyGetData() for extensions. We want to keep CopyGetData() as a
+ * static function for optimization. CopyGetData() calls in this file may be
+ * optimized by a compiler.
+ */
+int
+CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread)
+{
+    return CopyGetData(cstate, dest, minread, maxread);
+}
+
 /*
  * Read raw fields in the next line for COPY FROM in text or csv mode.
  * Return false if no more lines.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 103eb21767d..ac58adbd23d 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -104,4 +104,6 @@ typedef struct CopyFromRoutine
     void        (*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
+extern int    CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread);
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 3a306e3286e..af425cf5fd9 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -181,6 +181,9 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyFromStateData;
 
 extern void ReceiveCopyBegin(CopyFromState cstate);
-- 
2.47.1
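
The two additions in this patch, a private `opaque` pointer and an exported data-reading function, are meant to be used together by a format implementation. The sketch below models that usage in standalone C under stated assumptions: `DemoCopyFromState`, `demo_get_data`, `DemoFormatState`, `demo_start`, `demo_one_row`, `demo_end` and `demo_count_rows` are hypothetical names, the data source is an in-memory string rather than a file or frontend, and `calloc`/`free` stand in for PostgreSQL memory contexts.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct DemoCopyFromState
{
    const char *input;          /* in-memory stand-in for the COPY source */
    size_t      input_len;
    size_t      pos;
    void       *opaque;         /* private space owned by the format */
} DemoCopyFromState;

/* Stand-in for the exported reader: copy up to maxread bytes, return count. */
int
demo_get_data(DemoCopyFromState *cstate, void *dest, int minread, int maxread)
{
    size_t      avail = cstate->input_len - cstate->pos;
    size_t      n = avail < (size_t) maxread ? avail : (size_t) maxread;

    (void) minread;             /* an in-memory source has no short reads */
    memcpy(dest, cstate->input + cstate->pos, n);
    cstate->pos += n;
    return (int) n;
}

/* State that only the format implementation knows about. */
typedef struct DemoFormatState
{
    int         rows_read;
} DemoFormatState;

void
demo_start(DemoCopyFromState *cstate)
{
    DemoFormatState *fmt = calloc(1, sizeof(DemoFormatState));

    assert(fmt != NULL);
    cstate->opaque = fmt;
}

/* Returns 1 for a row, 0 at end of input, -1 for a truncated record. */
int
demo_one_row(DemoCopyFromState *cstate, char *record, int record_size)
{
    DemoFormatState *fmt = cstate->opaque;
    int         got = demo_get_data(cstate, record, record_size, record_size);

    if (got == 0)
        return 0;
    if (got != record_size)
        return -1;
    fmt->rows_read++;
    return 1;
}

/* Releases the private state and reports how many rows were read. */
int
demo_end(DemoCopyFromState *cstate)
{
    DemoFormatState *fmt = cstate->opaque;
    int         rows = fmt->rows_read;

    free(fmt);
    cstate->opaque = NULL;
    return rows;
}

/* Drives start / one-row / end over fixed-size records. */
int
demo_count_rows(const char *input, int record_size)
{
    DemoCopyFromState cstate = {input, strlen(input), 0, NULL};
    char        record[16];

    assert(record_size > 0 && record_size <= (int) sizeof(record));
    demo_start(&cstate);
    while (demo_one_row(&cstate, record, record_size) == 1)
        ;
    return demo_end(&cstate);
}
```

For example, `demo_count_rows("987\n654\n", 4)` reads two four-byte records (three digits plus a newline), which mirrors the fixed-width lines used by the test module in the next patch.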
From bec62135de0224c9bab139bc86b2263854a5b1a1 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 27 Nov 2024 16:23:55 +0900
Subject: [PATCH v31 9/9] Add CopyFromSkipErrorRow() for custom COPY format
 extension
Extensions must call CopyFromSkipErrorRow() when CopyFromOneRow
callback reports an error by errsave(). CopyFromSkipErrorRow() handles
"ON_ERROR stop" and "LOG_VERBOSITY verbose" cases.
---
 src/backend/commands/copyfromparse.c          | 82 ++++++++++--------
 src/include/commands/copyapi.h                |  2 +
 .../expected/test_copy_format.out             | 47 +++++++++++
 .../test_copy_format/sql/test_copy_format.sql | 24 ++++++
 .../test_copy_format/test_copy_format.c       | 83 ++++++++++++++++++-
 5 files changed, 200 insertions(+), 38 deletions(-)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 01f2e7a8824..7296745d6d2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -852,6 +852,51 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool i
     return true;
 }
 
+/*
+ * Call this when you report an error by errsave() in your CopyFromOneRow
+ * callback. This handles "ON_ERROR stop" and "LOG_VERBOSITY verbose" cases
+ * for you.
+ */
+void
+CopyFromSkipErrorRow(CopyFromState cstate)
+{
+    Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+    cstate->num_errors++;
+
+    if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+    {
+        /*
+         * Since we emit line number and column info in the below notice
+         * message, we suppress error context information other than the
+         * relation name.
+         */
+        Assert(!cstate->relname_only);
+        cstate->relname_only = true;
+
+        if (cstate->cur_attval)
+        {
+            char       *attval;
+
+            attval = CopyLimitPrintoutLength(cstate->cur_attval);
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname,
+                           attval));
+            pfree(attval);
+        }
+        else
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname));
+
+        /* reset relname_only */
+        cstate->relname_only = false;
+    }
+}
+
 /*
  * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
  *
@@ -960,42 +1005,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
                                         (Node *) cstate->escontext,
                                         &values[m]))
         {
-            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-            cstate->num_errors++;
-
-            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-            {
-                /*
-                 * Since we emit line number and column info in the below
-                 * notice message, we suppress error context information other
-                 * than the relation name.
-                 */
-                Assert(!cstate->relname_only);
-                cstate->relname_only = true;
-
-                if (cstate->cur_attval)
-                {
-                    char       *attval;
-
-                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-                                   (unsigned long long) cstate->cur_lineno,
-                                   cstate->cur_attname,
-                                   attval));
-                    pfree(attval);
-                }
-                else
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-                                   (unsigned long long) cstate->cur_lineno,
-                                   cstate->cur_attname));
-
-                /* reset relname_only */
-                cstate->relname_only = false;
-            }
-
+            CopyFromSkipErrorRow(cstate);
             return true;
         }
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index ac58adbd23d..dfab62372a7 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -106,4 +106,6 @@ typedef struct CopyFromRoutine
 
 extern int    CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread);
 
+extern void CopyFromSkipErrorRow(CopyFromState cstate);
+
 #endif                            /* COPYAPI_H */
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index 016893e7026..b9a6baa85c0 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -1,6 +1,8 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=true
 NOTICE:  CopyFromInFunc: atttypid=21
@@ -8,7 +10,50 @@ NOTICE:  CopyFromInFunc: atttypid=23
 NOTICE:  CopyFromInFunc: atttypid=20
 NOTICE:  CopyFromStart: natts=3
 NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  invalid value: "6"
+CONTEXT:  COPY test, line 2, column a: "6"
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
 NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  skipping row due to data type incompatibility at line 2 for column "a": "6"
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
+NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  too much lines: 3
+CONTEXT:  COPY test, line 3
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=false
 NOTICE:  CopyToOutFunc: atttypid=21
@@ -18,4 +63,6 @@ NOTICE:  CopyToStart: natts=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 0dfdfa00080..86db71bce7f 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -1,6 +1,30 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+321
 \.
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index f6b105659ab..b766d3c96ff 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 
 #include "commands/copyapi.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 
 PG_MODULE_MAGIC;
@@ -32,10 +33,88 @@ CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
 }
 
 static bool
-CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext,
+               Datum *values, bool *nulls)
 {
+    int            n_attributes = list_length(cstate->attnumlist);
+    char       *line;
+    int            line_size = n_attributes + 1;    /* +1 is for new line */
+    int            read_bytes;
+
     ereport(NOTICE, (errmsg("CopyFromOneRow")));
-    return false;
+
+    cstate->cur_lineno++;
+    line = palloc(line_size);
+    read_bytes = CopyFromStateGetData(cstate, line, line_size, line_size);
+    if (read_bytes == 0)
+        return false;
+    if (read_bytes != line_size)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("one line must be %d bytes: %d",
+                        line_size, read_bytes)));
+
+    if (cstate->cur_lineno == 1)
+    {
+        /* Success */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        ListCell   *cur;
+        int            i = 0;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            int            m = attnum - 1;
+            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+            if (att->atttypid == INT2OID)
+            {
+                values[i] = Int16GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT4OID)
+            {
+                values[i] = Int32GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT8OID)
+            {
+                values[i] = Int64GetDatum(line[i] - '0');
+            }
+            nulls[i] = false;
+            i++;
+        }
+    }
+    else if (cstate->cur_lineno == 2)
+    {
+        /* Soft error */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        int            attnum = lfirst_int(list_head(cstate->attnumlist));
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+        char        value[2];
+
+        cstate->cur_attname = NameStr(att->attname);
+        value[0] = line[0];
+        value[1] = '\0';
+        cstate->cur_attval = value;
+        errsave((Node *) cstate->escontext,
+                (
+                 errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                 errmsg("invalid value: \"%c\"", line[0])));
+        CopyFromSkipErrorRow(cstate);
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+        return true;
+    }
+    else
+    {
+        /* Hard error */
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("too much lines: %llu",
+                        (unsigned long long) cstate->cur_lineno)));
+    }
+
+    return true;
 }
 
 static void
-- 
2.47.1
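
The contract introduced by patch 9/9 (report a row-level problem as a soft error, then call the skip helper, which must never run under "ON_ERROR stop") can be illustrated with a small standalone model. This is a sketch under stated assumptions, not PostgreSQL code: `DemoCopyFrom`, `demo_errsave`, `demo_skip_error_row`, `demo_line_is_valid` and `demo_copy_from` are hypothetical names, and the model assumes that a soft error is only recorded when an error-save context exists and otherwise aborts the whole command.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum DemoOnError
{
    DEMO_ON_ERROR_STOP,
    DEMO_ON_ERROR_IGNORE
} DemoOnError;

typedef struct DemoCopyFrom
{
    DemoOnError on_error;
    bool        have_escontext; /* set only when on_error is not "stop" */
    int         num_errors;
    int         num_inserted;
    bool        aborted;
} DemoCopyFrom;

/*
 * Stand-in for a soft-error report: returns true if the error was saved,
 * false if there is no save context and the command must abort.
 */
static bool
demo_errsave(DemoCopyFrom *cstate)
{
    if (!cstate->have_escontext)
    {
        cstate->aborted = true;
        return false;
    }
    return true;
}

/* Stand-in for the skip helper: only legal when errors are being ignored. */
static void
demo_skip_error_row(DemoCopyFrom *cstate)
{
    assert(cstate->on_error != DEMO_ON_ERROR_STOP);
    cstate->num_errors++;
}

/* A line is valid if it is a non-empty run of decimal digits. */
static bool
demo_line_is_valid(const char *line)
{
    if (*line == '\0')
        return false;
    for (; *line != '\0'; line++)
    {
        if (!isdigit((unsigned char) *line))
            return false;
    }
    return true;
}

/* Returns false if the command aborted on a hard error. */
bool
demo_copy_from(DemoCopyFrom *cstate, const char *const *lines, size_t nlines)
{
    cstate->have_escontext = (cstate->on_error != DEMO_ON_ERROR_STOP);

    for (size_t i = 0; i < nlines; i++)
    {
        if (demo_line_is_valid(lines[i]))
        {
            cstate->num_inserted++;
            continue;
        }
        if (!demo_errsave(cstate))
            return false;       /* hard error: stop immediately */
        demo_skip_error_row(cstate);    /* soft error: count it and go on */
    }
    return true;
}
```

The ordering matters: the skip helper is reached only after the soft-error report has returned, so under "ON_ERROR stop" control never gets to the assertion.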
			
Sutou Kouhei wrote on 2025-02-01 17:12:
> Hi,
>

Hi

I would like to inform about the security breach in your design of
COPY TO/FROM.
You use FORMAT option to add new formats, filling it with routine name
in shared library. As result any caller can call any routine in
PostgreSQL kernel. I think, it will start competition, who can find most
dangerous routine to call just from COPY FROM command.

Standard PostgreSQL realisation for new methods to use USING keyword.
Every new method could have own options (FORMAT is option of internal
'copy from/to' methods), it assumes some SetOptions interface, that
defines an options structure according to the new method requirements.

I agree with the general direction of the extensibility, but it should
be secure and consistent.

-- 
Best regards,

Vladlen Popolitov.
Hi,
In <d838025aceeb19c9ff1db702fa55cabf@postgrespro.ru>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 03 Feb 2025 13:38:04 +0700,
  Vladlen Popolitov <v.popolitov@postgrespro.ru> wrote:
> I would like to inform about the security breach in your design of
> COPY TO/FROM.
Thanks! I didn't notice it.
> You use FORMAT option to add new formats, filling it with routine name
> in shared library. As result any caller can call any routine in
> PostgreSQL kernel.
We require "FORMAT_NAME(internal)" signature:
----
    funcargtypes[0] = INTERNALOID;
    handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
                                funcargtypes, true);
----
So any caller can call only routines that use the signature.
Should we add more checks for security? If so, what checks
are needed?
For example, does requiring a prefix such as "copy_" (use
"copy_json" for "json" format) improve security?
For example, we need to register a handler explicitly
(CREATE ACCESS METHOD) when we want to use a new access
method. Should we require an explicit registration for
custom COPY format too?
>  Standard PostgreSQL realisation for new methods to use USING
>  keyword. Every
> new method could have own options (FORMAT is option of internal 'copy
> from/to'
> methods),
Ah, I didn't think about USING.
You suggest "COPY ... USING json" not "COPY ... FORMAT json"
like "CREATE INDEX ... USING custom_index", right? It will
work. If we use this interface, we should reject "COPY
... FORMAT ... USING" (both of FORMAT/USING are specified).
>           it assumes some SetOptions interface, that defines
> an options structure according to the new method requirements.
Sorry. I couldn't find the SetOptions interface in source
code. I found only AT_SetOptions. Did you mean it by "some
SetOptions interface"?
I'm familiar with only access method. It has
IndexAmRoutine::amoptions. Is it a SetOptions interface
example?
FYI: The current patch set doesn't have custom options
support yet. Because we want to start from a minimal feature
set. But we'll add support for custom options eventually.
Thanks,
-- 
kou
			
Sutou Kouhei wrote on 2025-02-04 13:29:

Hi

> Hi,
>
> In <d838025aceeb19c9ff1db702fa55cabf@postgrespro.ru>
>   "Re: Make COPY format extendable: Extract COPY TO format
> implementations" on Mon, 03 Feb 2025 13:38:04 +0700,
>   Vladlen Popolitov <v.popolitov@postgrespro.ru> wrote:
>
>> I would like to inform about the security breach in your design of
>> COPY TO/FROM.
>
> Thanks! I didn't notice it.
>
>> You use FORMAT option to add new formats, filling it with routine name
>> in shared library. As result any caller can call any routine in
>> PostgreSQL kernel.
>
> We require "FORMAT_NAME(internal)" signature:
>
> ----
>     funcargtypes[0] = INTERNALOID;
>     handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
>                                 funcargtypes, true);
> ----
>
> So any caller can call only routines that use the signature.
>
> Should we add more checks for security? If so, what checks
> are needed?
>
> For example, does requiring a prefix such as "copy_" (use
> "copy_json" for "json" format) improve security?
>
> For example, we need to register a handler explicitly
> (CREATE ACCESS METHOD) when we want to use a new access
> method. Should we require an explicit registration for
> custom COPY format too?
>
>

I think, in case of USING PostgreSQL kernel will call corresponding
handler, and it looks secure - the same as for table and index methods
handlers.

>> Standard PostgreSQL realisation for new methods to use USING
>> keyword. Every
>> new method could have own options (FORMAT is option of internal 'copy
>> from/to'
>> methods),
>
> Ah, I didn't think about USING.
>
> You suggest "COPY ... USING json" not "COPY ... FORMAT json"
> like "CREATE INDEX ... USING custom_index", right? It will
> work. If we use this interface, we should reject "COPY
> ... FORMAT ... USING" (both of FORMAT/USING are specified).
>
>

I cannot recommend about rejecting, I do not know details
of realisation of this part of code. Just idea - FORMAT value
could be additional option to copy handler or NULL
if it is omitted.
If you add extensibility, then every handler will be the
extension, that can handle one or more formats.

>> it assumes some SetOptions interface, that defines
>> an options structure according to the new method requirements.
>
> Sorry. I couldn't find the SetOptions interface in source
> code. I found only AT_SetOptions. Did you mean it by "some
> SetOptions interface"?

Yes.

> I'm familiar with only access method. It has
> IndexAmRoutine::amoptions. Is it a SetOptions interface
> example?

Yes. I think, it would be compatible with other modules
of source code and could use the same code base to process
options of COPY TO/FROM

>
> FYI: The current patch set doesn't have custom options
> support yet. Because we want to start from a minimal feature
> set. But we'll add support for custom options eventually.

Sorry for disturbing. I did not have intention to stop your patch,
I would like to point to that details as early as possible.

-- 
Best regards,

Vladlen Popolitov.
On Tue, Feb 4, 2025 at 2:46 AM Vladlen Popolitov <v.popolitov@postgrespro.ru> wrote:
>
> Sutou Kouhei wrote on 2025-02-04 13:29:
> Hi
> > Hi,
> >
> > In <d838025aceeb19c9ff1db702fa55cabf@postgrespro.ru>
> >   "Re: Make COPY format extendable: Extract COPY TO format
> > implementations" on Mon, 03 Feb 2025 13:38:04 +0700,
> >   Vladlen Popolitov <v.popolitov@postgrespro.ru> wrote:
> >
> >> I would like to inform about the security breach in your design of
> >> COPY TO/FROM.
> >
> > Thanks! I didn't notice it.
> >
> >> You use FORMAT option to add new formats, filling it with routine name
> >> in shared library. As result any caller can call any routine in
> >> PostgreSQL kernel.
> >
> > We require "FORMAT_NAME(internal)" signature:
> >
> > ----
> >     funcargtypes[0] = INTERNALOID;
> >     handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
> >                                 funcargtypes, true);
> > ----
> >
> > So any caller can call only routines that use the signature.
> >
> > Should we add more checks for security? If so, what checks
> > are needed?
> >
> > For example, does requiring a prefix such as "copy_" (use
> > "copy_json" for "json" format) improve security?
> >
> > For example, we need to register a handler explicitly
> > (CREATE ACCESS METHOD) when we want to use a new access
> > method. Should we require an explicit registration for
> > custom COPY format too?
> >
> >
>
> I think, in case of USING PostgreSQL kernel will call corresponding
> handler,
> and it looks secure - the same as for table and index methods handlers.

IIUC even with custom copy format patches, we call the corresponding
handler function to get the routines, which is essentially similar to
what we do for table AM, index AM, and tablesample. I don't think we
allow users to call any routine in PostgreSQL core via custom FORMAT
option.

BTW we need to check if the return value type of the handler function
is copy_handler.

>
> >> Standard PostgreSQL realisation for new methods to use USING
> >> keyword. Every
> >> new method could have own options (FORMAT is option of internal 'copy
> >> from/to'
> >> methods),
> >
> > Ah, I didn't think about USING.
> >
> > You suggest "COPY ... USING json" not "COPY ... FORMAT json"
> > like "CREATE INDEX ... USING custom_index", right? It will
> > work. If we use this interface, we should reject "COPY
> > ... FORMAT ... USING" (both of FORMAT/USING are specified).
> >
> >
> I cannot recommend about rejecting, I do not know details
> of realisation of this part of code. Just idea - FORMAT value
> could be additional option to copy handler or NULL
> if it is omitted.
> If you add extensibility, then every handler will be the
> extension, that can handle one or more formats.

Hmm, if we use the USING clause to specify the format type, we end up
having two ways to specify the format type (e.g., 'COPY ... USING
text' and 'COPY .. WITH (format = text)'), which seems to confuse
users.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
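
The two checks discussed in this message (look the handler up by name with a single `internal` argument, then verify that its return type is `copy_handler`) can be sketched as a standalone model. This is an illustration only: `DemoType`, `DemoProc`, `demo_catalog`, `demo_lookup_func` and `demo_resolve_format` are hypothetical names, and the three catalog rows are a toy stand-in for the real function catalog.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum DemoType
{
    DEMO_TYPE_INT4,
    DEMO_TYPE_INTERNAL,
    DEMO_TYPE_COPY_HANDLER,
    DEMO_TYPE_INDEX_AM_HANDLER
} DemoType;

typedef struct DemoProc
{
    const char *name;
    DemoType    argtype;        /* single-argument functions only */
    DemoType    rettype;
} DemoProc;

/* Toy catalog: one COPY handler, one index AM handler, one ordinary function. */
static const DemoProc demo_catalog[] = {
    {"test_copy_format", DEMO_TYPE_INTERNAL, DEMO_TYPE_COPY_HANDLER},
    {"bthandler", DEMO_TYPE_INTERNAL, DEMO_TYPE_INDEX_AM_HANDLER},
    {"abs", DEMO_TYPE_INT4, DEMO_TYPE_INT4},
};

/* Find a function by name AND argument type; NULL if there is no match. */
const DemoProc *
demo_lookup_func(const char *name, DemoType argtype)
{
    size_t      n = sizeof(demo_catalog) / sizeof(demo_catalog[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(demo_catalog[i].name, name) == 0 &&
            demo_catalog[i].argtype == argtype)
            return &demo_catalog[i];
    }
    return NULL;
}

typedef enum DemoFormatStatus
{
    DEMO_FORMAT_OK,
    DEMO_FORMAT_NOT_FOUND,
    DEMO_FORMAT_WRONG_RETURN_TYPE
} DemoFormatStatus;

/* Resolve a FORMAT name to a handler, applying both checks. */
DemoFormatStatus
demo_resolve_format(const char *format)
{
    const DemoProc *proc;

    assert(format != NULL);
    proc = demo_lookup_func(format, DEMO_TYPE_INTERNAL);
    if (proc == NULL)
        return DEMO_FORMAT_NOT_FOUND;   /* no FORMAT_NAME(internal) function */
    if (proc->rettype != DEMO_TYPE_COPY_HANDLER)
        return DEMO_FORMAT_WRONG_RETURN_TYPE;
    return DEMO_FORMAT_OK;
}
```

In this model the argument-type check alone already excludes ordinary functions such as `abs`, while the return-type check is what keeps handlers of other kinds (here the index AM handler) from being accepted as a COPY format.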
Masahiko Sawada wrote on 2025-02-05 08:32:
> On Tue, Feb 4, 2025 at 2:46 AM Vladlen Popolitov
>> >> Standard PostgreSQL realisation for new methods to use USING
>> >> keyword. Every
>> >> new method could have own options (FORMAT is option of internal 'copy
>> >> from/to'
>> >> methods),
>> >
>> > Ah, I didn't think about USING.
>> >
>> > You suggest "COPY ... USING json" not "COPY ... FORMAT json"
>> > like "CREATE INDEX ... USING custom_index", right? It will
>> > work. If we use this interface, we should reject "COPY
>> > ... FORMAT ... USING" (both of FORMAT/USING are specified).
>> >
>> >
>> I cannot recommend about rejecting, I do not know details
>> of realisation of this part of code. Just idea - FORMAT value
>> could be additional option to copy handler or NULL
>> if it is omitted.
>> If you add extensibility, then every handler will be the
>> extension, that can handle one or more formats.
>
> Hmm, if we use the USING clause to specify the format type, we end up
> having two ways to specify the format type (e.g., 'COPY ... USING
> text' and 'COPY .. WITH (format = text)'), which seems to confuse
> users.

WITH clause has list of options defined by copy method defined in USING.
The clause WITH (format=text) has options defined for default copy
method, but other methods will define own options. Probably they do not
need the word 'format' in options. The same as in index access methods.
For example, copy method parquet:
COPY ... USING parquet WITH (row_group_size=1000000)
copy method parquet need and will define the word 'row_group_size'
in options, the word 'format' will be wrong for it.

-- 
Best regards,

Vladlen Popolitov.
On Tue, Feb 4, 2025 at 6:19 PM Vladlen Popolitov <v.popolitov@postgrespro.ru> wrote:
>
> Masahiko Sawada wrote 2025-02-05 08:32:
> > On Tue, Feb 4, 2025 at 2:46 AM Vladlen Popolitov
> >
> >> >> Standard PostgreSQL realisation for new methods to use USING
> >> >> keyword. Every
> >> >> new method could have own options (FORMAT is option of internal 'copy
> >> >> from/to'
> >> >> methods),
> >> >
> >> > Ah, I didn't think about USING.
> >> >
> >> > You suggest "COPY ... USING json" not "COPY ... FORMAT json"
> >> > like "CREATE INDEX ... USING custom_index", right? It will
> >> > work. If we use this interface, we should reject "COPY
> >> > ... FORMAT ... USING" (both of FORMAT/USING are specified).
> >> >
> >> >
> >> I cannot recommend about rejecting, I do not know details
> >> of realisation of this part of code. Just idea - FORMAT value
> >> could be additional option to copy handler or NULL
> >> if it is omitted.
> >> If you add extensibility, than every handler will be the
> >> extension, that can handle one or more formats.
> >
> > Hmm, if we use the USING clause to specify the format type, we end up
> > having two ways to specify the format type (e.g., 'COPY ... USING
> > text' and 'COPY .. WITH (format = text)'), which seems to confuse
> > users.
> WITH clause has list of options defined by copy method defined in USING.
> The clause WITH (format=text) has options defined for default copy
> method,
> but other methods will define own options. Probably they do not need
> the word 'format' in options. The same as in index access methods.
> For example, copy method parquete:
> COPY ... USING parquete WITH (row_group_size=1000000)
> copy method parquete needs and will define the word 'row_group_size'
> in options, the word 'format' will be wrong for it.
I think the syntax is orthogonal to which options are passed to the
custom format extension. For example, even if we specify the options
like "COPY ... WITH (format 'parquet', row_group_size '1000000',
on_error 'ignore')", we can pass only non-built-in options (i.e. only
row_group_size) to the extension.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
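The idea of passing only the non-built-in options to the extension can be illustrated with a minimal standalone sketch. This is plain C, not PostgreSQL code and not taken from any patch in this thread: CopyOption, builtin_options, is_builtin_option() and split_options() are hypothetical names, and the list of built-in option names is only an illustrative subset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name/value pair standing in for one WITH (...) entry. */
typedef struct CopyOption
{
    const char *name;
    const char *value;
} CopyOption;

/* Illustrative subset of option names consumed by the COPY core itself. */
static const char *const builtin_options[] = {"format", "on_error", "header"};

/* Return true if 'name' is one of the options the core handles. */
bool
is_builtin_option(const char *name)
{
    size_t      n = sizeof(builtin_options) / sizeof(builtin_options[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(name, builtin_options[i]) == 0)
            return true;
    }
    return false;
}

/*
 * Copy every option that is not built-in into 'out', which must have room
 * for 'nopts' entries. Returns the number of options forwarded.
 */
size_t
split_options(const CopyOption *opts, size_t nopts, CopyOption *out)
{
    size_t      nout = 0;

    for (size_t i = 0; i < nopts; i++)
    {
        if (!is_builtin_option(opts[i].name))
            out[nout++] = opts[i];
    }
    return nout;
}
```

With the options from the example above (format, row_group_size, on_error), only row_group_size would reach the extension under these assumptions.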
On Tue, Feb 4, 2025 at 9:10 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sat, Feb 01, 2025 at 07:12:01PM +0900, Sutou Kouhei wrote:
> > For the propose, copyapi.h should not include
> > copy{to,from}_internal.h. If we do it, copyto.c includes
> > CopyFromState and copyfrom*.c include CopyToState.
> >
> > What do you think about the following change? Note that
> > extensions must include copy{to,from}_internal.h explicitly
> > in addition to copyapi.h.
>
> I was just looking at bit at this series of patch labelled with v31,
> to see what is happening here.
>
> In 0001, we have that:
>
> +       /* format-specific routines */
> +       const CopyToRoutine *routine;
> [...]
> -       CopySendEndOfRow(cstate);
> +       cstate->routine->CopyToOneRow(cstate, slot);
>
> Having a callback where the copy state is processed once per row is
> neat in terms of design for the callbacks and what extensions can do,
> and this is much better than what 2889fd23be5 has attempted (later
> reverted in 1aa8324b81fa) because we don't do indirect function calls
> for each attribute.  Still, I have a question here: what happens for a
> COPY TO that involves one attribute, a short field size like an int2
> and many rows (the more rows the more pronounced the effect, of
> course)?  Could this level of indirection still be the cause of some
> regressions in a case like that?  This is the worst case I can think
> about, on top of my mind, and I am not seeing tests with few
> attributes like this one, where we would try to make this callback as
> hot as possible.  This is a performance-sensitive area.
FYI when Sutou-san last measured the performance[1], it showed a
slight speed up even with fewer columns (5 columns) in both COPY TO
and COPY FROM cases. The callback design has not changed since then.
But it would be a good idea to run the benchmark with a table having a
single small size column.
Regards,
[1] https://www.postgresql.org/message-id/20241114.161948.1677325020727842666.kou%40clear-code.com
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
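The per-row callback design discussed above can be sketched in standalone C. This is only a toy model under stated assumptions, not the patch's code: MiniCopyState, MiniCopyRoutine, mini_text_routine and mini_copy_to() are hypothetical names, rows are plain int arrays, and output goes to a fixed buffer. It shows the shape of the dispatch (one indirect call per row, none per attribute) and says nothing about actual performance.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct MiniCopyState MiniCopyState;

/* Simplified analogue of a format routine: one callback per phase. */
typedef struct MiniCopyRoutine
{
    void        (*start) (MiniCopyState *st);
    void        (*one_row) (MiniCopyState *st, const int *values, int nvalues);
    void        (*end) (MiniCopyState *st);
} MiniCopyRoutine;

/* Hypothetical stand-in for the copy state: a routine and an output buffer. */
struct MiniCopyState
{
    const MiniCopyRoutine *routine;
    char        buf[256];
    size_t      len;
};

/* Append a string to the buffer; silently drops it if it does not fit. */
static void
append(MiniCopyState *st, const char *s)
{
    size_t      n = strlen(s);

    if (st->len + n < sizeof(st->buf))
    {
        memcpy(st->buf + st->len, s, n + 1);
        st->len += n;
    }
}

static void
text_start(MiniCopyState *st)
{
    (void) st;                  /* no header in this sketch */
}

/* The attribute loop lives inside the callback, so it uses direct calls. */
static void
text_one_row(MiniCopyState *st, const int *values, int nvalues)
{
    char        field[32];

    for (int i = 0; i < nvalues; i++)
    {
        if (i > 0)
            append(st, "\t");
        snprintf(field, sizeof(field), "%d", values[i]);
        append(st, field);
    }
    append(st, "\n");
}

static void
text_end(MiniCopyState *st)
{
    (void) st;                  /* nothing to flush in this sketch */
}

const MiniCopyRoutine mini_text_routine = {
    .start = text_start,
    .one_row = text_one_row,
    .end = text_end,
};

/* Driver: exactly one indirect call per row. */
void
mini_copy_to(MiniCopyState *st, const MiniCopyRoutine *routine,
             const int *rows, int nrows, int ncols)
{
    st->routine = routine;
    st->len = 0;
    st->buf[0] = '\0';

    st->routine->start(st);
    for (int r = 0; r < nrows; r++)
        st->routine->one_row(st, rows + (size_t) r * ncols, ncols);
    st->routine->end(st);
}
```

The single-column case raised above corresponds to calling mini_copy_to() with ncols set to 1, where the indirect call is the largest share of per-row work.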
			
Hi,
In <eb59c12bb36207c65f23719f255eb69b@postgrespro.ru>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 04 Feb 2025 17:46:10 +0700,
  Vladlen Popolitov <v.popolitov@postgrespro.ru> wrote:
> I think, in case of USING PostgreSQL kernel will call corresponding
> handler,
> and it looks secure - the same as for table and index methods
> handlers.
We use an approach similar to the one used by the table
sampling method. We can use a new table sampling method by
just adding a "method_name(internal) RETURNS tsm_handler"
function. Is it not secure too?
> If you add extensibility, than every handler will be the
> extension, that can handle one or more formats.
Hmm. It may be a needless extensibility. Is it useful? I
feel that it increases complexity when we implement a custom
format handler. We can just implement one handler per custom
format. If we want to share implementation details in
multiple handlers, we can just share internal C
functions. If we require one handler per custom format,
it'll be simpler than one handler for multiple custom
formats.
>>> it assumes some SetOptions interface, that defines
>>> an options structure according to the new method requirements.
>> Sorry. I couldn't find the SetOptions interface in source
>> code. I found only AT_SetOptions. Did you mean it by "some
>> SetOptions interface"?
> Yes.
>> I'm familiar with only access method. It has
>> IndexAmRoutine::amoptions. Is it a SetOptions interface
>> example?
> Yes. I think, it would be compatible with other modules
> of source code and could use the same code base to process
> options of COPY TO/FROM
Thanks. I thought that there is a common interface pattern
for SetOptions. But it seems that it's a feature that is
implemented in many extension points. If we implement custom
options support eventually, does it satisfy the "SetOptions
interface"?
Thanks,
--
kou
Hi,
In <CAD21AoDCH1io_dGtsmnmZ4bUWfdPhEUe_8VQNvi31+78Pt7KdQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 4 Feb 2025 17:32:07 -0800,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> BTW we need to check if the return value type of the handler function
> is copy_handler.
Oh, can we do it without calling a function? It seems that
FmgrInfo doesn't have return value type information. Should
we read pg_catalog.pg_proc or something for it?
Thanks,
--
kou
On 2025-Feb-03, Vladlen Popolitov wrote:
> You use FORMAT option to add new formats, filling it with routine name
> in shared library. As result any caller can call any routine in PostgreSQL
> kernel.
> I think, it will start competition, who can find most dangerous routine
> to call just from COPY FROM command.
Hah.
Maybe it would be a better UI to require that COPY format handlers are
registered explicitly before they can be used:
    CREATE ACCESS METHOD copy_yaml TYPE copy HANDLER copy_yaml_handler;
... and then when the FORMAT is not recognized as one of the hardcoded
methods, we go look in pg_am for one with amtype='c' and the given name.
That gives you the function that initializes the Copy state.
This is convenient enough because system administrators can add COPY
formats that anyone can use, and doesn't allow to call arbitrary
functions via COPY.
--
Álvaro Herrera        48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"I can't go to a restaurant and order food because I keep looking at the
fonts on the menu.  Five minutes later I realize that it's also talking
about food" (Donald Knuth)
Álvaro Herrera wrote 2025-02-05 18:49:
> On 2025-Feb-03, Vladlen Popolitov wrote:
>
>> You use FORMAT option to add new formats, filling it with routine name
>> in shared library. As result any caller can call any routine in
>> PostgreSQL
>> kernel.
>> I think, it will start competition, who can find most dangerous
>> routine
>> to call just from COPY FROM command.
>
> Hah.
>
> Maybe it would be a better UI to require that COPY format handlers are
> registered explicitly before they can be used:
>
>     CREATE ACCESS METHOD copy_yaml TYPE copy HANDLER copy_yaml_handler;
>
> ... and then when the FORMAT is not recognized as one of the hardcoded
> methods, we go look in pg_am for one with amtype='c' and the given
> name.
> That gives you the function that initializes the Copy state.
>
> This is convenient enough because system administrators can add COPY
> formats that anyone can use, and doesn't allow to call arbitrary
> functions via COPY.
Yes! It is what I propose. This looks much safer and is already used
in access method creation.
--
Best regards,
Vladlen Popolitov.
On Wed, Feb 5, 2025 at 3:49 AM Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2025-Feb-03, Vladlen Popolitov wrote:
>
> > You use FORMAT option to add new formats, filling it with routine name
> > in shared library. As result any caller can call any routine in PostgreSQL
> > kernel.
> > I think, it will start competition, who can find most dangerous routine
> > to call just from COPY FROM command.
>
> Hah.
>
> Maybe it would be a better UI to require that COPY format handlers are
> registered explicitly before they can be used:
>
>     CREATE ACCESS METHOD copy_yaml TYPE copy HANDLER copy_yaml_handler;
>
> ... and then when the FORMAT is not recognized as one of the hardcoded
> methods, we go look in pg_am for one with amtype='c' and the given name.
> That gives you the function that initializes the Copy state.
>
> This is convenient enough because system administrators can add COPY
> formats that anyone can use, and doesn't allow to call arbitrary
> functions via COPY.
I think that the patch needs to check if the function's result type is
COPY_HANDLEROID by using get_func_rettype(), before calling it. But
with this check, we can prevent arbitrary functions from being called
via COPY. Why do we need to extend CREATE ACCESS METHOD too for that
purpose?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On 2025-Feb-05, Masahiko Sawada wrote:
> I think that the patch needs to check if the function's result type is
> COPY_HANDLEROID by using get_func_rettype(), before calling it. But
> with this check, we can prevent arbitrary functions from being called
> via COPY. Why do we need to extend CREATE ACCESS METHOD too for that
> purpose?
It's a nicer UI than a bare CREATE FUNCTION, but perhaps it is overkill.
IIRC the reason we require CREATE ACCESS METHOD for table AMs is so that
we acquire a pg_am entry with an OID that can be referenced from
elsewhere, for instance you can't drop an AM if tables are using it; but
you can't use COPY in rules or anything like that that's going to be
stored permanently.  Perhaps you're right that we don't need this for
extensible COPY FORMAT.
--
Álvaro Herrera               PostgreSQL Developer  —  https://www.EnterpriseDB.com/
On Tue, Feb 4, 2025 at 11:37 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoDCH1io_dGtsmnmZ4bUWfdPhEUe_8VQNvi31+78Pt7KdQ@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 4 Feb 2025 17:32:07 -0800,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > BTW we need to check if the return value type of the handler function
> > is copy_handler.
>
> Oh, can we do it without calling a function?
Yes.
> It seems that
> FmgrInfo doesn't have return value type information. Should
> we read pg_catalog.pg_proc or something for it?
Yes, we can do like what we do for TABLESAMPLE for example:
    /* check that handler has correct return type */
    if (get_func_rettype(handlerOid) != TSM_HANDLEROID)
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("function %s must return type %s",
                        NameListToString(rts->method), "tsm_handler"),
                 parser_errposition(pstate, rts->location)));
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
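The effect of checking the declared result type before calling a handler can be modelled with a small standalone sketch. It is a toy catalog in plain C, not PostgreSQL code: MiniProc, MiniRetType, mini_catalog and mini_lookup_copy_handler() are hypothetical names, and the catalog entries are invented for illustration. The point it shows is that a function with the wrong result type is rejected without ever being invoked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for result type OIDs such as COPY_HANDLEROID. */
typedef enum MiniRetType
{
    MINI_RET_BOOL,
    MINI_RET_TSM_HANDLER,
    MINI_RET_COPY_HANDLER
} MiniRetType;

/* One row of a toy function catalog: name and declared result type. */
typedef struct MiniProc
{
    const char *name;
    MiniRetType rettype;
} MiniProc;

typedef enum MiniLookupResult
{
    MINI_LOOKUP_OK,
    MINI_LOOKUP_NOT_FOUND,
    MINI_LOOKUP_WRONG_TYPE
} MiniLookupResult;

/* Invented catalog contents; none of these are real PostgreSQL objects. */
static const MiniProc mini_catalog[] = {
    {"copy_yaml", MINI_RET_COPY_HANDLER},
    {"my_sampler", MINI_RET_TSM_HANDLER},
    {"dangerous_function", MINI_RET_BOOL},
};

/*
 * Resolve a format name to a handler entry. A function whose declared
 * result type is not the copy handler type is rejected, and nothing is
 * called during the lookup.
 */
MiniLookupResult
mini_lookup_copy_handler(const char *format, const MiniProc **handler)
{
    size_t      n = sizeof(mini_catalog) / sizeof(mini_catalog[0]);

    *handler = NULL;
    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(mini_catalog[i].name, format) != 0)
            continue;
        if (mini_catalog[i].rettype != MINI_RET_COPY_HANDLER)
            return MINI_LOOKUP_WRONG_TYPE;
        *handler = &mini_catalog[i];
        return MINI_LOOKUP_OK;
    }
    return MINI_LOOKUP_NOT_FOUND;
}
```

In this model, naming "dangerous_function" as a format yields MINI_LOOKUP_WRONG_TYPE, which corresponds to the ERRCODE_WRONG_OBJECT_TYPE error in the TABLESAMPLE check quoted above.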
			
Hi,
In <CAD21AoDCd9pKZ2XMOUmnmteC60NYBLr80FWY56Nn3NEbxVxdeQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 5 Feb 2025 12:29:44 -0800,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> It seems that
>> FmgrInfo doesn't have return value type information. Should
>> we read pg_catalog.pg_proc or something for it?
> 
> Yes, we can do like what we do for TABLESAMPLE for example:
> 
>     /* check that handler has correct return type */
>     if (get_func_rettype(handlerOid) != TSM_HANDLEROID)
>         ereport(ERROR,
>                 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
>                  errmsg("function %s must return type %s",
>                         NameListToString(rts->method), "tsm_handler"),
>                  parser_errposition(pstate, rts->location)));
Thanks! I didn't know get_func_rettype().
I attach the v32 patch set. 0003 is only changed. The
following check is added. It's similar to TABLESAMPLE's one.
    /* check that handler has correct return type */
    if (get_func_rettype(handlerOid) != COPY_HANDLEROID)
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("function %s must return type %s",
                        format, "copy_handler"),
                 parser_errposition(pstate, defel->location)));
Thanks,
-- 
kou
From bc750922725287eb51659eb4726c2801bcb39c49 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Sat, 28 Sep 2024 23:24:49 +0900
Subject: [PATCH v32 1/9] Refactor COPY TO to use format callback functions.
This commit introduces a new CopyToRoutine struct, which is a set of
callback routines to copy tuples in a specific format. It also makes
the existing formats (text, CSV, and binary) utilize these format
callbacks.
This change is a preliminary step towards making the COPY TO command
extensible in terms of output formats.
Additionally, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when sending field representations in text
or CSV mode. The performance benchmark results showed ~5% performance
gain in text or CSV mode.
Author: Sutou Kouhei
Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada
Reviewed-by: Junwang Zhao
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
---
 src/backend/commands/copyto.c    | 441 +++++++++++++++++++++----------
 src/include/commands/copyapi.h   |  55 ++++
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 356 insertions(+), 141 deletions(-)
 create mode 100644 src/include/commands/copyapi.h
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 99cb23cb347..26c67ddc351 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,7 +19,7 @@
 #include <sys/stat.h>
 
 #include "access/tableam.h"
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -64,6 +64,9 @@ typedef enum CopyDest
  */
 typedef struct CopyToStateData
 {
+    /* format-specific routines */
+    const CopyToRoutine *routine;
+
     /* low-level state data */
     CopyDest    copy_dest;        /* type of copy source/destination */
     FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
@@ -114,6 +117,19 @@ static void CopyAttributeOutText(CopyToState cstate, const char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
                                 bool use_quote);
 
+/* built-in format-specific routines */
+static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
+static void CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
+static void CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
+                                 bool is_csv);
+static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
+static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
+static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToBinaryEnd(CopyToState cstate);
+
 /* Low-level communications functions */
 static void SendCopyBegin(CopyToState cstate);
 static void SendCopyEnd(CopyToState cstate);
@@ -121,9 +137,254 @@ static void CopySendData(CopyToState cstate, const void *databuf, int datasize);
 static void CopySendString(CopyToState cstate, const char *str);
 static void CopySendChar(CopyToState cstate, char c);
 static void CopySendEndOfRow(CopyToState cstate);
+static void CopySendTextLikeEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
+/*
+ * COPY TO routines for built-in formats.
+ *
+ * CSV and text formats share the same TextLike routines except for the
+ * one-row callback.
+ */
+
+/* text format */
+static const CopyToRoutine CopyToRoutineText = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToTextOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+/* CSV format */
+static const CopyToRoutine CopyToRoutineCSV = {
+    .CopyToStart = CopyToTextLikeStart,
+    .CopyToOutFunc = CopyToTextLikeOutFunc,
+    .CopyToOneRow = CopyToCSVOneRow,
+    .CopyToEnd = CopyToTextLikeEnd,
+};
+
+/* binary format */
+static const CopyToRoutine CopyToRoutineBinary = {
+    .CopyToStart = CopyToBinaryStart,
+    .CopyToOutFunc = CopyToBinaryOutFunc,
+    .CopyToOneRow = CopyToBinaryOneRow,
+    .CopyToEnd = CopyToBinaryEnd,
+};
+
+/* Return a COPY TO routine for the given options */
+static const CopyToRoutine *
+CopyToGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyToRoutineCSV;
+    else if (opts.binary)
+        return &CopyToRoutineBinary;
+
+    /* default is text */
+    return &CopyToRoutineText;
+}
+
+/* Implementation of the start callback for text and CSV formats */
+static void
+CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    /*
+     * For non-binary copy, we need to convert null_print to file encoding,
+     * because it will be sent directly with CopySendString.
+     */
+    if (cstate->need_transcoding)
+        cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
+                                                          cstate->opts.null_print_len,
+                                                          cstate->file_encoding);
+
+    /* if a header has been requested send the line */
+    if (cstate->opts.header_line)
+    {
+        ListCell   *cur;
+        bool        hdr_delim = false;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            char       *colname;
+
+            if (hdr_delim)
+                CopySendChar(cstate, cstate->opts.delim[0]);
+            hdr_delim = true;
+
+            colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
+
+            if (cstate->opts.csv_mode)
+                CopyAttributeOutCSV(cstate, colname, false);
+            else
+                CopyAttributeOutText(cstate, colname);
+        }
+
+        CopySendTextLikeEndOfRow(cstate);
+    }
+}
+
+/*
+ * Implementation of the outfunc callback for text and CSV formats. Assign
+ * the output function data to the given *finfo.
+ */
+static void
+CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the per-row callback for text format */
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, false);
+}
+
+/* Implementation of the per-row callback for CSV format */
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    CopyToTextLikeOneRow(cstate, slot, true);
+}
+
+/*
+ * Workhorse for CopyToTextOneRow() and CopyToCSVOneRow().
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
+ */
+static pg_attribute_always_inline void
+CopyToTextLikeOneRow(CopyToState cstate,
+                     TupleTableSlot *slot,
+                     bool is_csv)
+{
+    bool        need_delim = false;
+    FmgrInfo   *out_functions = cstate->out_functions;
+
+    foreach_int(attnum, cstate->attnumlist)
+    {
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (need_delim)
+            CopySendChar(cstate, cstate->opts.delim[0]);
+        need_delim = true;
+
+        if (isnull)
+        {
+            CopySendString(cstate, cstate->opts.null_print_client);
+        }
+        else
+        {
+            char       *string;
+
+            string = OutputFunctionCall(&out_functions[attnum - 1],
+                                        value);
+
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
+                CopyAttributeOutCSV(cstate, string,
+                                    cstate->opts.force_quote_flags[attnum - 1]);
+            else
+                CopyAttributeOutText(cstate, string);
+        }
+    }
+
+    CopySendTextLikeEndOfRow(cstate);
+}
+
+/* Implementation of the end callback for text and CSV formats */
+static void
+CopyToTextLikeEnd(CopyToState cstate)
+{
+    /* Nothing to do here */
+}
+
+/*
+ * Implementation of the start callback for binary format. Send a header
+ * for a binary copy.
+ */
+static void
+CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    int32        tmp;
+
+    /* Signature */
+    CopySendData(cstate, BinarySignature, 11);
+    /* Flags field */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+    /* No header extension */
+    tmp = 0;
+    CopySendInt32(cstate, tmp);
+}
+
+/*
+ * Implementation of the outfunc callback for binary format. Assign
+ * the binary output function to the given *finfo.
+ */
+static void
+CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    Oid            func_oid;
+    bool        is_varlena;
+
+    /* Set output function for an attribute */
+    getTypeBinaryOutputInfo(atttypid, &func_oid, &is_varlena);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the per-row callback for binary format */
+static void
+CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    FmgrInfo   *out_functions = cstate->out_functions;
+
+    /* Binary per-tuple header */
+    CopySendInt16(cstate, list_length(cstate->attnumlist));
+
+    foreach_int(attnum, cstate->attnumlist)
+    {
+        Datum        value = slot->tts_values[attnum - 1];
+        bool        isnull = slot->tts_isnull[attnum - 1];
+
+        if (isnull)
+        {
+            CopySendInt32(cstate, -1);
+        }
+        else
+        {
+            bytea       *outputbytes;
+
+            outputbytes = SendFunctionCall(&out_functions[attnum - 1],
+                                           value);
+            CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
+            CopySendData(cstate, VARDATA(outputbytes),
+                         VARSIZE(outputbytes) - VARHDRSZ);
+        }
+    }
+
+    CopySendEndOfRow(cstate);
+}
+
+/* Implementation of the end callback for binary format */
+static void
+CopyToBinaryEnd(CopyToState cstate)
+{
+    /* Generate trailer for a binary copy */
+    CopySendInt16(cstate, -1);
+    /* Need to flush out the trailer */
+    CopySendEndOfRow(cstate);
+}
 
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
@@ -191,16 +452,6 @@ CopySendEndOfRow(CopyToState cstate)
     switch (cstate->copy_dest)
     {
         case COPY_FILE:
-            if (!cstate->opts.binary)
-            {
-                /* Default line termination depends on platform */
-#ifndef WIN32
-                CopySendChar(cstate, '\n');
-#else
-                CopySendString(cstate, "\r\n");
-#endif
-            }
-
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -235,10 +486,6 @@ CopySendEndOfRow(CopyToState cstate)
             }
             break;
         case COPY_FRONTEND:
-            /* The FE/BE protocol uses \n as newline for all platforms */
-            if (!cstate->opts.binary)
-                CopySendChar(cstate, '\n');
-
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
@@ -254,6 +501,35 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
+ * line termination and then does the common end-of-row processing.
+ */
+static inline void
+CopySendTextLikeEndOfRow(CopyToState cstate)
+{
+    switch (cstate->copy_dest)
+    {
+        case COPY_FILE:
+            /* Default line termination depends on platform */
+#ifndef WIN32
+            CopySendChar(cstate, '\n');
+#else
+            CopySendString(cstate, "\r\n");
+#endif
+            break;
+        case COPY_FRONTEND:
+            /* The FE/BE protocol uses \n as newline for all platforms */
+            CopySendChar(cstate, '\n');
+            break;
+        default:
+            break;
+    }
+
+    /* Now take the actions related to the end of a row */
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * These functions do apply some data conversion
  */
@@ -426,6 +702,9 @@ BeginCopyTo(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
 
+    /* Set format routine */
+    cstate->routine = CopyToGetRoutine(cstate->opts);
+
     /* Process the source/target relation or query */
     if (rel)
     {
@@ -771,19 +1050,10 @@ DoCopyTo(CopyToState cstate)
     foreach(cur, cstate->attnumlist)
     {
         int            attnum = lfirst_int(cur);
-        Oid            out_func_oid;
-        bool        isvarlena;
         Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-        if (cstate->opts.binary)
-            getTypeBinaryOutputInfo(attr->atttypid,
-                                    &out_func_oid,
-                                    &isvarlena);
-        else
-            getTypeOutputInfo(attr->atttypid,
-                              &out_func_oid,
-                              &isvarlena);
-        fmgr_info(out_func_oid, &cstate->out_functions[attnum - 1]);
+        cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
+                                       &cstate->out_functions[attnum - 1]);
     }
 
     /*
@@ -796,56 +1066,7 @@ DoCopyTo(CopyToState cstate)
                                                "COPY TO",
                                                ALLOCSET_DEFAULT_SIZES);
 
-    if (cstate->opts.binary)
-    {
-        /* Generate header for a binary copy */
-        int32        tmp;
-
-        /* Signature */
-        CopySendData(cstate, BinarySignature, 11);
-        /* Flags field */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-        /* No header extension */
-        tmp = 0;
-        CopySendInt32(cstate, tmp);
-    }
-    else
-    {
-        /*
-         * For non-binary copy, we need to convert null_print to file
-         * encoding, because it will be sent directly with CopySendString.
-         */
-        if (cstate->need_transcoding)
-            cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
-                                                              cstate->opts.null_print_len,
-                                                              cstate->file_encoding);
-
-        /* if a header has been requested send the line */
-        if (cstate->opts.header_line)
-        {
-            bool        hdr_delim = false;
-
-            foreach(cur, cstate->attnumlist)
-            {
-                int            attnum = lfirst_int(cur);
-                char       *colname;
-
-                if (hdr_delim)
-                    CopySendChar(cstate, cstate->opts.delim[0]);
-                hdr_delim = true;
-
-                colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
-
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, colname, false);
-                else
-                    CopyAttributeOutText(cstate, colname);
-            }
-
-            CopySendEndOfRow(cstate);
-        }
-    }
+    cstate->routine->CopyToStart(cstate, tupDesc);
 
     if (cstate->rel)
     {
@@ -884,13 +1105,7 @@ DoCopyTo(CopyToState cstate)
         processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
     }
 
-    if (cstate->opts.binary)
-    {
-        /* Generate trailer for a binary copy */
-        CopySendInt16(cstate, -1);
-        /* Need to flush out the trailer */
-        CopySendEndOfRow(cstate);
-    }
+    cstate->routine->CopyToEnd(cstate);
 
     MemoryContextDelete(cstate->rowcontext);
 
@@ -903,74 +1118,18 @@ DoCopyTo(CopyToState cstate)
 /*
  * Emit one row during DoCopyTo().
  */
-static void
+static inline void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
-    FmgrInfo   *out_functions = cstate->out_functions;
     MemoryContext oldcontext;
 
     MemoryContextReset(cstate->rowcontext);
     oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
 
-    if (cstate->opts.binary)
-    {
-        /* Binary per-tuple header */
-        CopySendInt16(cstate, list_length(cstate->attnumlist));
-    }
-
     /* Make sure the tuple is fully deconstructed */
     slot_getallattrs(slot);
 
-    if (!cstate->opts.binary)
-    {
-        bool        need_delim = false;
-
-        foreach_int(attnum, cstate->attnumlist)
-        {
-            Datum        value = slot->tts_values[attnum - 1];
-            bool        isnull = slot->tts_isnull[attnum - 1];
-            char       *string;
-
-            if (need_delim)
-                CopySendChar(cstate, cstate->opts.delim[0]);
-            need_delim = true;
-
-            if (isnull)
-                CopySendString(cstate, cstate->opts.null_print_client);
-            else
-            {
-                string = OutputFunctionCall(&out_functions[attnum - 1],
-                                            value);
-                if (cstate->opts.csv_mode)
-                    CopyAttributeOutCSV(cstate, string,
-                                        cstate->opts.force_quote_flags[attnum - 1]);
-                else
-                    CopyAttributeOutText(cstate, string);
-            }
-        }
-    }
-    else
-    {
-        foreach_int(attnum, cstate->attnumlist)
-        {
-            Datum        value = slot->tts_values[attnum - 1];
-            bool        isnull = slot->tts_isnull[attnum - 1];
-            bytea       *outputbytes;
-
-            if (isnull)
-                CopySendInt32(cstate, -1);
-            else
-            {
-                outputbytes = SendFunctionCall(&out_functions[attnum - 1],
-                                               value);
-                CopySendInt32(cstate, VARSIZE(outputbytes) - VARHDRSZ);
-                CopySendData(cstate, VARDATA(outputbytes),
-                             VARSIZE(outputbytes) - VARHDRSZ);
-            }
-        }
-    }
-
-    CopySendEndOfRow(cstate);
+    cstate->routine->CopyToOneRow(cstate, slot);
 
     MemoryContextSwitchTo(oldcontext);
 }
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
new file mode 100644
index 00000000000..de5dae9cc38
--- /dev/null
+++ b/src/include/commands/copyapi.h
@@ -0,0 +1,55 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyapi.h
+ *      API for COPY TO handlers
+ *
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyapi.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYAPI_H
+#define COPYAPI_H
+
+#include "commands/copy.h"
+
+/*
+ * API structure for a COPY TO format implementation. Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyToRoutine
+{
+    /*
+     * Set output function information. This callback is called once at the
+     * beginning of COPY TO.
+     *
+     * 'finfo' can be optionally filled to provide the catalog information of
+     * the output function.
+     *
+     * 'atttypid' is the OID of data type used by the relation's attribute.
+     */
+    void        (*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
+                                  FmgrInfo *finfo);
+
+    /*
+     * Start a COPY TO. This callback is called once at the beginning of COPY
+     * TO.
+     *
+     * 'tupDesc' is the tuple descriptor of the relation from which the data
+     * is read.
+     */
+    void        (*CopyToStart) (CopyToState cstate, TupleDesc tupDesc);
+
+    /*
+     * Write one row stored in 'slot' to the destination.
+     */
+    void        (*CopyToOneRow) (CopyToState cstate, TupleTableSlot *slot);
+
+    /* End a COPY TO. This callback is called once at the end of COPY TO */
+    void        (*CopyToEnd) (CopyToState cstate);
+} CopyToRoutine;
+
+#endif                            /* COPYAPI_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9a3bee93dec..6b2f22d8555 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -508,6 +508,7 @@ CopyMultiInsertInfo
 CopyOnErrorChoice
 CopySource
 CopyStmt
+CopyToRoutine
 CopyToState
 CopyToStateData
 Cost
-- 
2.47.1
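
Not part of the patch: a minimal standalone sketch of the callback-table
pattern that CopyToRoutine introduces, in case it helps reviewers. All names
here (DemoState, DemoRoutine, demo_*) are hypothetical stand-ins, not
PostgreSQL APIs, and the "formats" only differ by delimiter. It only
illustrates the idea that the routine is chosen once and the per-row path
then calls through a function pointer instead of re-checking the format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CopyToState: just an output buffer. */
typedef struct DemoState
{
    char        buf[128];
    size_t      len;
} DemoState;

/* Hypothetical analogue of CopyToRoutine: one callback per phase. */
typedef struct DemoRoutine
{
    void        (*start) (DemoState *st);
    void        (*one_row) (DemoState *st, const char *const *fields, int nfields);
    void        (*end) (DemoState *st);
} DemoRoutine;

/* Append a NUL-terminated string to the demo buffer. */
static void
demo_append(DemoState *st, const char *s)
{
    size_t      n = strlen(s);

    assert(st->len + n < sizeof(st->buf));
    memcpy(st->buf + st->len, s, n + 1);
    st->len += n;
}

/* Start/end callbacks with nothing to do. */
static void
demo_noop(DemoState *st)
{
    (void) st;
}

/* Shared row writer; the delimiter is the only format difference here. */
static void
demo_write_row(DemoState *st, const char *const *fields, int nfields,
               const char *delim)
{
    for (int i = 0; i < nfields; i++)
    {
        if (i > 0)
            demo_append(st, delim);
        demo_append(st, fields[i]);
    }
    demo_append(st, "\n");
}

static void
demo_text_row(DemoState *st, const char *const *fields, int nfields)
{
    demo_write_row(st, fields, nfields, "\t");
}

static void
demo_csv_row(DemoState *st, const char *const *fields, int nfields)
{
    demo_write_row(st, fields, nfields, ",");
}

static const DemoRoutine demo_text = {
    .start = demo_noop,
    .one_row = demo_text_row,
    .end = demo_noop,
};

static const DemoRoutine demo_csv = {
    .start = demo_noop,
    .one_row = demo_csv_row,
    .end = demo_noop,
};

/* Pick the routine once, up front. */
static const DemoRoutine *
demo_get_routine(int csv_mode)
{
    return csv_mode ? &demo_csv : &demo_text;
}

/* Driver: no per-row branching on the format, only an indirect call. */
static const char *
demo_copy(DemoState *st, int csv_mode, const char *const *fields, int nfields)
{
    const DemoRoutine *routine = demo_get_routine(csv_mode);

    st->len = 0;
    st->buf[0] = '\0';
    routine->start(st);
    routine->one_row(st, fields, nfields);
    routine->end(st);
    return st->buf;
}
```

Calling demo_copy() with csv_mode = 0 and then 1 on the same two fields
should give a tab-separated and a comma-separated line respectively. The
real CSV path also handles quoting, which this sketch leaves out on purpose.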
From 2555f313a45ae162e71c89322dcf8321d9408d98 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 18 Nov 2024 16:32:43 -0800
Subject: [PATCH v32 2/9] Refactor COPY FROM to use format callback functions.
This commit introduces a new CopyFromRoutine struct, which is a set of
callback routines to read tuples in a specific format. It also makes
COPY FROM with the existing formats (text, CSV, and binary) utilize
these format callbacks.
This change is a preliminary step towards making the COPY FROM command
extensible in terms of input formats.
Similar to XXXX, this refactoring contributes to a performance
improvement by reducing the number of "if" branches that need to be
checked on a per-row basis when reading field representations in text
or CSV mode. The performance benchmark results showed a ~5% performance
gain in text or CSV mode.
Author: Sutou Kouhei
Reviewed-by: Michael Paquier, Tomas Vondra, Masahiko Sawada
Reviewed-by: Junwang Zhao
Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com
---
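
Reviewer note (not part of the commit): a small standalone sketch of the
"constant is_csv argument" technique the message refers to. The names
(demo_always_inline, demo_count_fields*) are hypothetical and the parsing is
far simpler than CopyReadLineText(); the assumption being illustrated is
only that, when a force-inlined workhorse is called with a literal true or
false, the compiler can fold the is_csv tests away in each wrapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical analogue of pg_attribute_always_inline. */
#if defined(__GNUC__) || defined(__clang__)
#define demo_always_inline inline __attribute__((always_inline))
#else
#define demo_always_inline inline
#endif

/*
 * Shared workhorse: count the fields in one line.
 *
 * In CSV mode the delimiter is ',' and double quotes protect embedded
 * delimiters; in text mode the delimiter is a tab and quotes are ordinary
 * characters.  An empty line counts as one (empty) field.
 */
static demo_always_inline int
demo_count_fields(const char *line, bool is_csv)
{
    int         nfields = 1;
    bool        in_quote = false;

    for (const char *p = line; *p != '\0'; p++)
    {
        /* is_csv is a compile-time constant in each wrapper below. */
        if (is_csv)
        {
            if (*p == '"')
                in_quote = !in_quote;
            else if (*p == ',' && !in_quote)
                nfields++;
        }
        else if (*p == '\t')
            nfields++;
    }

    return nfields;
}

/* Per-format entry points: each passes a literal, so each gets its own
 * specialized copy of the loop. */
static int
demo_count_fields_text(const char *line)
{
    return demo_count_fields(line, false);
}

static int
demo_count_fields_csv(const char *line)
{
    return demo_count_fields(line, true);
}
```

The two wrappers play the role that CopyFromTextOneRow() and
CopyFromCSVOneRow() play around CopyFromTextLikeOneRow() in this patch.
Whether the branch is actually removed depends on the compiler and
optimization level, so the benchmark numbers remain the real evidence.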
 contrib/file_fdw/file_fdw.c              |   1 -
 src/backend/commands/copyfrom.c          | 192 +++++++--
 src/backend/commands/copyfromparse.c     | 505 +++++++++++++----------
 src/include/commands/copy.h              |   2 -
 src/include/commands/copyapi.h           |  48 ++-
 src/include/commands/copyfrom_internal.h |  11 +
 src/tools/pgindent/typedefs.list         |   1 +
 7 files changed, 493 insertions(+), 267 deletions(-)
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 678e754b2b9..323c43dca4a 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -21,7 +21,6 @@
 #include "access/table.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_foreign_table.h"
-#include "commands/copy.h"
 #include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 0cbd05f5602..8b09df0581f 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -28,7 +28,7 @@
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "catalog/namespace.h"
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/copyfrom_internal.h"
 #include "commands/progress.h"
 #include "commands/trigger.h"
@@ -106,6 +106,145 @@ typedef struct CopyMultiInsertInfo
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
 
+/*
+ * Built-in format-specific routines. One-row callbacks are defined in
+ * copyfromparse.c
+ */
+static void CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo,
+                                   Oid *typioparam);
+static void CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc);
+static void CopyFromTextLikeEnd(CopyFromState cstate);
+static void CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+                                 FmgrInfo *finfo, Oid *typioparam);
+static void CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc);
+static void CopyFromBinaryEnd(CopyFromState cstate);
+
+
+/*
+ * COPY FROM routines for built-in formats.
+ *
+ * CSV and text formats share the same TextLike routines except for the
+ * one-row callback.
+ */
+
+/* text format */
+static const CopyFromRoutine CopyFromRoutineText = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromTextOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+/* CSV format */
+static const CopyFromRoutine CopyFromRoutineCSV = {
+    .CopyFromInFunc = CopyFromTextLikeInFunc,
+    .CopyFromStart = CopyFromTextLikeStart,
+    .CopyFromOneRow = CopyFromCSVOneRow,
+    .CopyFromEnd = CopyFromTextLikeEnd,
+};
+
+/* binary format */
+static const CopyFromRoutine CopyFromRoutineBinary = {
+    .CopyFromInFunc = CopyFromBinaryInFunc,
+    .CopyFromStart = CopyFromBinaryStart,
+    .CopyFromOneRow = CopyFromBinaryOneRow,
+    .CopyFromEnd = CopyFromBinaryEnd,
+};
+
+/* Return a COPY FROM routine for the given options */
+static const CopyFromRoutine *
+CopyFromGetRoutine(CopyFormatOptions opts)
+{
+    if (opts.csv_mode)
+        return &CopyFromRoutineCSV;
+    else if (opts.binary)
+        return &CopyFromRoutineBinary;
+
+    /* default is text */
+    return &CopyFromRoutineText;
+}
+
+/* Implementation of the start callback for text and CSV formats */
+static void
+CopyFromTextLikeStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    AttrNumber    attr_count;
+
+    /*
+     * If encoding conversion is needed, we need another buffer to hold the
+     * converted input data.  Otherwise, we can just point input_buf to the
+     * same buffer as raw_buf.
+     */
+    if (cstate->need_transcoding)
+    {
+        cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
+        cstate->input_buf_index = cstate->input_buf_len = 0;
+    }
+    else
+        cstate->input_buf = cstate->raw_buf;
+    cstate->input_reached_eof = false;
+
+    initStringInfo(&cstate->line_buf);
+
+    /*
+     * Create workspace for CopyReadAttributes results; used by CSV and text
+     * format.
+     */
+    attr_count = list_length(cstate->attnumlist);
+    cstate->max_fields = attr_count;
+    cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+}
+
+/*
+ * Implementation of the infunc callback for text and CSV formats. Assign
+ * the input function data to the given *finfo.
+ */
+static void
+CopyFromTextLikeInFunc(CopyFromState cstate, Oid atttypid, FmgrInfo *finfo,
+                       Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the end callback for text and CSV formats */
+static void
+CopyFromTextLikeEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
+/* Implementation of the start callback for binary format */
+static void
+CopyFromBinaryStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    /* Read and verify binary header */
+    ReceiveCopyBinaryHeader(cstate);
+}
+
+/*
+ * Implementation of the infunc callback for binary format. Assign
+ * the binary input function to the given *finfo.
+ */
+static void
+CopyFromBinaryInFunc(CopyFromState cstate, Oid atttypid,
+                     FmgrInfo *finfo, Oid *typioparam)
+{
+    Oid            func_oid;
+
+    getTypeBinaryInputInfo(atttypid, &func_oid, typioparam);
+    fmgr_info(func_oid, finfo);
+}
+
+/* Implementation of the end callback for binary format */
+static void
+CopyFromBinaryEnd(CopyFromState cstate)
+{
+    /* nothing to do */
+}
+
 /*
  * error context callback for COPY FROM
  *
@@ -1396,7 +1535,6 @@ BeginCopyFrom(ParseState *pstate,
                 num_defaults;
     FmgrInfo   *in_functions;
     Oid           *typioparams;
-    Oid            in_func_oid;
     int           *defmap;
     ExprState **defexprs;
     MemoryContext oldcontext;
@@ -1428,6 +1566,9 @@ BeginCopyFrom(ParseState *pstate,
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
+    /* Set the format routine */
+    cstate->routine = CopyFromGetRoutine(cstate->opts);
+
     /* Process the target relation */
     cstate->rel = rel;
 
@@ -1583,25 +1724,6 @@ BeginCopyFrom(ParseState *pstate,
     cstate->raw_buf_index = cstate->raw_buf_len = 0;
     cstate->raw_reached_eof = false;
 
-    if (!cstate->opts.binary)
-    {
-        /*
-         * If encoding conversion is needed, we need another buffer to hold
-         * the converted input data.  Otherwise, we can just point input_buf
-         * to the same buffer as raw_buf.
-         */
-        if (cstate->need_transcoding)
-        {
-            cstate->input_buf = (char *) palloc(INPUT_BUF_SIZE + 1);
-            cstate->input_buf_index = cstate->input_buf_len = 0;
-        }
-        else
-            cstate->input_buf = cstate->raw_buf;
-        cstate->input_reached_eof = false;
-
-        initStringInfo(&cstate->line_buf);
-    }
-
     initStringInfo(&cstate->attribute_buf);
 
     /* Assign range table and rteperminfos, we'll need them in CopyFrom. */
@@ -1634,13 +1756,9 @@ BeginCopyFrom(ParseState *pstate,
             continue;
 
         /* Fetch the input function and typioparam info */
-        if (cstate->opts.binary)
-            getTypeBinaryInputInfo(att->atttypid,
-                                   &in_func_oid, &typioparams[attnum - 1]);
-        else
-            getTypeInputInfo(att->atttypid,
-                             &in_func_oid, &typioparams[attnum - 1]);
-        fmgr_info(in_func_oid, &in_functions[attnum - 1]);
+        cstate->routine->CopyFromInFunc(cstate, att->atttypid,
+                                        &in_functions[attnum - 1],
+                                        &typioparams[attnum - 1]);
 
         /* Get default info if available */
         defexprs[attnum - 1] = NULL;
@@ -1775,20 +1893,7 @@ BeginCopyFrom(ParseState *pstate,
 
     pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
 
-    if (cstate->opts.binary)
-    {
-        /* Read and verify binary header */
-        ReceiveCopyBinaryHeader(cstate);
-    }
-
-    /* create workspace for CopyReadAttributes results */
-    if (!cstate->opts.binary)
-    {
-        AttrNumber    attr_count = list_length(cstate->attnumlist);
-
-        cstate->max_fields = attr_count;
-        cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
-    }
+    cstate->routine->CopyFromStart(cstate, tupDesc);
 
     MemoryContextSwitchTo(oldcontext);
 
@@ -1801,6 +1906,9 @@ BeginCopyFrom(ParseState *pstate,
 void
 EndCopyFrom(CopyFromState cstate)
 {
+    /* Invoke the end callback */
+    cstate->routine->CopyFromEnd(cstate);
+
     /* No COPY FROM related resources except memory. */
     if (cstate->is_program)
     {
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index caccdc8563c..c1872acbbf6 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -62,7 +62,7 @@
 #include <unistd.h>
 #include <sys/stat.h>
 
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/copyfrom_internal.h"
 #include "commands/progress.h"
 #include "executor/executor.h"
@@ -140,8 +140,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static bool CopyReadLine(CopyFromState cstate);
-static bool CopyReadLineText(CopyFromState cstate);
+static bool CopyReadLine(CopyFromState cstate, bool is_csv);
+static bool CopyReadLineText(CopyFromState cstate, bool is_csv);
 static int    CopyReadAttributesText(CopyFromState cstate);
 static int    CopyReadAttributesCSV(CopyFromState cstate);
 static Datum CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
@@ -740,9 +740,11 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
  * in the relation.
  *
  * NOTE: force_not_null option are not applied to the returned fields.
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
  */
-bool
-NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
+static pg_attribute_always_inline bool
+NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
 {
     int            fldct;
     bool        done;
@@ -759,13 +761,17 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
         tupDesc = RelationGetDescr(cstate->rel);
 
         cstate->cur_lineno++;
-        done = CopyReadLine(cstate);
+        done = CopyReadLine(cstate, is_csv);
 
         if (cstate->opts.header_line == COPY_HEADER_MATCH)
         {
             int            fldnum;
 
-            if (cstate->opts.csv_mode)
+            /*
+             * is_csv will be optimized away by compiler, as argument is
+             * constant at caller.
+             */
+            if (is_csv)
                 fldct = CopyReadAttributesCSV(cstate);
             else
                 fldct = CopyReadAttributesText(cstate);
@@ -809,7 +815,7 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     cstate->cur_lineno++;
 
     /* Actually read the line into memory here */
-    done = CopyReadLine(cstate);
+    done = CopyReadLine(cstate, is_csv);
 
     /*
      * EOF at start of line means we're done.  If we see EOF after some
@@ -819,8 +825,13 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     if (done && cstate->line_buf.len == 0)
         return false;
 
-    /* Parse the line into de-escaped field values */
-    if (cstate->opts.csv_mode)
+    /*
+     * Parse the line into de-escaped field values
+     *
+     * is_csv will be optimized away by compiler, as argument is constant at
+     * caller.
+     */
+    if (is_csv)
         fldct = CopyReadAttributesCSV(cstate);
     else
         fldct = CopyReadAttributesText(cstate);
@@ -830,6 +841,244 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
     return true;
 }
 
+/*
+ * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
+ *
+ * We use pg_attribute_always_inline to reduce function call overheads.
+ */
+static pg_attribute_always_inline bool
+CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
+                       Datum *values, bool *nulls, bool is_csv)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    ExprState **defexprs = cstate->defexprs;
+    char      **field_strings;
+    ListCell   *cur;
+    int            fldct;
+    int            fieldno;
+    char       *string;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    /* read raw fields in the next line */
+    if (!NextCopyFromRawFields(cstate, &field_strings, &fldct, is_csv))
+        return false;
+
+    /* check for overflowing fields */
+    if (attr_count > 0 && fldct > attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("extra data after last expected column")));
+
+    fieldno = 0;
+
+    /* Loop to read the user attributes on the line. */
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        if (fieldno >= fldct)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("missing data for column \"%s\"",
+                            NameStr(att->attname))));
+        string = field_strings[fieldno++];
+
+        if (cstate->convert_select_flags &&
+            !cstate->convert_select_flags[m])
+        {
+            /* ignore input field, leaving column as NULL */
+            continue;
+        }
+
+        if (is_csv)
+        {
+            if (string == NULL &&
+                cstate->opts.force_notnull_flags[m])
+            {
+                /*
+                 * FORCE_NOT_NULL option is set and column is NULL - convert
+                 * it to the NULL string.
+                 */
+                string = cstate->opts.null_print;
+            }
+            else if (string != NULL && cstate->opts.force_null_flags[m]
+                     && strcmp(string, cstate->opts.null_print) == 0)
+            {
+                /*
+                 * FORCE_NULL option is set and column matches the NULL
+                 * string. It must have been quoted, or otherwise the string
+                 * would already have been set to NULL. Convert it to NULL as
+                 * specified.
+                 */
+                string = NULL;
+            }
+        }
+
+        cstate->cur_attname = NameStr(att->attname);
+        cstate->cur_attval = string;
+
+        if (string != NULL)
+            nulls[m] = false;
+
+        if (cstate->defaults[m])
+        {
+            /*
+             * The caller must supply econtext and have switched into the
+             * per-tuple memory context in it.
+             */
+            Assert(econtext != NULL);
+            Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
+
+            values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
+        }
+
+        /*
+         * If ON_ERROR is specified with IGNORE, skip rows with soft errors
+         */
+        else if (!InputFunctionCallSafe(&in_functions[m],
+                                        string,
+                                        typioparams[m],
+                                        att->atttypmod,
+                                        (Node *) cstate->escontext,
+                                        &values[m]))
+        {
+            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+            cstate->num_errors++;
+
+            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+            {
+                /*
+                 * Since we emit line number and column info in the below
+                 * notice message, we suppress error context information other
+                 * than the relation name.
+                 */
+                Assert(!cstate->relname_only);
+                cstate->relname_only = true;
+
+                if (cstate->cur_attval)
+                {
+                    char       *attval;
+
+                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname,
+                                   attval));
+                    pfree(attval);
+                }
+                else
+                    ereport(NOTICE,
+                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+                                   (unsigned long long) cstate->cur_lineno,
+                                   cstate->cur_attname));
+
+                /* reset relname_only */
+                cstate->relname_only = false;
+            }
+
+            return true;
+        }
+
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+    }
+
+    Assert(fieldno == attr_count);
+
+    return true;
+}
+
+/* Implementation of the per-row callback for text format */
+bool
+CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+                   bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, false);
+}
+
+/* Implementation of the per-row callback for CSV format */
+bool
+CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+                  bool *nulls)
+{
+    return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
+}
+
+/* Implementation of the per-row callback for binary format */
+bool
+CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
+                     bool *nulls)
+{
+    TupleDesc    tupDesc;
+    AttrNumber    attr_count;
+    FmgrInfo   *in_functions = cstate->in_functions;
+    Oid           *typioparams = cstate->typioparams;
+    int16        fld_count;
+    ListCell   *cur;
+
+    tupDesc = RelationGetDescr(cstate->rel);
+    attr_count = list_length(cstate->attnumlist);
+
+    cstate->cur_lineno++;
+
+    if (!CopyGetInt16(cstate, &fld_count))
+    {
+        /* EOF detected (end of file, or protocol-level EOF) */
+        return false;
+    }
+
+    if (fld_count == -1)
+    {
+        /*
+         * Received EOF marker.  Wait for the protocol-level EOF, and complain
+         * if it doesn't come immediately.  In COPY FROM STDIN, this ensures
+         * that we correctly handle CopyFail, if client chooses to send that
+         * now.  When copying from file, we could ignore the rest of the file
+         * like in text mode, but we choose to be consistent with the COPY
+         * FROM STDIN case.
+         */
+        char        dummy;
+
+        if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
+            ereport(ERROR,
+                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                     errmsg("received copy data after EOF marker")));
+        return false;
+    }
+
+    if (fld_count != attr_count)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("row field count is %d, expected %d",
+                        (int) fld_count, attr_count)));
+
+    foreach(cur, cstate->attnumlist)
+    {
+        int            attnum = lfirst_int(cur);
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+        cstate->cur_attname = NameStr(att->attname);
+        values[m] = CopyReadBinaryAttribute(cstate,
+                                            &in_functions[m],
+                                            typioparams[m],
+                                            att->atttypmod,
+                                            &nulls[m]);
+        cstate->cur_attname = NULL;
+    }
+
+    return true;
+}
+
 /*
  * Read next tuple from file for COPY FROM. Return false if no more tuples.
  *
@@ -847,216 +1096,22 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 {
     TupleDesc    tupDesc;
     AttrNumber    num_phys_attrs,
-                attr_count,
                 num_defaults = cstate->num_defaults;
-    FmgrInfo   *in_functions = cstate->in_functions;
-    Oid           *typioparams = cstate->typioparams;
     int            i;
     int           *defmap = cstate->defmap;
     ExprState **defexprs = cstate->defexprs;
 
     tupDesc = RelationGetDescr(cstate->rel);
     num_phys_attrs = tupDesc->natts;
-    attr_count = list_length(cstate->attnumlist);
 
     /* Initialize all values for row to NULL */
     MemSet(values, 0, num_phys_attrs * sizeof(Datum));
     MemSet(nulls, true, num_phys_attrs * sizeof(bool));
     MemSet(cstate->defaults, false, num_phys_attrs * sizeof(bool));
 
-    if (!cstate->opts.binary)
-    {
-        char      **field_strings;
-        ListCell   *cur;
-        int            fldct;
-        int            fieldno;
-        char       *string;
-
-        /* read raw fields in the next line */
-        if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
-            return false;
-
-        /* check for overflowing fields */
-        if (attr_count > 0 && fldct > attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("extra data after last expected column")));
-
-        fieldno = 0;
-
-        /* Loop to read the user attributes on the line. */
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            if (fieldno >= fldct)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("missing data for column \"%s\"",
-                                NameStr(att->attname))));
-            string = field_strings[fieldno++];
-
-            if (cstate->convert_select_flags &&
-                !cstate->convert_select_flags[m])
-            {
-                /* ignore input field, leaving column as NULL */
-                continue;
-            }
-
-            if (cstate->opts.csv_mode)
-            {
-                if (string == NULL &&
-                    cstate->opts.force_notnull_flags[m])
-                {
-                    /*
-                     * FORCE_NOT_NULL option is set and column is NULL -
-                     * convert it to the NULL string.
-                     */
-                    string = cstate->opts.null_print;
-                }
-                else if (string != NULL && cstate->opts.force_null_flags[m]
-                         && strcmp(string, cstate->opts.null_print) == 0)
-                {
-                    /*
-                     * FORCE_NULL option is set and column matches the NULL
-                     * string. It must have been quoted, or otherwise the
-                     * string would already have been set to NULL. Convert it
-                     * to NULL as specified.
-                     */
-                    string = NULL;
-                }
-            }
-
-            cstate->cur_attname = NameStr(att->attname);
-            cstate->cur_attval = string;
-
-            if (string != NULL)
-                nulls[m] = false;
-
-            if (cstate->defaults[m])
-            {
-                /*
-                 * The caller must supply econtext and have switched into the
-                 * per-tuple memory context in it.
-                 */
-                Assert(econtext != NULL);
-                Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
-
-                values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
-            }
-
-            /*
-             * If ON_ERROR is specified with IGNORE, skip rows with soft
-             * errors
-             */
-            else if (!InputFunctionCallSafe(&in_functions[m],
-                                            string,
-                                            typioparams[m],
-                                            att->atttypmod,
-                                            (Node *) cstate->escontext,
-                                            &values[m]))
-            {
-                Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-                cstate->num_errors++;
-
-                if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-                {
-                    /*
-                     * Since we emit line number and column info in the below
-                     * notice message, we suppress error context information
-                     * other than the relation name.
-                     */
-                    Assert(!cstate->relname_only);
-                    cstate->relname_only = true;
-
-                    if (cstate->cur_attval)
-                    {
-                        char       *attval;
-
-                        attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname,
-                                       attval));
-                        pfree(attval);
-                    }
-                    else
-                        ereport(NOTICE,
-                                errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-                                       (unsigned long long) cstate->cur_lineno,
-                                       cstate->cur_attname));
-
-                    /* reset relname_only */
-                    cstate->relname_only = false;
-                }
-
-                return true;
-            }
-
-            cstate->cur_attname = NULL;
-            cstate->cur_attval = NULL;
-        }
-
-        Assert(fieldno == attr_count);
-    }
-    else
-    {
-        /* binary */
-        int16        fld_count;
-        ListCell   *cur;
-
-        cstate->cur_lineno++;
-
-        if (!CopyGetInt16(cstate, &fld_count))
-        {
-            /* EOF detected (end of file, or protocol-level EOF) */
-            return false;
-        }
-
-        if (fld_count == -1)
-        {
-            /*
-             * Received EOF marker.  Wait for the protocol-level EOF, and
-             * complain if it doesn't come immediately.  In COPY FROM STDIN,
-             * this ensures that we correctly handle CopyFail, if client
-             * chooses to send that now.  When copying from file, we could
-             * ignore the rest of the file like in text mode, but we choose to
-             * be consistent with the COPY FROM STDIN case.
-             */
-            char        dummy;
-
-            if (CopyReadBinaryData(cstate, &dummy, 1) > 0)
-                ereport(ERROR,
-                        (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         errmsg("received copy data after EOF marker")));
-            return false;
-        }
-
-        if (fld_count != attr_count)
-            ereport(ERROR,
-                    (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                     errmsg("row field count is %d, expected %d",
-                            (int) fld_count, attr_count)));
-
-        foreach(cur, cstate->attnumlist)
-        {
-            int            attnum = lfirst_int(cur);
-            int            m = attnum - 1;
-            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
-
-            cstate->cur_attname = NameStr(att->attname);
-            values[m] = CopyReadBinaryAttribute(cstate,
-                                                &in_functions[m],
-                                                typioparams[m],
-                                                att->atttypmod,
-                                                &nulls[m]);
-            cstate->cur_attname = NULL;
-        }
-    }
+    /* Get one row from source */
+    if (!cstate->routine->CopyFromOneRow(cstate, econtext, values, nulls))
+        return false;
 
     /*
      * Now compute and insert any defaults available for the columns not
@@ -1087,7 +1142,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
  * in the final value of line_buf.
  */
 static bool
-CopyReadLine(CopyFromState cstate)
+CopyReadLine(CopyFromState cstate, bool is_csv)
 {
     bool        result;
 
@@ -1095,7 +1150,7 @@ CopyReadLine(CopyFromState cstate)
     cstate->line_buf_valid = false;
 
     /* Parse data and transfer into line_buf */
-    result = CopyReadLineText(cstate);
+    result = CopyReadLineText(cstate, is_csv);
 
     if (result)
     {
@@ -1163,7 +1218,7 @@ CopyReadLine(CopyFromState cstate)
  * CopyReadLineText - inner loop of CopyReadLine for text mode
  */
 static bool
-CopyReadLineText(CopyFromState cstate)
+CopyReadLineText(CopyFromState cstate, bool is_csv)
 {
     char       *copy_input_buf;
     int            input_buf_ptr;
@@ -1178,7 +1233,11 @@ CopyReadLineText(CopyFromState cstate)
     char        quotec = '\0';
     char        escapec = '\0';
 
-    if (cstate->opts.csv_mode)
+    /*
+     * is_csv will be optimized away by compiler, as argument is constant at
+     * caller.
+     */
+    if (is_csv)
     {
         quotec = cstate->opts.quote[0];
         escapec = cstate->opts.escape[0];
@@ -1255,7 +1314,11 @@ CopyReadLineText(CopyFromState cstate)
         prev_raw_ptr = input_buf_ptr;
         c = copy_input_buf[input_buf_ptr++];
 
-        if (cstate->opts.csv_mode)
+        /*
+         * is_csv will be optimized away by compiler, as argument is constant
+         * at caller.
+         */
+        if (is_csv)
         {
             /*
              * If character is '\r', we may need to look ahead below.  Force
@@ -1294,7 +1357,7 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \r */
-        if (c == '\r' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\r' && (!is_csv || !in_quote))
         {
             /* Check for \r\n on first line, _and_ handle \r\n. */
             if (cstate->eol_type == EOL_UNKNOWN ||
@@ -1322,10 +1385,10 @@ CopyReadLineText(CopyFromState cstate)
                     if (cstate->eol_type == EOL_CRNL)
                         ereport(ERROR,
                                 (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errmsg("literal carriage return found in data") :
                                  errmsg("unquoted carriage return found in data"),
-                                 !cstate->opts.csv_mode ?
+                                 !is_csv ?
                                  errhint("Use \"\\r\" to represent carriage return.") :
                                  errhint("Use quoted CSV field to represent carriage return.")));
 
@@ -1339,10 +1402,10 @@ CopyReadLineText(CopyFromState cstate)
             else if (cstate->eol_type == EOL_NL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal carriage return found in data") :
                          errmsg("unquoted carriage return found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\r\" to represent carriage return.") :
                          errhint("Use quoted CSV field to represent carriage return.")));
             /* If reach here, we have found the line terminator */
@@ -1350,15 +1413,15 @@ CopyReadLineText(CopyFromState cstate)
         }
 
         /* Process \n */
-        if (c == '\n' && (!cstate->opts.csv_mode || !in_quote))
+        if (c == '\n' && (!is_csv || !in_quote))
         {
             if (cstate->eol_type == EOL_CR || cstate->eol_type == EOL_CRNL)
                 ereport(ERROR,
                         (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errmsg("literal newline found in data") :
                          errmsg("unquoted newline found in data"),
-                         !cstate->opts.csv_mode ?
+                         !is_csv ?
                          errhint("Use \"\\n\" to represent newline.") :
                          errhint("Use quoted CSV field to represent newline.")));
             cstate->eol_type = EOL_NL;    /* in case not set yet */
@@ -1370,7 +1433,7 @@ CopyReadLineText(CopyFromState cstate)
          * Process backslash, except in CSV mode where backslash is a normal
          * character.
          */
-        if (c == '\\' && !cstate->opts.csv_mode)
+        if (c == '\\' && !is_csv)
         {
             char        c2;
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..7bc044e2816 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState *pstate, Relation rel, Node *where
 extern void EndCopyFrom(CopyFromState cstate);
 extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
                          Datum *values, bool *nulls);
-extern bool NextCopyFromRawFields(CopyFromState cstate,
-                                  char ***fields, int *nfields);
 extern void CopyFromErrorCallback(void *arg);
 extern char *CopyLimitPrintoutLength(const char *str);
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index de5dae9cc38..39e5a096da5 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * copyapi.h
- *      API for COPY TO handlers
+ *      API for COPY TO/FROM handlers
  *
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
@@ -52,4 +52,50 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+/*
+ * API structure for a COPY FROM format implementation.  Note this must be
+ * allocated in a server-lifetime manner, typically as a static const struct.
+ */
+typedef struct CopyFromRoutine
+{
+    /*
+     * Set input function information. This callback is called once at the
+     * beginning of COPY FROM.
+     *
+     * 'finfo' can be optionally filled to provide the catalog information of
+     * the input function.
+     *
+     * 'typioparam' can be optionally filled to define the OID of the type to
+     * pass to the input function.  'atttypid' is the OID of data type used by
+     * the relation's attribute.
+     */
+    void        (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
+                                   FmgrInfo *finfo, Oid *typioparam);
+
+    /*
+     * Start a COPY FROM. This callback is called once at the beginning of
+     * COPY FROM.
+     *
+     * 'tupDesc' is the tuple descriptor of the relation where the data needs
+     * to be copied.  This can be used for any initialization steps required
+     * by a format.
+     */
+    void        (*CopyFromStart) (CopyFromState cstate, TupleDesc tupDesc);
+
+    /*
+     * Read one row from the source and fill *values and *nulls.
+     *
+     * 'econtext' is used to evaluate default expression for each column that
+     * is either not read from the file or is using the DEFAULT option of COPY
+     * FROM.  It is NULL if no default values are used.
+     *
+     * Returns false if there are no more tuples to read.
+     */
+    bool        (*CopyFromOneRow) (CopyFromState cstate, ExprContext *econtext,
+                                   Datum *values, bool *nulls);
+
+    /* End a COPY FROM. This callback is called once at the end of COPY FROM */
+    void        (*CopyFromEnd) (CopyFromState cstate);
+} CopyFromRoutine;
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 1d8ac8f62e6..c8b22af22d8 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -58,6 +58,9 @@ typedef enum CopyInsertMethod
  */
 typedef struct CopyFromStateData
 {
+    /* format routine */
+    const struct CopyFromRoutine *routine;
+
     /* low-level state data */
     CopySource    copy_src;        /* type of copy source */
     FILE       *copy_file;        /* used if copy_src == COPY_FILE */
@@ -183,4 +186,12 @@ typedef struct CopyFromStateData
 extern void ReceiveCopyBegin(CopyFromState cstate);
 extern void ReceiveCopyBinaryHeader(CopyFromState cstate);
 
+/* One-row callbacks for built-in formats defined in copyfromparse.c */
+extern bool CopyFromTextOneRow(CopyFromState cstate, ExprContext *econtext,
+                               Datum *values, bool *nulls);
+extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
+                              Datum *values, bool *nulls);
+extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
+                                 Datum *values, bool *nulls);
+
 #endif                            /* COPYFROM_INTERNAL_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6b2f22d8555..a456981ca0f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -497,6 +497,7 @@ ConvertRowtypeExpr
 CookedConstraint
 CopyDest
 CopyFormatOptions
+CopyFromRoutine
 CopyFromState
 CopyFromStateData
 CopyHeaderChoice
-- 
2.47.1
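A note on the is_csv comments in CopyReadLineText() above: the idea is that each caller passes a compile-time constant, so once the helper is inlined the compiler can drop the "if (is_csv)" tests from each specialized copy. Below is a minimal standalone sketch of that pattern. It is not PostgreSQL code: demo_count_fields() and its two wrappers are hypothetical names, the parsing rules are deliberately simplified, and whether the branch is really removed depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): count delimiter-separated
 * fields.  is_csv is expected to be a constant at every call site.
 */
static inline int
demo_count_fields(const char *line, char delim, bool is_csv)
{
    int         nfields = 1;
    bool        in_quote = false;

    assert(line != NULL);

    for (size_t i = 0; line[i] != '\0'; i++)
    {
        char        c = line[i];

        if (is_csv)
        {
            /* CSV mode: a double quote toggles quoting, backslash is plain */
            if (c == '"')
            {
                in_quote = !in_quote;
                continue;
            }
        }
        else if (c == '\\' && line[i + 1] != '\0')
        {
            /* text mode: backslash escapes the next character */
            i++;
            continue;
        }

        if (c == delim && !in_quote)
            nfields++;
    }

    return nfields;
}

/* One thin wrapper per format, each passing a constant is_csv */
int
demo_count_fields_text(const char *line)
{
    return demo_count_fields(line, '\t', false);
}

int
demo_count_fields_csv(const char *line)
{
    return demo_count_fields(line, ',', true);
}
```

The wrappers play the role that the per-format one-row callbacks play in the patch: the format decision is made once when the routine is chosen, not once per character.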
From 0876e1bfdd0c35a08aece68b059daf21487af6b3 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 12:19:15 +0900
Subject: [PATCH v32 3/9] Add support for adding custom COPY TO format
This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a COPY TO handler returns a CopyToRoutine.
This also adds a test module for a custom COPY TO handler.
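To illustrate the handler approach described above, here is a minimal standalone sketch. It assumes nothing from the PostgreSQL headers: DemoNodeTag, DemoCopyToRoutine, DemoHandler and demo_copy_to() are hypothetical stand-ins for NodeTag, CopyToRoutine, the handler function and the COPY TO driver. It only models the two ideas in this patch: the handler returns a pointer to a static struct, and the caller checks the node tag before trusting it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for NodeTag and CopyToRoutine */
typedef enum DemoNodeTag
{
    T_DemoInvalid = 0,
    T_DemoCopyToRoutine
} DemoNodeTag;

typedef struct DemoCopyToRoutine
{
    DemoNodeTag type;           /* first member, so the tag can be checked */
    void        (*out_func) (int atttypid);
    void        (*start) (int natts);
    void        (*one_row) (int rowno);
    void        (*end) (void);
} DemoCopyToRoutine;

/* A handler returns an untyped pointer, like a function returning internal */
typedef const void *(*DemoHandler) (void);

/* Callback order log: O = out_func, S = start, R = one_row, E = end */
char        demo_log[64];

static void
demo_note(char c)
{
    size_t      len = strlen(demo_log);

    if (len + 1 < sizeof(demo_log))
    {
        demo_log[len] = c;
        demo_log[len + 1] = '\0';
    }
}

static void demo_out_func(int atttypid) { (void) atttypid; demo_note('O'); }
static void demo_start(int natts) { (void) natts; demo_note('S'); }
static void demo_one_row(int rowno) { (void) rowno; demo_note('R'); }
static void demo_end(void) { demo_note('E'); }

static const DemoCopyToRoutine demo_routine = {
    .type = T_DemoCopyToRoutine,
    .out_func = demo_out_func,
    .start = demo_start,
    .one_row = demo_one_row,
    .end = demo_end,
};

const void *
demo_good_handler(void)
{
    return &demo_routine;
}

/* A handler that returns something that is not a routine struct */
static const DemoNodeTag demo_wrong_node = T_DemoInvalid;

const void *
demo_bad_handler(void)
{
    return &demo_wrong_node;
}

/* Returns 0 on success, -1 if the handler result is rejected */
int
demo_copy_to(DemoHandler handler, int natts, int nrows)
{
    const void *result;
    const DemoCopyToRoutine *routine;

    assert(handler != NULL);
    result = handler();

    /* Same idea as the IsA() check: trust the struct only if the tag matches */
    if (result == NULL || *(const DemoNodeTag *) result != T_DemoCopyToRoutine)
        return -1;
    routine = (const DemoCopyToRoutine *) result;

    for (int i = 0; i < natts; i++)
        routine->out_func(i);
    routine->start(natts);
    for (int r = 0; r < nrows; r++)
        routine->one_row(r);
    routine->end();
    return 0;
}
```

With three attributes and three rows, the sketch calls the callbacks in the same order as the NOTICE lines in the expected output of the test module in this patch: three out-function calls, one start, three rows, one end.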
---
 src/backend/commands/copy.c                   | 79 ++++++++++++++++---
 src/backend/commands/copyto.c                 | 36 +++++++--
 src/backend/nodes/Makefile                    |  1 +
 src/backend/nodes/gen_node_support.pl         |  2 +
 src/backend/utils/adt/pseudotypes.c           |  1 +
 src/include/catalog/pg_proc.dat               |  6 ++
 src/include/catalog/pg_type.dat               |  6 ++
 src/include/commands/copy.h                   |  1 +
 src/include/commands/copyapi.h                |  2 +
 src/include/nodes/meson.build                 |  1 +
 src/test/modules/Makefile                     |  1 +
 src/test/modules/meson.build                  |  1 +
 src/test/modules/test_copy_format/.gitignore  |  4 +
 src/test/modules/test_copy_format/Makefile    | 23 ++++++
 .../expected/test_copy_format.out             | 17 ++++
 src/test/modules/test_copy_format/meson.build | 33 ++++++++
 .../test_copy_format/sql/test_copy_format.sql |  5 ++
 .../test_copy_format--1.0.sql                 |  8 ++
 .../test_copy_format/test_copy_format.c       | 63 +++++++++++++++
 .../test_copy_format/test_copy_format.control |  4 +
 20 files changed, 273 insertions(+), 21 deletions(-)
 mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..8d94bc313eb 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,6 +32,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -476,6 +477,70 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
     return COPY_LOG_VERBOSITY_DEFAULT;    /* keep compiler quiet */
 }
 
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false). If no COPY format handler is found, this function
+ * reports an error.
+ */
+static void
+ProcessCopyOptionFormat(ParseState *pstate,
+                        CopyFormatOptions *opts_out,
+                        bool is_from,
+                        DefElem *defel)
+{
+    char       *format;
+    Oid            funcargtypes[1];
+    Oid            handlerOid = InvalidOid;
+
+    format = defGetString(defel);
+
+    opts_out->csv_mode = false;
+    opts_out->binary = false;
+    /* built-in formats */
+    if (strcmp(format, "text") == 0)
+    {
+        /* "csv_mode == false && binary == false" means "text" */
+        return;
+    }
+    else if (strcmp(format, "csv") == 0)
+    {
+        opts_out->csv_mode = true;
+        return;
+    }
+    else if (strcmp(format, "binary") == 0)
+    {
+        opts_out->binary = true;
+        return;
+    }
+
+    /* custom format */
+    if (!is_from)
+    {
+        funcargtypes[0] = INTERNALOID;
+        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                    funcargtypes, true);
+    }
+    if (!OidIsValid(handlerOid))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY format \"%s\" not recognized", format),
+                 parser_errposition(pstate, defel->location)));
+
+    /* check that handler has correct return type */
+    if (get_func_rettype(handlerOid) != COPY_HANDLEROID)
+        ereport(ERROR,
+                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                 errmsg("function %s must return type %s",
+                        format, "copy_handler"),
+                 parser_errposition(pstate, defel->location)));
+
+    opts_out->handler = handlerOid;
+}
+
 /*
  * Process the statement option list for COPY.
  *
@@ -519,22 +584,10 @@ ProcessCopyOptions(ParseState *pstate,
 
         if (strcmp(defel->defname, "format") == 0)
         {
-            char       *fmt = defGetString(defel);
-
             if (format_specified)
                 errorConflictingDefElem(defel, pstate);
             format_specified = true;
-            if (strcmp(fmt, "text") == 0)
-                 /* default format */ ;
-            else if (strcmp(fmt, "csv") == 0)
-                opts_out->csv_mode = true;
-            else if (strcmp(fmt, "binary") == 0)
-                opts_out->binary = true;
-            else
-                ereport(ERROR,
-                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                         errmsg("COPY format \"%s\" not recognized", fmt),
-                         parser_errposition(pstate, defel->location)));
+            ProcessCopyOptionFormat(pstate, opts_out, is_from, defel);
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 26c67ddc351..18af2aaa2f9 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -150,6 +150,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
 
 /* text format */
 static const CopyToRoutine CopyToRoutineText = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToTextOneRow,
@@ -158,6 +159,7 @@ static const CopyToRoutine CopyToRoutineText = {
 
 /* CSV format */
 static const CopyToRoutine CopyToRoutineCSV = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToCSVOneRow,
@@ -166,6 +168,7 @@ static const CopyToRoutine CopyToRoutineCSV = {
 
 /* binary format */
 static const CopyToRoutine CopyToRoutineBinary = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToBinaryStart,
     .CopyToOutFunc = CopyToBinaryOutFunc,
     .CopyToOneRow = CopyToBinaryOneRow,
@@ -174,15 +177,32 @@ static const CopyToRoutine CopyToRoutineBinary = {
 
 /* Return a COPY TO routine for the given options */
 static const CopyToRoutine *
-CopyToGetRoutine(CopyFormatOptions opts)
+CopyToGetRoutine(CopyFormatOptions *opts)
 {
-    if (opts.csv_mode)
-        return &CopyToRoutineCSV;
-    else if (opts.binary)
-        return &CopyToRoutineBinary;
+    if (OidIsValid(opts->handler))
+    {
+        Datum        datum;
+        Node       *routine;
 
-    /* default is text */
-    return &CopyToRoutineText;
+        datum = OidFunctionCall1(opts->handler, BoolGetDatum(false));
+        routine = (Node *) DatumGetPointer(datum);
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%u did not return "
+                            "CopyToRoutine struct",
+                            opts->handler)));
+        return castNode(CopyToRoutine, routine);
+    }
+    else if (opts->csv_mode)
+        return &CopyToRoutineCSV;
+    else if (opts->binary)
+        return &CopyToRoutineBinary;
+    else
+        return &CopyToRoutineText;
 }
 
 /* Implementation of the start callback for text and CSV formats */
@@ -703,7 +723,7 @@ BeginCopyTo(ParseState *pstate,
     ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
 
     /* Set format routine */
-    cstate->routine = CopyToGetRoutine(cstate->opts);
+    cstate->routine = CopyToGetRoutine(&cstate->opts);
 
     /* Process the source/target relation or query */
     if (rel)
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 77ddb9ca53f..dc6c1087361 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -50,6 +50,7 @@ node_headers = \
     access/sdir.h \
     access/tableam.h \
     access/tsmapi.h \
+    commands/copyapi.h \
     commands/event_trigger.h \
     commands/trigger.h \
     executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
old mode 100644
new mode 100755
index 1a657f7e0ae..fb90635a245
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -62,6 +62,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -86,6 +87,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index 317a1f2b282..f2ebc21ca56 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5b8c2ad2a54..b231e7a041e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7803,6 +7803,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 6dca77e0a22..340e0cd0a8d 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -633,6 +633,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a copy to method function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   descr => 'pseudo-type for the result of a table AM handler function',
   typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 7bc044e2816..285f2c8fc4f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
+    Oid            handler;        /* handler function for custom format routine */
 } CopyFormatOptions;
 
 /* These are private in commands/copy[from|to].c */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 39e5a096da5..c125dc3e209 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -22,6 +22,8 @@
  */
 typedef struct CopyToRoutine
 {
+    NodeTag        type;
+
     /*
      * Set output function information. This callback is called once at the
      * beginning of COPY TO.
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index d1ca24dd32f..96e70e7f38b 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -12,6 +12,7 @@ node_support_input_i = [
   'access/sdir.h',
   'access/tableam.h',
   'access/tsmapi.h',
+  'commands/copyapi.h',
   'commands/event_trigger.h',
   'commands/trigger.h',
   'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index c0d3cf0e14b..33e3a49a4fb 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -15,6 +15,7 @@ SUBDIRS = \
           spgist_name_ops \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 4f544a042d4..bf25658793d 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -14,6 +14,7 @@ subdir('spgist_name_ops')
 subdir('ssl_passphrase_callback')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 00000000000..8497f91624d
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 00000000000..adfe7d1572a
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,17 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+ERROR:  COPY format "test_copy_format" not recognized
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')...
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: atttypid=21
+NOTICE:  CopyToOutFunc: atttypid=23
+NOTICE:  CopyToOutFunc: atttypid=20
+NOTICE:  CopyToStart: natts=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 00000000000..4cefe7b709a
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+  'test_copy_format.control',
+  'test_copy_format--1.0.sql',
+)
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'test_copy_format',
+    ],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 00000000000..810b3d8cedc
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,5 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 00000000000..d24ea03ce99
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,8 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+CREATE FUNCTION test_copy_format(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..e064f40473b
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,63 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+
+PG_MODULE_MAGIC;
+
+static void
+CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts)));
+}
+
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%d", slot->tts_nvalid)));
+}
+
+static void
+CopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .type = T_CopyToRoutine,
+    .CopyToOutFunc = CopyToOutFunc,
+    .CopyToStart = CopyToStart,
+    .CopyToOneRow = CopyToOneRow,
+    .CopyToEnd = CopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    ereport(NOTICE,
+            (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 00000000000..f05a6362358
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.47.1
From 92ffd2446cc17e657e2816a29930c8a31464cdb3 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 13:58:33 +0900
Subject: [PATCH v32 4/9] Export CopyToStateData as private data
This is for custom COPY TO format handlers implemented as extensions.
This just moves code. It doesn't change any code except the CopyDest
enum values. The CopyDest and CopySource enum values such as COPY_FILE
conflict with each other, so the CopyDest enum values now use a
COPY_DEST_ prefix instead of the COPY_ prefix. For example, COPY_FILE in
CopyDest is renamed to COPY_DEST_FILE.
Note that this isn't enough to implement custom COPY TO format
handlers as extensions. We'll do the following in a subsequent commit:
1. Add an opaque space for custom COPY TO format handlers
2. Export CopySendEndOfRow() to flush the buffer
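A small sketch of why the rename is needed: once both enums are visible in one translation unit, two enumerators named COPY_FILE would not compile, while prefixed names can coexist. The type names DemoCopySource and DemoCopyDest and the helper demo_dest_name() are hypothetical. Only COPY_DEST_FILE is named in this commit message; COPY_DEST_FRONTEND and COPY_DEST_CALLBACK are assumed here by following the same pattern.

```c
#include <assert.h>

/* COPY FROM side: keeps the plain COPY_ prefix */
typedef enum DemoCopySource
{
    COPY_FILE,                  /* from file (or a piped program) */
    COPY_FRONTEND,              /* from frontend */
    COPY_CALLBACK               /* from callback function */
} DemoCopySource;

/*
 * COPY TO side: uses COPY_DEST_ so both enums can live in one
 * translation unit.  The last two names are assumptions.
 */
typedef enum DemoCopyDest
{
    COPY_DEST_FILE,             /* to file (or a piped program) */
    COPY_DEST_FRONTEND,         /* to frontend */
    COPY_DEST_CALLBACK          /* to callback function */
} DemoCopyDest;

/* Hypothetical helper: human-readable name of a destination */
const char *
demo_dest_name(DemoCopyDest dest)
{
    switch (dest)
    {
        case COPY_DEST_FILE:
            return "file";
        case COPY_DEST_FRONTEND:
            return "frontend";
        case COPY_DEST_CALLBACK:
            return "callback";
    }

    assert(0 && "unknown DemoCopyDest value");
    return "unknown";
}
```

An extension that includes both the COPY FROM and COPY TO internal headers would hit exactly this situation, which is why only the COPY TO names change.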
---
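As a rough illustration of the conflict described above: C enumerators
share one namespace per translation unit, so two enums that both
declare COPY_FILE cannot be visible in the same file. The sketch below
is a standalone model, not the PostgreSQL headers. copy_dest_name() is
a hypothetical helper added only for the example.

```c
#include <assert.h>

/*
 * Standalone model. If both enums used COPY_FILE, COPY_FRONTEND and
 * COPY_CALLBACK, including both definitions in one file would fail with
 * a redeclaration error, so each enum gets its own prefix.
 */
typedef enum CopyDest
{
    COPY_DEST_FILE,             /* to file (or a piped program) */
    COPY_DEST_FRONTEND,         /* to frontend */
    COPY_DEST_CALLBACK,         /* to callback function */
} CopyDest;

typedef enum CopySource
{
    COPY_SOURCE_FILE,           /* from file (or a piped program) */
    COPY_SOURCE_FRONTEND,       /* from frontend */
    COPY_SOURCE_CALLBACK,       /* from callback function */
} CopySource;

/* Hypothetical helper: map a destination to a printable name. */
static const char *
copy_dest_name(CopyDest dest)
{
    assert(dest <= COPY_DEST_CALLBACK);

    switch (dest)
    {
        case COPY_DEST_FILE:
            return "file";
        case COPY_DEST_FRONTEND:
            return "frontend";
        case COPY_DEST_CALLBACK:
            return "callback";
    }
    return "unknown";
}
```

With the prefixes in place, an extension can include the COPY TO and
COPY FROM internal headers together.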
 src/backend/commands/copyto.c          | 78 +++---------------------
 src/include/commands/copy.h            |  2 +-
 src/include/commands/copyto_internal.h | 83 ++++++++++++++++++++++++++
 3 files changed, 93 insertions(+), 70 deletions(-)
 create mode 100644 src/include/commands/copyto_internal.h
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 18af2aaa2f9..16d3b389e97 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -20,6 +20,7 @@
 
 #include "access/tableam.h"
 #include "commands/copyapi.h"
+#include "commands/copyto_internal.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -36,67 +37,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                    /* to file (or a piped program) */
-    COPY_FRONTEND,                /* to frontend */
-    COPY_CALLBACK,                /* to callback function */
-} CopyDest;
-
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-    /* format-specific routines */
-    const CopyToRoutine *routine;
-
-    /* low-level state data */
-    CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
-
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
-    MemoryContext rowcontext;    /* per-row evaluation context */
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -424,7 +364,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -471,7 +411,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -505,11 +445,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -530,7 +470,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -538,7 +478,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -922,12 +862,12 @@ BeginCopyTo(ParseState *pstate,
     /* See Multibyte encoding comment above */
     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 285f2c8fc4f..be97b07b559 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -90,7 +90,7 @@ typedef struct CopyFormatOptions
     Oid            handler;        /* handler function for custom format routine */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
+/* These are private in commands/copy[from|to]_internal.h */
 typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
new file mode 100644
index 00000000000..1b58b36c0a3
--- /dev/null
+++ b/src/include/commands/copyto_internal.h
@@ -0,0 +1,83 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyto_internal.h
+ *      Internal definitions for COPY TO command.
+ *
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyto_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYTO_INTERNAL_H
+#define COPYTO_INTERNAL_H
+
+#include "commands/copy.h"
+#include "executor/execdesc.h"
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+    COPY_DEST_FILE,                /* to file (or a piped program) */
+    COPY_DEST_FRONTEND,            /* to frontend */
+    COPY_DEST_CALLBACK,            /* to callback function */
+} CopyDest;
+
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+    /* format-specific routines */
+    const struct CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;        /* type of copy source/destination */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
+    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
+
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+    MemoryContext rowcontext;    /* per-row evaluation context */
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
+#endif                            /* COPYTO_INTERNAL_H */
-- 
2.47.1
From fae5ed7cd759ef77b28d69400f0ebdf9fbfd7c4d Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:01:18 +0900
Subject: [PATCH v32 5/9] Add support for implementing custom COPY TO format as
 extension
* Add CopyToStateData::opaque that can be used to keep data for custom
  COPY TO format implementation
* Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf
  as CopyToStateFlush()
---
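A minimal sketch of how a format implementation might use the two
additions above. It is a standalone model so that it compiles without
the PostgreSQL tree: MockCopyToState, MockFormatState, mock_flush() and
the other mock_* functions are hypothetical stand-ins. Only the idea is
taken from the commit message: keep per-format state behind the opaque
pointer, append a row to the buffer, then flush.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CopyToStateData. */
typedef struct MockCopyToState
{
    char        buf[256];       /* models fe_msgbuf */
    size_t      len;            /* bytes currently buffered */
    size_t      flushed_bytes;  /* bytes handed to the destination so far */
    void       *opaque;         /* private space for the format */
} MockCopyToState;

/* Hypothetical per-format state kept behind opaque. */
typedef struct MockFormatState
{
    unsigned long nrows;
} MockFormatState;

/* Models CopyToStateFlush(): hand the buffer over, then reset it. */
static void
mock_flush(MockCopyToState *cstate)
{
    cstate->flushed_bytes += cstate->len;
    cstate->len = 0;
}

/* Start callback: allocate the private state. */
static void
mock_start(MockCopyToState *cstate)
{
    MockFormatState *fs = calloc(1, sizeof(*fs));

    assert(fs != NULL);
    cstate->opaque = fs;
}

/* Per-row callback: buffer one serialized row and flush it. */
static void
mock_one_row(MockCopyToState *cstate, const char *row)
{
    MockFormatState *fs = cstate->opaque;
    size_t      n = strlen(row);

    assert(cstate->len + n <= sizeof(cstate->buf));
    memcpy(cstate->buf + cstate->len, row, n);
    cstate->len += n;
    fs->nrows++;
    mock_flush(cstate);
}

/* End callback: release the private state and report the row count. */
static unsigned long
mock_end(MockCopyToState *cstate)
{
    MockFormatState *fs = cstate->opaque;
    unsigned long nrows = fs->nrows;

    free(fs);
    cstate->opaque = NULL;
    return nrows;
}
```

A real handler would allocate in the COPY memory context rather than
with calloc(). The model uses libc only to stay self-contained.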
 src/backend/commands/copyto.c          | 12 ++++++++++++
 src/include/commands/copyapi.h         |  2 ++
 src/include/commands/copyto_internal.h |  3 +++
 3 files changed, 17 insertions(+)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 16d3b389e97..20d49d73e38 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -461,6 +461,18 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * Export CopySendEndOfRow() for extensions. We want to keep
+ * CopySendEndOfRow() as a static function for
+ * optimization. CopySendEndOfRow() calls in this file may be optimized by a
+ * compiler.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
  * the line termination and do common appropriate things for the end of row.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index c125dc3e209..d0da9e07a0d 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -54,6 +54,8 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+extern void CopyToStateFlush(CopyToState cstate);
+
 /*
  * API structure for a COPY FROM format implementation.     Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index 1b58b36c0a3..ce1c33a4004 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -78,6 +78,9 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;    /* lookup info for output functions */
     MemoryContext rowcontext;    /* per-row evaluation context */
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyToStateData;
 
 #endif                            /* COPYTO_INTERNAL_H */
-- 
2.47.1
From 298fb99716c7d392e10f5f6d24346b7c0770c05a Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:11:55 +0900
Subject: [PATCH v32 6/9] Add support for adding custom COPY FROM format
This uses the same handler for COPY TO and COPY FROM but a different
routine for each: CopyToRoutine for COPY TO and CopyFromRoutine for
COPY FROM. PostgreSQL calls a COPY TO/FROM handler with an "is_from"
argument. It's true for COPY FROM and false for COPY TO:
    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine
This also adds a test module for a custom COPY FROM handler.
---
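The dispatch described above can be sketched as a standalone model.
One handler serves both directions, and the caller checks a type tag
before trusting the returned pointer, in the spirit of the IsA() check
in the patch. All Mock* and mock_* names are hypothetical. The real
code uses NodeTag, CopyToRoutine and CopyFromRoutine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for NodeTag. */
typedef enum MockNodeTag
{
    T_MockCopyToRoutine = 1,
    T_MockCopyFromRoutine,
} MockNodeTag;

/* The tag must be the first member so it can be read through any pointer. */
typedef struct MockCopyToRoutine
{
    MockNodeTag type;
} MockCopyToRoutine;

typedef struct MockCopyFromRoutine
{
    MockNodeTag type;
} MockCopyFromRoutine;

static const MockCopyToRoutine to_routine = {.type = T_MockCopyToRoutine};
static const MockCopyFromRoutine from_routine = {.type = T_MockCopyFromRoutine};

/* One handler for both directions: is_from selects the routine. */
static const void *
mock_handler(bool is_from)
{
    if (is_from)
        return &from_routine;
    return &to_routine;
}

/* A broken handler that ignores is_from, used to exercise the check. */
static const void *
mock_bad_handler(bool is_from)
{
    (void) is_from;
    return &to_routine;
}

/*
 * COPY FROM side: call the handler with is_from = true and verify the tag.
 * Returns NULL where the real code would report an error.
 */
static const MockCopyFromRoutine *
mock_get_from_routine(const void *(*handler) (bool))
{
    const void *routine;

    assert(handler != NULL);
    routine = handler(true);
    if (routine == NULL ||
        *(const MockNodeTag *) routine != T_MockCopyFromRoutine)
        return NULL;
    return routine;
}
```

The tag check is what lets one SQL-level handler function return two
unrelated struct types safely.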
 src/backend/commands/copy.c                   | 13 +++----
 src/backend/commands/copyfrom.c               | 36 +++++++++++++----
 src/include/catalog/pg_type.dat               |  2 +-
 src/include/commands/copyapi.h                |  2 +
 .../expected/test_copy_format.out             | 10 +++--
 .../test_copy_format/sql/test_copy_format.sql |  1 +
 .../test_copy_format/test_copy_format.c       | 39 ++++++++++++++++++-
 7 files changed, 82 insertions(+), 21 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8d94bc313eb..b4417bb6819 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -483,8 +483,8 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
  * This function checks whether the option value is a built-in format such as
  * "text" and "csv" or not. If the option value isn't a built-in format, this
  * function finds a COPY format handler that returns a CopyToRoutine (for
- * is_from == false). If no COPY format handler is found, this function
- * reports an error.
+ * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY
+ * format handler is found, this function reports an error.
  */
 static void
 ProcessCopyOptionFormat(ParseState *pstate,
@@ -518,12 +518,9 @@ ProcessCopyOptionFormat(ParseState *pstate,
     }
 
     /* custom format */
-    if (!is_from)
-    {
-        funcargtypes[0] = INTERNALOID;
-        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
-                                    funcargtypes, true);
-    }
+    funcargtypes[0] = INTERNALOID;
+    handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                funcargtypes, true);
     if (!OidIsValid(handlerOid))
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 8b09df0581f..37647949bfc 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -129,6 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate);
 
 /* text format */
 static const CopyFromRoutine CopyFromRoutineText = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromTextOneRow,
@@ -137,6 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = {
 
 /* CSV format */
 static const CopyFromRoutine CopyFromRoutineCSV = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromCSVOneRow,
@@ -145,6 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = {
 
 /* binary format */
 static const CopyFromRoutine CopyFromRoutineBinary = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromBinaryInFunc,
     .CopyFromStart = CopyFromBinaryStart,
     .CopyFromOneRow = CopyFromBinaryOneRow,
@@ -153,15 +156,32 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 
 /* Return a COPY FROM routine for the given options */
 static const CopyFromRoutine *
-CopyFromGetRoutine(CopyFormatOptions opts)
+CopyFromGetRoutine(CopyFormatOptions *opts)
 {
-    if (opts.csv_mode)
-        return &CopyFromRoutineCSV;
-    else if (opts.binary)
-        return &CopyFromRoutineBinary;
+    if (OidIsValid(opts->handler))
+    {
+        Datum        datum;
+        Node       *routine;
 
-    /* default is text */
-    return &CopyFromRoutineText;
+        datum = OidFunctionCall1(opts->handler, BoolGetDatum(true));
+        routine = (Node *) DatumGetPointer(datum);
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%u did not return "
+                            "CopyFromRoutine struct",
+                            opts->handler)));
+        return castNode(CopyFromRoutine, routine);
+    }
+    else if (opts->csv_mode)
+        return &CopyFromRoutineCSV;
+    else if (opts->binary)
+        return &CopyFromRoutineBinary;
+    else
+        return &CopyFromRoutineText;
 }
 
 /* Implementation of the start callback for text and CSV formats */
@@ -1567,7 +1587,7 @@ BeginCopyFrom(ParseState *pstate,
     ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
     /* Set the format routine */
-    cstate->routine = CopyFromGetRoutine(cstate->opts);
+    cstate->routine = CopyFromGetRoutine(&cstate->opts);
 
     /* Process the target relation */
     cstate->rel = rel;
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 340e0cd0a8d..63b7d65f982 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -634,7 +634,7 @@
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
 { oid => '8752',
-  descr => 'pseudo-type for the result of a copy to method function',
+  descr => 'pseudo-type for the result of a copy to/from method function',
   typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
   typcategory => 'P', typinput => 'copy_handler_in',
   typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index d0da9e07a0d..103eb21767d 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -62,6 +62,8 @@ extern void CopyToStateFlush(CopyToState cstate);
  */
 typedef struct CopyFromRoutine
 {
+    NodeTag        type;
+
     /*
      * Set input function information. This callback is called once at the
      * beginning of COPY FROM.
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index adfe7d1572a..016893e7026 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -2,9 +2,13 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
-ERROR:  COPY format "test_copy_format" not recognized
-LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')...
-                                          ^
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=false
 NOTICE:  CopyToOutFunc: atttypid=21
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 810b3d8cedc..0dfdfa00080 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -2,4 +2,5 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+\.
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index e064f40473b..f6b105659ab 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -18,6 +18,40 @@
 
 PG_MODULE_MAGIC;
 
+static void
+CopyFromInFunc(CopyFromState cstate, Oid atttypid,
+               FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%d", atttypid)));
+}
+
+static void
+CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts)));
+}
+
+static bool
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+CopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .type = T_CopyFromRoutine,
+    .CopyFromInFunc = CopyFromInFunc,
+    .CopyFromStart = CopyFromStart,
+    .CopyFromOneRow = CopyFromOneRow,
+    .CopyFromEnd = CopyFromEnd,
+};
+
 static void
 CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
 {
@@ -59,5 +93,8 @@ test_copy_format(PG_FUNCTION_ARGS)
     ereport(NOTICE,
             (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
 
-    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
 }
-- 
2.47.1
From d113e8d9777b789d51b8cf04e869b99e74a6c669 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:19:34 +0900
Subject: [PATCH v32 7/9] Use COPY_SOURCE_ prefix for CopySource enum values
This is for consistency with CopyDest.
---
 src/backend/commands/copyfrom.c          |  4 ++--
 src/backend/commands/copyfromparse.c     | 10 +++++-----
 src/include/commands/copyfrom_internal.h |  6 +++---
 3 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 37647949bfc..29e2a7d13d4 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1722,7 +1722,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;    /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;    /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1850,7 +1850,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index c1872acbbf6..75b49629f08 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -171,7 +171,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows it can send. */
     pq_flush();
@@ -239,7 +239,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
     switch (cstate->copy_src)
     {
-        case COPY_FILE:
+        case COPY_SOURCE_FILE:
             bytesread = fread(databuf, 1, maxread, cstate->copy_file);
             if (ferror(cstate->copy_file))
                 ereport(ERROR,
@@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
             if (bytesread == 0)
                 cstate->raw_reached_eof = true;
             break;
-        case COPY_FRONTEND:
+        case COPY_SOURCE_FRONTEND:
             while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
             {
                 int            avail;
@@ -331,7 +331,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
                 bytesread += avail;
             }
             break;
-        case COPY_CALLBACK:
+        case COPY_SOURCE_CALLBACK:
             bytesread = cstate->data_source_cb(databuf, minread, maxread);
             break;
     }
@@ -1159,7 +1159,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
          * after \. up to the protocol end of copy data.  (XXX maybe better
          * not to treat \. as special?)
          */
-        if (cstate->copy_src == COPY_FRONTEND)
+        if (cstate->copy_src == COPY_SOURCE_FRONTEND)
         {
             int            inbytes;
 
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..3a306e3286e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -24,9 +24,9 @@
  */
 typedef enum CopySource
 {
-    COPY_FILE,                    /* from file (or a piped program) */
-    COPY_FRONTEND,                /* from frontend */
-    COPY_CALLBACK,                /* from callback function */
+    COPY_SOURCE_FILE,            /* from file (or a piped program) */
+    COPY_SOURCE_FRONTEND,        /* from frontend */
+    COPY_SOURCE_CALLBACK,        /* from callback function */
 } CopySource;
 
 /*
-- 
2.47.1
From 65c85cbf6f2d147923352af7272a34de506cb3a4 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:21:39 +0900
Subject: [PATCH v32 8/9] Add support for implementing custom COPY FROM format
 as extension
* Add CopyFromStateData::opaque that can be used to keep data for
  custom COPY FROM format implementation
* Export CopyGetData() to get the next data as
  CopyFromStateGetData()
---
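A sketch of how an extension might consume input through a
CopyFromStateGetData()-style call. It assumes the usual contract of
CopyGetData(): read at least minread bytes unless the input ends, and
never more than maxread. MockSource, mock_get_data() and
mock_read_record() are hypothetical names in a standalone model, not
PostgreSQL API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical input that delivers data in small chunks. */
typedef struct MockSource
{
    const char *data;
    size_t      len;
    size_t      pos;
    size_t      chunk;          /* max bytes per underlying read */
} MockSource;

/*
 * Models the assumed contract: return at least minread bytes unless the
 * input ends first, and never more than maxread bytes.
 */
static int
mock_get_data(MockSource *src, void *dest, int minread, int maxread)
{
    int         bytesread = 0;

    assert(src->chunk > 0);

    while (bytesread < maxread && src->pos < src->len)
    {
        size_t      avail = src->len - src->pos;
        size_t      want = (size_t) (maxread - bytesread);

        if (avail > src->chunk)
            avail = src->chunk;
        if (avail > want)
            avail = want;
        memcpy((char *) dest + bytesread, src->data + src->pos, avail);
        src->pos += avail;
        bytesread += (int) avail;
        if (bytesread >= minread)
            break;
    }
    return bytesread;
}

/*
 * Extension side: read one fixed-size record. Passing the record size as
 * both minread and maxread means a short read can only be end of input.
 * Returns 1 if a full record was read, 0 otherwise.
 */
static int
mock_read_record(MockSource *src, char *record, int record_size)
{
    int         n = mock_get_data(src, record, record_size, record_size);

    return n == record_size;
}
```

Setting minread equal to maxread is one simple way for a binary-style
format to read exact-length records without its own buffering loop.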
 src/backend/commands/copyfromparse.c     | 11 +++++++++++
 src/include/commands/copyapi.h           |  2 ++
 src/include/commands/copyfrom_internal.h |  3 +++
 3 files changed, 16 insertions(+)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 75b49629f08..01f2e7a8824 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -730,6 +730,17 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
     return copied_bytes;
 }
 
+/*
+ * Export CopyGetData() for extensions. We want to keep CopyGetData() as a
+ * static function for optimization. CopyGetData() calls in this file may be
+ * optimized by a compiler.
+ */
+int
+CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread)
+{
+    return CopyGetData(cstate, dest, minread, maxread);
+}
+
 /*
  * Read raw fields in the next line for COPY FROM in text or csv mode.
  * Return false if no more lines.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 103eb21767d..ac58adbd23d 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -104,4 +104,6 @@ typedef struct CopyFromRoutine
     void        (*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
+extern int    CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread);
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 3a306e3286e..af425cf5fd9 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -181,6 +181,9 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyFromStateData;
 
 extern void ReceiveCopyBegin(CopyFromState cstate);
-- 
2.47.1
From 2a9d0d48a21c91eea7a7f623e45da11672d0d9f8 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 27 Nov 2024 16:23:55 +0900
Subject: [PATCH v32 9/9] Add CopyFromSkipErrorRow() for custom COPY format
 extension
Extensions must call CopyFromSkipErrorRow() when their CopyFromOneRow
callback reports an error by errsave(). CopyFromSkipErrorRow() handles
"ON_ERROR stop" and "LOG_VERBOSITY verbose" cases.
---
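A sketch of the calling pattern, as a standalone model. The per-row
callback records a soft error, calls the skip helper, and still returns
true so that the caller moves on to the next row. MockCopyFromState,
mock_skip_error_row() and mock_one_row() are hypothetical. The
error_occurred flag only models the soft-error state that errsave()
leaves behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the parts of CopyFromStateData used here. */
typedef struct MockCopyFromState
{
    bool        ignore_errors;  /* models an ON_ERROR setting other than stop */
    bool        error_occurred; /* models the soft-error flag */
    unsigned long num_errors;   /* rows skipped so far */
} MockCopyFromState;

/* Models CopyFromSkipErrorRow(): count the skipped row. */
static void
mock_skip_error_row(MockCopyFromState *cstate)
{
    assert(cstate->ignore_errors);
    cstate->num_errors++;
}

/*
 * Per-row callback. Returns false only at end of input (field == NULL).
 * A malformed field is reported through error_occurred, not through the
 * return value, so the caller keeps reading rows.
 */
static bool
mock_one_row(MockCopyFromState *cstate, const char *field, long *value)
{
    char       *end;

    cstate->error_occurred = false;
    if (field == NULL)
        return false;

    *value = strtol(field, &end, 10);
    if (end == field || *end != '\0')
    {
        cstate->error_occurred = true;
        mock_skip_error_row(cstate);
        return true;
    }
    return true;
}
```

Keeping the counting and logging in one helper means a custom format
reports skipped rows the same way the built-in formats do.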
 src/backend/commands/copyfromparse.c          | 82 ++++++++++--------
 src/include/commands/copyapi.h                |  2 +
 .../expected/test_copy_format.out             | 47 +++++++++++
 .../test_copy_format/sql/test_copy_format.sql | 24 ++++++
 .../test_copy_format/test_copy_format.c       | 83 ++++++++++++++++++-
 5 files changed, 200 insertions(+), 38 deletions(-)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 01f2e7a8824..7296745d6d2 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -852,6 +852,51 @@ NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool i
     return true;
 }
 
+/*
+ * Call this when you report an error by errsave() in your CopyFromOneRow
+ * callback. This handles "ON_ERROR stop" and "LOG_VERBOSITY verbose" cases
+ * for you.
+ */
+void
+CopyFromSkipErrorRow(CopyFromState cstate)
+{
+    Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+    cstate->num_errors++;
+
+    if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+    {
+        /*
+         * Since we emit line number and column info in the below notice
+         * message, we suppress error context information other than the
+         * relation name.
+         */
+        Assert(!cstate->relname_only);
+        cstate->relname_only = true;
+
+        if (cstate->cur_attval)
+        {
+            char       *attval;
+
+            attval = CopyLimitPrintoutLength(cstate->cur_attval);
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname,
+                           attval));
+            pfree(attval);
+        }
+        else
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname));
+
+        /* reset relname_only */
+        cstate->relname_only = false;
+    }
+}
+
 /*
  * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
  *
@@ -960,42 +1005,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
                                         (Node *) cstate->escontext,
                                         &values[m]))
         {
-            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-            cstate->num_errors++;
-
-            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-            {
-                /*
-                 * Since we emit line number and column info in the below
-                 * notice message, we suppress error context information other
-                 * than the relation name.
-                 */
-                Assert(!cstate->relname_only);
-                cstate->relname_only = true;
-
-                if (cstate->cur_attval)
-                {
-                    char       *attval;
-
-                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-                                   (unsigned long long) cstate->cur_lineno,
-                                   cstate->cur_attname,
-                                   attval));
-                    pfree(attval);
-                }
-                else
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-                                   (unsigned long long) cstate->cur_lineno,
-                                   cstate->cur_attname));
-
-                /* reset relname_only */
-                cstate->relname_only = false;
-            }
-
+            CopyFromSkipErrorRow(cstate);
             return true;
         }
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index ac58adbd23d..dfab62372a7 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -106,4 +106,6 @@ typedef struct CopyFromRoutine
 
 extern int    CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread);
 
+extern void CopyFromSkipErrorRow(CopyFromState cstate);
+
 #endif                            /* COPYAPI_H */
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index 016893e7026..b9a6baa85c0 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -1,6 +1,8 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=true
 NOTICE:  CopyFromInFunc: atttypid=21
@@ -8,7 +10,50 @@ NOTICE:  CopyFromInFunc: atttypid=23
 NOTICE:  CopyFromInFunc: atttypid=20
 NOTICE:  CopyFromStart: natts=3
 NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  invalid value: "6"
+CONTEXT:  COPY test, line 2, column a: "6"
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
 NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  skipping row due to data type incompatibility at line 2 for column "a": "6"
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
+NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  too much lines: 3
+CONTEXT:  COPY test, line 3
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=false
 NOTICE:  CopyToOutFunc: atttypid=21
@@ -18,4 +63,6 @@ NOTICE:  CopyToStart: natts=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 0dfdfa00080..86db71bce7f 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -1,6 +1,30 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+321
 \.
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index f6b105659ab..b766d3c96ff 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 
 #include "commands/copyapi.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 
 PG_MODULE_MAGIC;
@@ -32,10 +33,88 @@ CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
 }
 
 static bool
-CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext,
+               Datum *values, bool *nulls)
 {
+    int            n_attributes = list_length(cstate->attnumlist);
+    char       *line;
+    int            line_size = n_attributes + 1;    /* +1 is for new line */
+    int            read_bytes;
+
     ereport(NOTICE, (errmsg("CopyFromOneRow")));
-    return false;
+
+    cstate->cur_lineno++;
+    line = palloc(line_size);
+    read_bytes = CopyFromStateGetData(cstate, line, line_size, line_size);
+    if (read_bytes == 0)
+        return false;
+    if (read_bytes != line_size)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("one line must be %d bytes: %d",
+                        line_size, read_bytes)));
+
+    if (cstate->cur_lineno == 1)
+    {
+        /* Success */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        ListCell   *cur;
+        int            i = 0;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            int            m = attnum - 1;
+            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+            if (att->atttypid == INT2OID)
+            {
+                values[i] = Int16GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT4OID)
+            {
+                values[i] = Int32GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT8OID)
+            {
+                values[i] = Int64GetDatum(line[i] - '0');
+            }
+            nulls[i] = false;
+            i++;
+        }
+    }
+    else if (cstate->cur_lineno == 2)
+    {
+        /* Soft error */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        int            attnum = lfirst_int(list_head(cstate->attnumlist));
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+        char        value[2];
+
+        cstate->cur_attname = NameStr(att->attname);
+        value[0] = line[0];
+        value[1] = '\0';
+        cstate->cur_attval = value;
+        errsave((Node *) cstate->escontext,
+                (
+                 errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                 errmsg("invalid value: \"%c\"", line[0])));
+        CopyFromSkipErrorRow(cstate);
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+        return true;
+    }
+    else
+    {
+        /* Hard error */
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("too much lines: %llu",
+                        (unsigned long long) cstate->cur_lineno)));
+    }
+
+    return true;
 }
 
 static void
-- 
2.47.1
			
On Fri, Feb 7, 2025 at 5:01 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoBkDE4JwjPgcLxSEwqu3nN4VXjkYS9vpRQDwA2GwNQCsg@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 4 Feb 2025 22:20:51 -0800,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >> I was just looking at bit at this series of patch labelled with v31,
> >> to see what is happening here.
> >>
> >> In 0001, we have that:
> >>
> >> +    /* format-specific routines */
> >> +    const CopyToRoutine *routine;
> >> [...]
> >> -    CopySendEndOfRow(cstate);
> >> +    cstate->routine->CopyToOneRow(cstate, slot);
> >>
> >> Having a callback where the copy state is processed once per row is
> >> neat in terms of design for the callbacks and what extensions can do,
> >> and this is much better than what 2889fd23be5 has attempted (later
> >> reverted in 1aa8324b81fa) because we don't do indirect function calls
> >> for each attribute. Still, I have a question here: what happens for a
> >> COPY TO that involves one attribute, a short field size like an int2
> >> and many rows (the more rows the more pronounced the effect, of
> >> course)? Could this level of indirection still be the cause of some
> >> regressions in a case like that? This is the worst case I can think
> >> about, on top of my mind, and I am not seeing tests with few
> >> attributes like this one, where we would try to make this callback as
> >> hot as possible. This is a performance-sensitive area.
> >
> > FYI when Sutou-san last measured the performance[1], it showed a
> > slight speed up even with fewer columns (5 columns) in both COPY TO
> > and COPY FROM cases. The callback design has not changed since then.
> > But it would be a good idea to run the benchmark with a table having a
> > single small size column.
> >
> > [1] https://www.postgresql.org/message-id/20241114.161948.1677325020727842666.kou%40clear-code.com
>
> I measured v31 patch set with 1,6,11,16,21,26,31 int2
> columns. See the attached PDF for 0001 and 0002 result.
>
> 0001 - to:
>
> It's faster than master when the number of rows are
> 1,000,000-5,000,000.
>
> It's almost same as master when the number of rows are
> 6,000,000-10,000,000.
>
> There is no significant slow down when the number of columns
> is 1.
>
> 0001 - from:
>
> 0001 doesn't change COPY FROM code. So the differences are
> not real difference.
>
> 0002 - to:
>
> 0002 doesn't change COPY TO code. So "0001 - to" and "0002 -
> to" must be the same result. But 0002 is faster than master
> for all cases. It shows that the CopyToOneRow() approach
> doesn't have significant slow down.
>
> 0002 - from:
>
> 0002 changes COPY FROM code. So this may have performance
> impact.
>
> It's almost same as master when data is smaller
> ((1,000,000-2,000,000 rows) or (3,000,000 rows and 1,6,11,16
> columns)).
>
> It's faster than master when data is larger.
>
> There is no significant slow down by 0002.
>

Thank you for sharing the benchmark results. That looks good to me.

Looking at the 0001 patch again, I have a question: we have
CopyToTextLikeOneRow() for both CSV and text format:

+/* Implementation of the per-row callback for text format */
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+       CopyToTextLikeOneRow(cstate, slot, false);
+}
+
+/* Implementation of the per-row callback for CSV format */
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+       CopyToTextLikeOneRow(cstate, slot, true);
+}

These two functions pass different is_csv value to that function,
which is used as follows:

+                       if (is_csv)
+                               CopyAttributeOutCSV(cstate, string,
+                                                   cstate->opts.force_quote_flags[attnum - 1]);
+                       else
+                               CopyAttributeOutText(cstate, string);

However, we can know whether the format is CSV or text by checking
cstate->opts.csv_mode instead of passing is_csv. That way, we can
directly call CopyToTextLikeOneRow() but not via CopyToCSVOneRow() or
CopyToTextOneRow(). It would not help performance since we already
inline CopyToTextLikeOneRow(), but it looks simpler.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi,
In <CAD21AoAni3cKToPfdShTsc0NmaJOtbJuUb=skyz3Udj7HZY7dA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 20 Feb 2025 15:28:26 -0800,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Looking at the 0001 patch again, I have a question: we have
> CopyToTextLikeOneRow() for both CSV and text format:
> 
> +/* Implementation of the per-row callback for text format */
> +static void
> +CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
> +{
> +       CopyToTextLikeOneRow(cstate, slot, false);
> +}
> +
> +/* Implementation of the per-row callback for CSV format */
> +static void
> +CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
> +{
> +       CopyToTextLikeOneRow(cstate, slot, true);
> +}
> 
> These two functions pass different is_csv value to that function,
> which is used as follows:
> 
> +                       if (is_csv)
> +                               CopyAttributeOutCSV(cstate, string,
> +
>  cstate->opts.force_quote_flags[attnum - 1]);
> +                       else
> +                               CopyAttributeOutText(cstate, string);
> 
> However, we can know whether the format is CSV or text by checking
> cstate->opts.csv_mode instead of passing is_csv. That way, we can
> directly call CopyToTextLikeOneRow() but not via CopyToCSVOneRow() or
> CopyToTextOneRow(). It would not help performance since we already
> inline CopyToTextLikeOneRow(), but it looks simpler.
This means the following, right?
1. We remove CopyToTextOneRow() and CopyToCSVOneRow()
2. We remove "bool is_csv" parameter from CopyToTextLikeOneRow()
   and use cstate->opts.csv_mode in CopyToTextLikeOneRow()
   instead of is_csv
3. We use CopyToTextLikeOneRow() for
   CopyToRoutineText::CopyToOneRow and
   CopyToRoutineCSV::CopyToOneRow
If we use this approach, we can't remove the following
branch at compile time:
+            if (is_csv)
+                CopyAttributeOutCSV(cstate, string,
+                                    cstate->opts.force_quote_flags[attnum - 1]);
+            else
+                CopyAttributeOutText(cstate, string);
We can remove the branch at compile time with the current
approach (constant argument + inline).
The proposed change may have a negative performance impact because
the "if" is evaluated many times with large data. (That's why we chose
the constant argument + inline approach in this thread.)
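To make the "constant argument + inline" point concrete, here is a minimal standalone sketch. It is not the PostgreSQL code: every toy_* name and the ToyRoutine struct are hypothetical stand-ins for CopyToTextLikeOneRow(), CopyAttributeOutText(), CopyAttributeOutCSV() and the format routine tables. It also uses plain "static inline" where the patch uses pg_attribute_always_inline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for CopyAttributeOutText()/CopyAttributeOutCSV(). */
static void
toy_out_text(char *buf, size_t buflen, const char *value)
{
    snprintf(buf, buflen, "%s", value);
}

static void
toy_out_csv(char *buf, size_t buflen, const char *value)
{
    snprintf(buf, buflen, "\"%s\"", value);
}

/*
 * Shared workhorse. Every caller passes a literal true/false for is_csv, so
 * once this function is inlined the compiler can drop the branch that can
 * never be taken. Plain "inline" is only a hint; that is an assumption made
 * to keep the sketch portable.
 */
static inline void
toy_out_text_like(char *buf, size_t buflen, const char *value, bool is_csv)
{
    assert(buf != NULL && buflen > 0);

    if (is_csv)
        toy_out_csv(buf, buflen, value);
    else
        toy_out_text(buf, buflen, value);
}

/* Per-format entry points: the only difference is the constant argument. */
static void
toy_text_one_value(char *buf, size_t buflen, const char *value)
{
    toy_out_text_like(buf, buflen, value, false);
}

static void
toy_csv_one_value(char *buf, size_t buflen, const char *value)
{
    toy_out_text_like(buf, buflen, value, true);
}

/* Toy counterpart of a format routine table: one callback per format. */
typedef struct ToyRoutine
{
    void        (*one_value) (char *buf, size_t buflen, const char *value);
} ToyRoutine;

static const ToyRoutine toy_routine_text = {toy_text_one_value};
static const ToyRoutine toy_routine_csv = {toy_csv_one_value};
```

The format is chosen once, when the routine table is picked. If the workhorse read a runtime flag such as cstate->opts.csv_mode instead, the check would stay inside the per-value path.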
Thanks,
-- 
kou
			
		On Thu, Feb 20, 2025 at 6:48 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoAni3cKToPfdShTsc0NmaJOtbJuUb=skyz3Udj7HZY7dA@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 20 Feb 2025 15:28:26 -0800,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > Looking at the 0001 patch again, I have a question: we have
> > CopyToTextLikeOneRow() for both CSV and text format:
> >
> > +/* Implementation of the per-row callback for text format */
> > +static void
> > +CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
> > +{
> > +       CopyToTextLikeOneRow(cstate, slot, false);
> > +}
> > +
> > +/* Implementation of the per-row callback for CSV format */
> > +static void
> > +CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
> > +{
> > +       CopyToTextLikeOneRow(cstate, slot, true);
> > +}
> >
> > These two functions pass different is_csv value to that function,
> > which is used as follows:
> >
> > +                       if (is_csv)
> > +                               CopyAttributeOutCSV(cstate, string,
> > +
> >  cstate->opts.force_quote_flags[attnum - 1]);
> > +                       else
> > +                               CopyAttributeOutText(cstate, string);
> >
> > However, we can know whether the format is CSV or text by checking
> > cstate->opts.csv_mode instead of passing is_csv. That way, we can
> > directly call CopyToTextLikeOneRow() but not via CopyToCSVOneRow() or
> > CopyToTextOneRow(). It would not help performance since we already
> > inline CopyToTextLikeOneRow(), but it looks simpler.
>
> This means the following, right?
>
> 1. We remove CopyToTextOneRow() and CopyToCSVOneRow()
> 2. We remove "bool is_csv" parameter from CopyToTextLikeOneRow()
>    and use cstate->opts.csv_mode in CopyToTextLikeOneRow()
>    instead of is_csv
> 3. We use CopyToTextLikeOneRow() for
>    CopyToRoutineText::CopyToOneRow and
>    CopyToRoutineCSV::CopyToOneRow
>
> If we use this approach, we can't remove the following
> branch in compile time:
>
> +                       if (is_csv)
> +                               CopyAttributeOutCSV(cstate, string,
> +                                                                       cstate->opts.force_quote_flags[attnum - 1]);
> +                       else
> +                               CopyAttributeOutText(cstate, string);
>
> We can remove the branch in compile time with the current
> approach (constant argument + inline).
>
> It may have a negative performance impact because the "if"
> is used many times with large data. (That's why we choose
> the constant argument + inline approach in this thread.)
>
Thank you for the explanation, I missed that fact. I'm fine with having is_csv.
The first two patches are refactoring patches (+ small performance
improvements). I've reviewed these patches again and attached the
updated patches. I reorganized the function order and updated comments
etc. I find that these patches are reasonably ready to push. Could you
review these versions? I'm going to push them, barring objections and
further comments.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
Attachments
Hi,

In <CAD21AoBjzkL2Lv7j4teaHBZvNmKctQtH6X71kN_sj6Fm-+VvJQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 25 Feb 2025 14:05:28 -0800,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> The first two patches are refactoring patches (+ small performance
> improvements). I've reviewed these patches again and attached the
> updated patches. I reorganized the function order and updated comments
> etc. I find that these patches are reasonably ready to push. Could you
> review these versions? I'm going to push them, barring objections and
> further comments.

Sure. Here are some minor comments:

0001:

Commit message:

> or CSV mode. The performance benchmark results showed ~5% performance
> gain intext or CSV mode.

intext -> in text

> --- a/src/backend/commands/copyto.c
> +++ b/src/backend/commands/copyto.c
> @@ -20,6 +20,7 @@
>  #include "commands/copy.h"
> +#include "commands/copyapi.h"

We can remove '#include "commands/copy.h"' because it's
included in copyapi.h. (0002 does it.)

> @@ -254,6 +502,35 @@ CopySendEndOfRow(CopyToState cstate)
> +/*
> + * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
> + * the line termination and do common appropriate things for the end of row.
> + */

Sends the the line -> Sends the line

> --- /dev/null
> +++ b/src/include/commands/copyapi.h
> +    /* End a COPY TO. This callback is called once at the end of COPY FROM */

The last "." is missing: ... COPY FROM.

0002:

Commit message:

> This change is a preliminary step towards making the COPY TO command
> extensible in terms of output formats.

COPY TO -> COPY FROM

> --- a/src/backend/commands/copyfromparse.c
> +++ b/src/backend/commands/copyfromparse.c

> @@ -1087,7 +1132,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,

>  static bool
> -CopyReadLine(CopyFromState cstate)
> +CopyReadLine(CopyFromState cstate, bool is_csv)

> @@ -1163,7 +1208,7 @@ CopyReadLine(CopyFromState cstate)

>  static bool
> -CopyReadLineText(CopyFromState cstate)
> +CopyReadLineText(CopyFromState cstate, bool is_csv)

We may want to add a comment why we don't use "inline" nor
"pg_attribute_always_inline" here:

https://www.postgresql.org/message-id/CAD21AoBNfKDbJnu-zONNpG820ZXYC0fuTSLrJ-UdRqU4qp2wog%40mail.gmail.com

> Yes, I'm not sure it's really necessary to make it inline since the
> benchmark results don't show much difference. Probably this is because
> the function has 'is_csv' in some 'if' branches but the compiler
> cannot optimize out the whole 'if' branches as most 'if' branches
> check 'is_csv' and other variables.

Or we can add "inline" not "pg_attribute_always_inline" here
as a hint for compiler.

> --- a/src/include/commands/copyapi.h
> +++ b/src/include/commands/copyapi.h
> @@ -52,4 +52,50 @@ typedef struct CopyToRoutine
> +    /* End a COPY FROM. This callback is called once at the end of COPY FROM */

The last "." is missing: ... COPY FROM.

I think that these patches are ready to push too.

Thanks,
-- 
kou
On Tue, Feb 25, 2025 at 3:52 PM Sutou Kouhei <kou@clear-code.com> wrote:

Thank you for reviewing the patches. I've addressed comments except
for the following comment:

> > --- a/src/backend/commands/copyfromparse.c
> > +++ b/src/backend/commands/copyfromparse.c
>
> > @@ -1087,7 +1132,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
>
> >  static bool
> > -CopyReadLine(CopyFromState cstate)
> > +CopyReadLine(CopyFromState cstate, bool is_csv)
>
> > @@ -1163,7 +1208,7 @@ CopyReadLine(CopyFromState cstate)
>
> >  static bool
> > -CopyReadLineText(CopyFromState cstate)
> > +CopyReadLineText(CopyFromState cstate, bool is_csv)
>
> We may want to add a comment why we don't use "inline" nor
> "pg_attribute_always_inline" here:
>
> https://www.postgresql.org/message-id/CAD21AoBNfKDbJnu-zONNpG820ZXYC0fuTSLrJ-UdRqU4qp2wog%40mail.gmail.com
>
> > Yes, I'm not sure it's really necessary to make it inline since the
> > benchmark results don't show much difference. Probably this is because
> > the function has 'is_csv' in some 'if' branches but the compiler
> > cannot optimize out the whole 'if' branches as most 'if' branches
> > check 'is_csv' and other variables.
>
> Or we can add "inline" not "pg_attribute_always_inline" here
> as a hint for compiler.

I think we should not add inline unless we see a performance
improvement. Also, I find that it would be independent with this
refactoring so we can add it later if needed.

I've attached updated patches.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Attachments
Hi,

In <CAD21AoB3TiyuCcu02itGktUE6L4YGqwWT_LRtYrFkW7xedoe+g@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 25 Feb 2025 17:14:43 -0800,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> I've attached updated patches.

Thanks.

I found one more missing last ".":

0002:

> --- a/src/backend/commands/copyfrom.c
> +++ b/src/backend/commands/copyfrom.c

> @@ -106,6 +106,145 @@ typedef struct CopyMultiInsertInfo

> +/*
> + * Built-in format-specific routines. One-row callbacks are defined in
> + * copyfromparse.c
> + */

copyfromparse.c -> copyfromparse.c.

Could you push them?

Thanks,
-- 
kou
On Tue, Feb 25, 2025 at 6:08 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoB3TiyuCcu02itGktUE6L4YGqwWT_LRtYrFkW7xedoe+g@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 25 Feb 2025 17:14:43 -0800,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > I've attached updated patches.
>
> Thanks.
>
> I found one more missing last ".":
>
> 0002:
>
> > --- a/src/backend/commands/copyfrom.c
> > +++ b/src/backend/commands/copyfrom.c
>
> > @@ -106,6 +106,145 @@ typedef struct CopyMultiInsertInfo
>
> > +/*
> > + * Built-in format-specific routines. One-row callbacks are defined in
> > + * copyfromparse.c
> > + */
>
> copyfromparse.c -> copyfromparse.c.
>
>
> Could you push them?
>
Pushed the 0001 patch.
Regarding the 0002 patch, I realized we stopped exposing
NextCopyFromRawFields() function:
 --- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState
*pstate, Relation rel, Node *where
 extern void EndCopyFrom(CopyFromState cstate);
 extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
                         Datum *values, bool *nulls);
-extern bool NextCopyFromRawFields(CopyFromState cstate,
-                                 char ***fields, int *nfields);
I think that this change is unrelated to the refactoring, and we
should probably keep the function exposed, as extensions might be
using it. Considering that we added pg_attribute_always_inline to the
function, does it work even if we omit pg_attribute_always_inline from
its function declaration in the copy.h file?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		Hi,
In <CAD21AoDABLkUTTOwWa1he6gbc=nM46COMu-BvWjc_i6USnNbHw@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 27 Feb 2025 15:24:26 -0800,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Pushed the 0001 patch.
Thanks!
> Regarding the 0002 patch, I realized we stopped exposing
> NextCopyFromRawFields() function:
> 
>  --- a/src/include/commands/copy.h
> +++ b/src/include/commands/copy.h
> @@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState
> *pstate, Relation rel, Node *where
>  extern void EndCopyFrom(CopyFromState cstate);
>  extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
>                          Datum *values, bool *nulls);
> -extern bool NextCopyFromRawFields(CopyFromState cstate,
> -                                 char ***fields, int *nfields);
> 
> I think that this change is not relevant with the refactoring and
> probably we should keep it exposed as extension might be using it.
> Considering that we added pg_attribute_always_inline to the function,
> does it work even if we omit pg_attribute_always_inline to its
> function declaration in the copy.h file?
Unfortunately, no. The inline + constant argument
optimization requires "static".
How about the following?
static pg_attribute_always_inline bool
NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
{
    ...
}
bool
NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
    return NextCopyFromRawFieldsInternal(cstate, fields, nfields, cstate->opts.csv_mode);
}
Thanks,
-- 
kou
			
		On Thu, Feb 27, 2025 at 7:57 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoDABLkUTTOwWa1he6gbc=nM46COMu-BvWjc_i6USnNbHw@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 27 Feb 2025 15:24:26 -0800,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > Pushed the 0001 patch.
>
> Thanks!
>
> > Regarding the 0002 patch, I realized we stopped exposing
> > NextCopyFromRawFields() function:
> >
> >  --- a/src/include/commands/copy.h
> > +++ b/src/include/commands/copy.h
> > @@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState
> > *pstate, Relation rel, Node *where
> >  extern void EndCopyFrom(CopyFromState cstate);
> >  extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
> >                          Datum *values, bool *nulls);
> > -extern bool NextCopyFromRawFields(CopyFromState cstate,
> > -                                 char ***fields, int *nfields);
> >
> > I think that this change is not relevant with the refactoring and
> > probably we should keep it exposed as extension might be using it.
> > Considering that we added pg_attribute_always_inline to the function,
> > does it work even if we omit pg_attribute_always_inline to its
> > function declaration in the copy.h file?
>
> Unfortunately, no. The inline + constant argument
> optimization requires "static".
>
> How about the following?
>
> static pg_attribute_always_inline bool
> NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
> {
>     ...
> }
>
> bool
> NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
> {
>     return NextCopyFromRawFieldsInternal(cstate, fields, nfields, cstate->opts.csv_mode);
> }
>
Thank you for the confirmation.
I initially thought it would be acceptable to stop exposing
NextCopyFromRawFields() since NextCopyFrom() could serve as an
alternative. For example, the NextCopyFromRawFields() function was
originally exposed in commit 8ddc05fb01ee2c primarily to support
extension modules like file_fdw but file_fdw wasn't utilizing this
API. I pushed the patch without the above change. Unfortunately, this
commit subsequently broke file_text_array_fdw[1] and made BF animal
crake unhappy[2].
Upon examining file_text_array_fdw more closely, I realized that
NextCopyFrom() may not be a suitable replacement for
NextCopyFromRawFields() in certain scenarios. Specifically,
NextCopyFrom() assumes that the caller has prior knowledge of the
source data's column count, making it inadequate for cases where
extensions like file_text_array_fdw need to construct an array of
source data with an unknown number of columns. In such situations,
NextCopyFromRawFields() proves to be more practical. Given these
considerations, I'm now leaning towards implementing the proposed
change. Thoughts?
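The difference between the two call styles can be illustrated with a toy, standalone sketch. It is not the PostgreSQL API: toy_next_raw_fields() is a hypothetical function that only mimics the shape of NextCopyFromRawFields() (an output array plus an output count), and the delimiter handling is deliberately naive, with no quoting or escaping.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical raw-fields reader (not PostgreSQL code). Splits "line" in
 * place on "delim" and reports how many fields were found, so the caller
 * does not need to know the column count in advance.
 *
 * Returns 1 on success and 0 on out-of-memory (in which case "line" may
 * already have been partially split). On success the caller must free()
 * *fields; the strings themselves point into "line".
 */
static int
toy_next_raw_fields(char *line, char delim, char ***fields, int *nfields)
{
    int         capacity = 4;
    int         count = 0;
    char      **result;
    char       *cur = line;

    assert(line != NULL && fields != NULL && nfields != NULL);
    assert(delim != '\0');

    result = malloc(sizeof(char *) * capacity);
    if (result == NULL)
        return 0;

    for (;;)
    {
        char       *sep = strchr(cur, delim);

        /* Grow the pointer array when it is full. */
        if (count == capacity)
        {
            char      **grown;

            capacity *= 2;
            grown = realloc(result, sizeof(char *) * capacity);
            if (grown == NULL)
            {
                free(result);
                return 0;
            }
            result = grown;
        }
        result[count++] = cur;

        if (sep == NULL)
            break;              /* last field of the line */
        *sep = '\0';            /* terminate this field in place */
        cur = sep + 1;
    }

    *fields = result;
    *nfields = count;
    return 1;
}
```

A caller in the style of file_text_array_fdw could build an array of however many fields came back. An interface in the style of NextCopyFrom() instead fills caller-provided values/nulls arrays, whose size has to be decided before the row is read.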
Regards,
[1] https://github.com/adunstan/file_text_array_fdw
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2025-02-28%2018%3A47%3A02
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
Hi,

In <CAD21AoDr13=dx+k8gmQnR5_bY+NskyN4mbSWN0KhQncL6xuPMA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 28 Feb 2025 11:50:39 -0800,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> I initially thought it would be acceptable to stop
> NextCopyFromRawFields exposed since NextCopyFrom() could serve as an
> alternative. For example, the NextCopyFromRawFields() function was
> originally exposed in commit 8ddc05fb01ee2c primarily to support
> extension modules like file_fdw but file_fdw wasn't utilizing this
> API. I pushed the patch without the above change. Unfortunately, this
> commit subsequently broke file_text_array_fdw[1] and made BF animal
> crake unhappy[2].
>
> [1] https://github.com/adunstan/file_text_array_fdw
> [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2025-02-28%2018%3A47%3A02

Thanks for the try!

> Upon examining file_text_array_fdw more closely, I realized that
> NextCopyFrom() may not be a suitable replacement for
> NextCopyFromRawFields() in certain scenarios. Specifically,
> NextCopyFrom() assumes that the caller has prior knowledge of the
> source data's column count, making it inadequate for cases where
> extensions like file_text_array_fdw need to construct an array of
> source data with an unknown number of columns. In such situations,
> NextCopyFromRawFields() proves to be more practical. Given these
> considerations, I'm now leaning towards implementing the proposed
> change. Thoughts?

You suggest that we re-export NextCopyFromRawFields() (as a
wrapper of static inline version) for backward
compatibility, right? It makes sense. We should keep
backward compatibility because there is a use-case of
NextCopyFromRawFields().

Thanks,
-- 
kou
On Fri, Feb 28, 2025 at 1:58 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoDr13=dx+k8gmQnR5_bY+NskyN4mbSWN0KhQncL6xuPMA@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 28 Feb 2025 11:50:39 -0800,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > I initially thought it would be acceptable to stop
> > NextCopyFromRawFields exposed since NextCopyFrom() could serve as an
> > alternative. For example, the NextCopyFromRawFields() function was
> > originally exposed in commit 8ddc05fb01ee2c primarily to support
> > extension modules like file_fdw but file_fdw wasn't utilizing this
> > API. I pushed the patch without the above change. Unfortunately, this
> > commit subsequently broke file_text_array_fdw[1] and made BF animal
> > crake unhappy[2].
> >
> > [1] https://github.com/adunstan/file_text_array_fdw
> > [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2025-02-28%2018%3A47%3A02
>
> Thanks for the try!
>
> > Upon examining file_text_array_fdw more closely, I realized that
> > NextCopyFrom() may not be a suitable replacement for
> > NextCopyFromRawFields() in certain scenarios. Specifically,
> > NextCopyFrom() assumes that the caller has prior knowledge of the
> > source data's column count, making it inadequate for cases where
> > extensions like file_text_array_fdw need to construct an array of
> > source data with an unknown number of columns. In such situations,
> > NextCopyFromRawFields() proves to be more practical. Given these
> > considerations, I'm now leaning towards implementing the proposed
> > change. Thoughts?
>
> You suggest that we re-export NextCopyFromRawFields() (as a
> wrapper of static inline version) for backward
> compatibility, right? It makes sense. We should keep
> backward compatibility because there is a use-case of
> NextCopyFromRawFields().

Yes, I've submitted the patch to re-export that function[1]. Could you
review it?

Regards,
[1] https://www.postgresql.org/message-id/CAD21AoBA414Q76LthY65NJfWbjOxXn1bdFFsD_NBhT2wPUS1SQ%40mail.gmail.com
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi,

In <CAD21AoB=FBiUB-ER7dmyE-QBBytUxqmv-sgbeP0DKTvYKXsOEA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 28 Feb 2025 14:00:18 -0800,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> Yes, I've submitted the patch to re-export that function[1]. Could you
> review it?
>
> [1] https://www.postgresql.org/message-id/CAD21AoBA414Q76LthY65NJfWbjOxXn1bdFFsD_NBhT2wPUS1SQ%40mail.gmail.com

Sure! Done!

https://www.postgresql.org/message-id/20250301.071641.1257013931056303227.kou%40clear-code.com

Thanks,
--
kou
Hi,

Our 0001/0002 patches were merged into master. I've rebased
on master. Can we discuss how to proceed with the remaining
patches?

Their contents haven't changed but I'll show a summary of
them again:

0001-0003 are for COPY TO and 0004-0007 are for COPY FROM.

For COPY TO:

0001: Add support for adding custom COPY TO format. This
uses a tablesample-like handler approach. We've discussed
other approaches such as the USING+CREATE XXX approach but
it seems that the other approaches are overkill for this
case.

See also: https://www.postgresql.org/message-id/flat/d838025aceeb19c9ff1db702fa55cabf%40postgrespro.ru#caca2799effc859f82f40ee8bec531d8

0002: Export CopyToStateData to implement custom COPY TO
format as extension.

0003: Export a function and add a private space to
CopyToStateData to implement custom COPY TO format as
extension.

We may want to squash 0002 and 0003, but keeping them split
makes them easier to review, because 0002 just moves
existing code (with some renames) and 0003 just adds some
code. If we squash 0002 and 0003, moving and adding are
mixed.

For COPY FROM:

0004: This is the COPY FROM version of 0001.

0005: 0002 has the COPY_ prefix -> COPY_DEST_ prefix change
for enum CopyDest. This is a similar change for enum
CopySource.

0006: This is the COPY FROM version of 0003.

0007: This makes it easy to implement "ON_ERROR stop" and
"LOG_VERBOSITY verbose" in extensions.

We may want to squash 0005-0007 like for 0002-0003.

Thanks,
--
kou

From 5ccc5d1a54d0f6c7c47381533c879a9432fb925f Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 12:19:15 +0900
Subject: [PATCH v35 1/7] Add support for adding custom COPY TO format

This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a COPY TO handler returns a CopyToRoutine.

This also add a test module for custom COPY TO handler.
--- src/backend/commands/copy.c | 79 ++++++++++++++++--- src/backend/commands/copyto.c | 36 +++++++-- src/backend/nodes/Makefile | 1 + src/backend/nodes/gen_node_support.pl | 2 + src/backend/utils/adt/pseudotypes.c | 1 + src/include/catalog/pg_proc.dat | 6 ++ src/include/catalog/pg_type.dat | 6 ++ src/include/commands/copy.h | 1 + src/include/commands/copyapi.h | 2 + src/include/nodes/meson.build | 1 + src/test/modules/Makefile | 1 + src/test/modules/meson.build | 1 + src/test/modules/test_copy_format/.gitignore | 4 + src/test/modules/test_copy_format/Makefile | 23 ++++++ .../expected/test_copy_format.out | 17 ++++ src/test/modules/test_copy_format/meson.build | 33 ++++++++ .../test_copy_format/sql/test_copy_format.sql | 5 ++ .../test_copy_format--1.0.sql | 8 ++ .../test_copy_format/test_copy_format.c | 63 +++++++++++++++ .../test_copy_format/test_copy_format.control | 4 + 20 files changed, 273 insertions(+), 21 deletions(-) mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl create mode 100644 src/test/modules/test_copy_format/.gitignore create mode 100644 src/test/modules/test_copy_format/Makefile create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out create mode 100644 src/test/modules/test_copy_format/meson.build create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql create mode 100644 src/test/modules/test_copy_format/test_copy_format.c create mode 100644 src/test/modules/test_copy_format/test_copy_format.control diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index cfca9d9dc29..8d94bc313eb 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -32,6 +32,7 @@ #include "parser/parse_coerce.h" #include "parser/parse_collate.h" #include "parser/parse_expr.h" +#include "parser/parse_func.h" #include "parser/parse_relation.h" #include "utils/acl.h" #include "utils/builtins.h" 
@@ -476,6 +477,70 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate) return COPY_LOG_VERBOSITY_DEFAULT; /* keep compiler quiet */ } +/* + * Process the "format" option. + * + * This function checks whether the option value is a built-in format such as + * "text" and "csv" or not. If the option value isn't a built-in format, this + * function finds a COPY format handler that returns a CopyToRoutine (for + * is_from == false). If no COPY format handler is found, this function + * reports an error. + */ +static void +ProcessCopyOptionFormat(ParseState *pstate, + CopyFormatOptions *opts_out, + bool is_from, + DefElem *defel) +{ + char *format; + Oid funcargtypes[1]; + Oid handlerOid = InvalidOid; + + format = defGetString(defel); + + opts_out->csv_mode = false; + opts_out->binary = false; + /* built-in formats */ + if (strcmp(format, "text") == 0) + { + /* "csv_mode == false && binary == false" means "text" */ + return; + } + else if (strcmp(format, "csv") == 0) + { + opts_out->csv_mode = true; + return; + } + else if (strcmp(format, "binary") == 0) + { + opts_out->binary = true; + return; + } + + /* custom format */ + if (!is_from) + { + funcargtypes[0] = INTERNALOID; + handlerOid = LookupFuncName(list_make1(makeString(format)), 1, + funcargtypes, true); + } + if (!OidIsValid(handlerOid)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY format \"%s\" not recognized", format), + parser_errposition(pstate, defel->location))); + + /* check that handler has correct return type */ + if (get_func_rettype(handlerOid) != COPY_HANDLEROID) + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("function %s must return type %s", + format, "copy_handler"), + parser_errposition(pstate, defel->location))); + + opts_out->handler = handlerOid; +} + /* * Process the statement option list for COPY. 
* @@ -519,22 +584,10 @@ ProcessCopyOptions(ParseState *pstate, if (strcmp(defel->defname, "format") == 0) { - char *fmt = defGetString(defel); - if (format_specified) errorConflictingDefElem(defel, pstate); format_specified = true; - if (strcmp(fmt, "text") == 0) - /* default format */ ; - else if (strcmp(fmt, "csv") == 0) - opts_out->csv_mode = true; - else if (strcmp(fmt, "binary") == 0) - opts_out->binary = true; - else - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("COPY format \"%s\" not recognized", fmt), - parser_errposition(pstate, defel->location))); + ProcessCopyOptionFormat(pstate, opts_out, is_from, defel); } else if (strcmp(defel->defname, "freeze") == 0) { diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 721d29f8e53..0d33d101735 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -150,6 +150,7 @@ static void CopySendInt16(CopyToState cstate, int16 val); /* text format */ static const CopyToRoutine CopyToRoutineText = { + .type = T_CopyToRoutine, .CopyToStart = CopyToTextLikeStart, .CopyToOutFunc = CopyToTextLikeOutFunc, .CopyToOneRow = CopyToTextOneRow, @@ -158,6 +159,7 @@ static const CopyToRoutine CopyToRoutineText = { /* CSV format */ static const CopyToRoutine CopyToRoutineCSV = { + .type = T_CopyToRoutine, .CopyToStart = CopyToTextLikeStart, .CopyToOutFunc = CopyToTextLikeOutFunc, .CopyToOneRow = CopyToCSVOneRow, @@ -166,6 +168,7 @@ static const CopyToRoutine CopyToRoutineCSV = { /* binary format */ static const CopyToRoutine CopyToRoutineBinary = { + .type = T_CopyToRoutine, .CopyToStart = CopyToBinaryStart, .CopyToOutFunc = CopyToBinaryOutFunc, .CopyToOneRow = CopyToBinaryOneRow, @@ -174,15 +177,32 @@ static const CopyToRoutine CopyToRoutineBinary = { /* Return a COPY TO routine for the given options */ static const CopyToRoutine * -CopyToGetRoutine(CopyFormatOptions opts) +CopyToGetRoutine(CopyFormatOptions *opts) { - if (opts.csv_mode) - return 
&CopyToRoutineCSV; - else if (opts.binary) - return &CopyToRoutineBinary; + if (OidIsValid(opts->handler)) + { + Datum datum; + Node *routine; - /* default is text */ - return &CopyToRoutineText; + datum = OidFunctionCall1(opts->handler, BoolGetDatum(false)); + routine = (Node *) DatumGetPointer(datum); + if (routine == NULL || !IsA(routine, CopyToRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%u did not return " + "CopyToRoutine struct", + opts->handler))); + return castNode(CopyToRoutine, routine); + } + else if (opts->csv_mode) + return &CopyToRoutineCSV; + else if (opts->binary) + return &CopyToRoutineBinary; + else + return &CopyToRoutineText; } /* Implementation of the start callback for text and CSV formats */ @@ -700,7 +720,7 @@ BeginCopyTo(ParseState *pstate, ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options); /* Set format routine */ - cstate->routine = CopyToGetRoutine(cstate->opts); + cstate->routine = CopyToGetRoutine(&cstate->opts); /* Process the source/target relation or query */ if (rel) diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile index 77ddb9ca53f..dc6c1087361 100644 --- a/src/backend/nodes/Makefile +++ b/src/backend/nodes/Makefile @@ -50,6 +50,7 @@ node_headers = \ access/sdir.h \ access/tableam.h \ access/tsmapi.h \ + commands/copyapi.h \ commands/event_trigger.h \ commands/trigger.h \ executor/tuptable.h \ diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl old mode 100644 new mode 100755 index 1a657f7e0ae..fb90635a245 --- a/src/backend/nodes/gen_node_support.pl +++ b/src/backend/nodes/gen_node_support.pl @@ -62,6 +62,7 @@ my @all_input_files = qw( access/sdir.h access/tableam.h access/tsmapi.h + commands/copyapi.h commands/event_trigger.h commands/trigger.h executor/tuptable.h @@ -86,6 +87,7 @@ my @nodetag_only_files = qw( access/sdir.h access/tableam.h access/tsmapi.h + commands/copyapi.h 
commands/event_trigger.h commands/trigger.h executor/tuptable.h diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c index 317a1f2b282..f2ebc21ca56 100644 --- a/src/backend/utils/adt/pseudotypes.c +++ b/src/backend/utils/adt/pseudotypes.c @@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler); +PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler); PSEUDOTYPE_DUMMY_IO_FUNCS(internal); PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement); PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray); diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index cd9422d0bac..9e7737168c4 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -7809,6 +7809,12 @@ { oid => '3312', descr => 'I/O', proname => 'tsm_handler_out', prorettype => 'cstring', proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' }, +{ oid => '8753', descr => 'I/O', + proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler', + proargtypes => 'cstring', prosrc => 'copy_handler_in' }, +{ oid => '8754', descr => 'I/O', + proname => 'copy_handler_out', prorettype => 'cstring', + proargtypes => 'copy_handler', prosrc => 'copy_handler_out' }, { oid => '267', descr => 'I/O', proname => 'table_am_handler_in', proisstrict => 'f', prorettype => 'table_am_handler', proargtypes => 'cstring', diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat index 6dca77e0a22..340e0cd0a8d 100644 --- a/src/include/catalog/pg_type.dat +++ b/src/include/catalog/pg_type.dat @@ -633,6 +633,12 @@ typcategory => 'P', typinput => 'tsm_handler_in', typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-', typalign => 'i' }, +{ oid => '8752', + descr => 'pseudo-type for the result of a copy to method function', + typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p', + typcategory => 
'P', typinput => 'copy_handler_in', + typoutput => 'copy_handler_out', typreceive => '-', typsend => '-', + typalign => 'i' }, { oid => '269', descr => 'pseudo-type for the result of a table AM handler function', typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p', diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 06dfdfef721..332628d67cc 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -87,6 +87,7 @@ typedef struct CopyFormatOptions CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */ int64 reject_limit; /* maximum tolerable number of errors */ List *convert_select; /* list of column names (can be NIL) */ + Oid handler; /* handler function for custom format routine */ } CopyFormatOptions; /* These are private in commands/copy[from|to].c */ diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 2a2d2f9876b..4f4ffabf882 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -22,6 +22,8 @@ */ typedef struct CopyToRoutine { + NodeTag type; + /* * Set output function information. This callback is called once at the * beginning of COPY TO. 
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build index d1ca24dd32f..96e70e7f38b 100644 --- a/src/include/nodes/meson.build +++ b/src/include/nodes/meson.build @@ -12,6 +12,7 @@ node_support_input_i = [ 'access/sdir.h', 'access/tableam.h', 'access/tsmapi.h', + 'commands/copyapi.h', 'commands/event_trigger.h', 'commands/trigger.h', 'executor/tuptable.h', diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile index 4e4be3fa511..c9da440eed0 100644 --- a/src/test/modules/Makefile +++ b/src/test/modules/Makefile @@ -16,6 +16,7 @@ SUBDIRS = \ spgist_name_ops \ test_bloomfilter \ test_copy_callbacks \ + test_copy_format \ test_custom_rmgrs \ test_ddl_deparse \ test_dsa \ diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build index 2b057451473..d33bbbd4092 100644 --- a/src/test/modules/meson.build +++ b/src/test/modules/meson.build @@ -15,6 +15,7 @@ subdir('spgist_name_ops') subdir('ssl_passphrase_callback') subdir('test_bloomfilter') subdir('test_copy_callbacks') +subdir('test_copy_format') subdir('test_custom_rmgrs') subdir('test_ddl_deparse') subdir('test_dsa') diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore new file mode 100644 index 00000000000..5dcb3ff9723 --- /dev/null +++ b/src/test/modules/test_copy_format/.gitignore @@ -0,0 +1,4 @@ +# Generated subdirectories +/log/ +/results/ +/tmp_check/ diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile new file mode 100644 index 00000000000..8497f91624d --- /dev/null +++ b/src/test/modules/test_copy_format/Makefile @@ -0,0 +1,23 @@ +# src/test/modules/test_copy_format/Makefile + +MODULE_big = test_copy_format +OBJS = \ + $(WIN32RES) \ + test_copy_format.o +PGFILEDESC = "test_copy_format - test custom COPY FORMAT" + +EXTENSION = test_copy_format +DATA = test_copy_format--1.0.sql + +REGRESS = test_copy_format + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell 
$(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = src/test/modules/test_copy_format +top_builddir = ../../../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out new file mode 100644 index 00000000000..adfe7d1572a --- /dev/null +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -0,0 +1,17 @@ +CREATE EXTENSION test_copy_format; +CREATE TABLE public.test (a smallint, b integer, c bigint); +INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); +ERROR: COPY format "test_copy_format" not recognized +LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')... + ^ +COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); +NOTICE: test_copy_format: is_from=false +NOTICE: CopyToOutFunc: atttypid=21 +NOTICE: CopyToOutFunc: atttypid=23 +NOTICE: CopyToOutFunc: atttypid=20 +NOTICE: CopyToStart: natts=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToEnd diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build new file mode 100644 index 00000000000..a45a2e0a039 --- /dev/null +++ b/src/test/modules/test_copy_format/meson.build @@ -0,0 +1,33 @@ +# Copyright (c) 2025, PostgreSQL Global Development Group + +test_copy_format_sources = files( + 'test_copy_format.c', +) + +if host_system == 'windows' + test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [ + '--NAME', 'test_copy_format', + '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',]) +endif + +test_copy_format = shared_module('test_copy_format', + test_copy_format_sources, + kwargs: pg_test_mod_args, +) +test_install_libs += test_copy_format + +test_install_data += 
files( + 'test_copy_format.control', + 'test_copy_format--1.0.sql', +) + +tests += { + 'name': 'test_copy_format', + 'sd': meson.current_source_dir(), + 'bd': meson.current_build_dir(), + 'regress': { + 'sql': [ + 'test_copy_format', + ], + }, +} diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql new file mode 100644 index 00000000000..810b3d8cedc --- /dev/null +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -0,0 +1,5 @@ +CREATE EXTENSION test_copy_format; +CREATE TABLE public.test (a smallint, b integer, c bigint); +INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); +COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql new file mode 100644 index 00000000000..d24ea03ce99 --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql @@ -0,0 +1,8 @@ +/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */ + +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit + +CREATE FUNCTION test_copy_format(internal) + RETURNS copy_handler + AS 'MODULE_PATHNAME' LANGUAGE C; diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c new file mode 100644 index 00000000000..b42d472d851 --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -0,0 +1,63 @@ +/*-------------------------------------------------------------------------- + * + * test_copy_format.c + * Code for testing custom COPY format. 
+ * + * Portions Copyright (c) 2025, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/test/modules/test_copy_format/test_copy_format.c + * + * ------------------------------------------------------------------------- + */ + +#include "postgres.h" + +#include "commands/copyapi.h" +#include "commands/defrem.h" + +PG_MODULE_MAGIC; + +static void +CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) +{ + ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%d", atttypid))); +} + +static void +CopyToStart(CopyToState cstate, TupleDesc tupDesc) +{ + ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts))); +} + +static void +CopyToOneRow(CopyToState cstate, TupleTableSlot *slot) +{ + ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", slot->tts_nvalid))); +} + +static void +CopyToEnd(CopyToState cstate) +{ + ereport(NOTICE, (errmsg("CopyToEnd"))); +} + +static const CopyToRoutine CopyToRoutineTestCopyFormat = { + .type = T_CopyToRoutine, + .CopyToOutFunc = CopyToOutFunc, + .CopyToStart = CopyToStart, + .CopyToOneRow = CopyToOneRow, + .CopyToEnd = CopyToEnd, +}; + +PG_FUNCTION_INFO_V1(test_copy_format); +Datum +test_copy_format(PG_FUNCTION_ARGS) +{ + bool is_from = PG_GETARG_BOOL(0); + + ereport(NOTICE, + (errmsg("test_copy_format: is_from=%s", is_from ? 
"true" : "false"))); + + PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat); +} diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control new file mode 100644 index 00000000000..f05a6362358 --- /dev/null +++ b/src/test/modules/test_copy_format/test_copy_format.control @@ -0,0 +1,4 @@ +comment = 'Test code for custom COPY format' +default_version = '1.0' +module_pathname = '$libdir/test_copy_format' +relocatable = true -- 2.47.2 From 45b0c1976b7e2b758745222ddec9194dba45f8a5 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 13:58:33 +0900 Subject: [PATCH v35 2/7] Export CopyToStateData as private data It's for custom COPY TO format handlers implemented as extension. This just moves codes. This doesn't change codes except CopyDest enum values. CopyDest/CopyFrom enum values such as COPY_FILE are conflicted each other. So COPY_DEST_ prefix instead of COPY_ prefix is used for CopyDest enum values. For example, COPY_FILE in CopyDest is renamed to COPY_DEST_FILE. Note that this isn't enough to implement custom COPY TO format handlers as extension. We'll do the followings in a subsequent commit: 1. Add an opaque space for custom COPY TO format handler 2. 
Export CopySendEndOfRow() to flush buffer --- src/backend/commands/copyto.c | 78 +++--------------------- src/include/commands/copy.h | 2 +- src/include/commands/copyto_internal.h | 83 ++++++++++++++++++++++++++ 3 files changed, 93 insertions(+), 70 deletions(-) create mode 100644 src/include/commands/copyto_internal.h diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c index 0d33d101735..17d89c23af0 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -20,6 +20,7 @@ #include "access/tableam.h" #include "commands/copyapi.h" +#include "commands/copyto_internal.h" #include "commands/progress.h" #include "executor/execdesc.h" #include "executor/executor.h" @@ -36,67 +37,6 @@ #include "utils/rel.h" #include "utils/snapmgr.h" -/* - * Represents the different dest cases we need to worry about at - * the bottom level - */ -typedef enum CopyDest -{ - COPY_FILE, /* to file (or a piped program) */ - COPY_FRONTEND, /* to frontend */ - COPY_CALLBACK, /* to callback function */ -} CopyDest; - -/* - * This struct contains all the state variables used throughout a COPY TO - * operation. - * - * Multi-byte encodings: all supported client-side encodings encode multi-byte - * characters by having the first byte's high bit set. Subsequent bytes of the - * character can have the high bit not set. When scanning data in such an - * encoding to look for a match to a single-byte (ie ASCII) character, we must - * use the full pg_encoding_mblen() machinery to skip over multibyte - * characters, else we might find a false match to a trailing byte. In - * supported server encodings, there is no possibility of a false match, and - * it's faster to make useless comparisons to trailing bytes than it is to - * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true - * when we have to do it the hard way. 
- */ -typedef struct CopyToStateData -{ - /* format-specific routines */ - const CopyToRoutine *routine; - - /* low-level state data */ - CopyDest copy_dest; /* type of copy source/destination */ - FILE *copy_file; /* used if copy_dest == COPY_FILE */ - StringInfo fe_msgbuf; /* used for all dests during COPY TO */ - - int file_encoding; /* file or remote side's character encoding */ - bool need_transcoding; /* file encoding diff from server? */ - bool encoding_embeds_ascii; /* ASCII can be non-first byte? */ - - /* parameters from the COPY command */ - Relation rel; /* relation to copy to */ - QueryDesc *queryDesc; /* executable query to copy from */ - List *attnumlist; /* integer list of attnums to copy */ - char *filename; /* filename, or NULL for STDOUT */ - bool is_program; /* is 'filename' a program to popen? */ - copy_data_dest_cb data_dest_cb; /* function for writing data */ - - CopyFormatOptions opts; - Node *whereClause; /* WHERE condition (or NULL) */ - - /* - * Working state - */ - MemoryContext copycontext; /* per-copy execution context */ - - FmgrInfo *out_functions; /* lookup info for output functions */ - MemoryContext rowcontext; /* per-row evaluation context */ - uint64 bytes_processed; /* number of bytes processed so far */ -} CopyToStateData; - /* DestReceiver for COPY (query) TO */ typedef struct { @@ -421,7 +361,7 @@ SendCopyBegin(CopyToState cstate) for (i = 0; i < natts; i++) pq_sendint16(&buf, format); /* per-column formats */ pq_endmessage(&buf); - cstate->copy_dest = COPY_FRONTEND; + cstate->copy_dest = COPY_DEST_FRONTEND; } static void @@ -468,7 +408,7 @@ CopySendEndOfRow(CopyToState cstate) switch (cstate->copy_dest) { - case COPY_FILE: + case COPY_DEST_FILE: if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1, cstate->copy_file) != 1 || ferror(cstate->copy_file)) @@ -502,11 +442,11 @@ CopySendEndOfRow(CopyToState cstate) errmsg("could not write to COPY file: %m"))); } break; - case COPY_FRONTEND: + case COPY_DEST_FRONTEND: /* Dump the 
accumulated row as one CopyData message */ (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len); break; - case COPY_CALLBACK: + case COPY_DEST_CALLBACK: cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len); break; } @@ -527,7 +467,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate) { switch (cstate->copy_dest) { - case COPY_FILE: + case COPY_DEST_FILE: /* Default line termination depends on platform */ #ifndef WIN32 CopySendChar(cstate, '\n'); @@ -535,7 +475,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate) CopySendString(cstate, "\r\n"); #endif break; - case COPY_FRONTEND: + case COPY_DEST_FRONTEND: /* The FE/BE protocol uses \n as newline for all platforms */ CopySendChar(cstate, '\n'); break; @@ -920,12 +860,12 @@ BeginCopyTo(ParseState *pstate, /* See Multibyte encoding comment above */ cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding); - cstate->copy_dest = COPY_FILE; /* default */ + cstate->copy_dest = COPY_DEST_FILE; /* default */ if (data_dest_cb) { progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK; - cstate->copy_dest = COPY_CALLBACK; + cstate->copy_dest = COPY_DEST_CALLBACK; cstate->data_dest_cb = data_dest_cb; } else if (pipe) diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index 332628d67cc..6df1f8a3b9b 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -90,7 +90,7 @@ typedef struct CopyFormatOptions Oid handler; /* handler function for custom format routine */ } CopyFormatOptions; -/* These are private in commands/copy[from|to].c */ +/* These are private in commands/copy[from|to]_internal.h */ typedef struct CopyFromStateData *CopyFromState; typedef struct CopyToStateData *CopyToState; diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h new file mode 100644 index 00000000000..1b58b36c0a3 --- /dev/null +++ b/src/include/commands/copyto_internal.h @@ -0,0 +1,83 @@ 
+/*------------------------------------------------------------------------- + * + * copyto_internal.h + * Internal definitions for COPY TO command. + * + * + * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/commands/copyto_internal.h + * + *------------------------------------------------------------------------- + */ +#ifndef COPYTO_INTERNAL_H +#define COPYTO_INTERNAL_H + +#include "commands/copy.h" +#include "executor/execdesc.h" +#include "executor/tuptable.h" +#include "nodes/execnodes.h" + +/* + * Represents the different dest cases we need to worry about at + * the bottom level + */ +typedef enum CopyDest +{ + COPY_DEST_FILE, /* to file (or a piped program) */ + COPY_DEST_FRONTEND, /* to frontend */ + COPY_DEST_CALLBACK, /* to callback function */ +} CopyDest; + +/* + * This struct contains all the state variables used throughout a COPY TO + * operation. + * + * Multi-byte encodings: all supported client-side encodings encode multi-byte + * characters by having the first byte's high bit set. Subsequent bytes of the + * character can have the high bit not set. When scanning data in such an + * encoding to look for a match to a single-byte (ie ASCII) character, we must + * use the full pg_encoding_mblen() machinery to skip over multibyte + * characters, else we might find a false match to a trailing byte. In + * supported server encodings, there is no possibility of a false match, and + * it's faster to make useless comparisons to trailing bytes than it is to + * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true + * when we have to do it the hard way. 
+ */
+typedef struct CopyToStateData
+{
+    /* format-specific routines */
+    const struct CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;      /* type of copy source/destination */
+    FILE       *copy_file;      /* used if copy_dest == COPY_FILE */
+    StringInfo  fe_msgbuf;      /* used for all dests during COPY TO */
+
+    int         file_encoding;  /* file or remote side's character encoding */
+    bool        need_transcoding;   /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;  /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;      /* executable query to copy from */
+    List       *attnumlist;     /* integer list of attnums to copy */
+    char       *filename;       /* filename, or NULL for STDOUT */
+    bool        is_program;     /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;  /* per-copy execution context */
+
+    FmgrInfo   *out_functions;  /* lookup info for output functions */
+    MemoryContext rowcontext;   /* per-row evaluation context */
+    uint64      bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
+#endif                          /* COPYTO_INTERNAL_H */
-- 
2.47.2

From 6750a77f22b65464685f0d855ebcf69ebf25a135 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:01:18 +0900
Subject: [PATCH v35 3/7] Add support for implementing custom COPY TO format as extension

* Add CopyToStateData::opaque that can be used to keep data for custom
  COPY TO format implementation
* Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf
  as CopyToStateFlush()
---
 src/backend/commands/copyto.c          | 12 ++++++++++++
 src/include/commands/copyapi.h         |  2 ++
 src/include/commands/copyto_internal.h |  3 +++
 3 files changed, 17 insertions(+)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 17d89c23af0..35f9035141a 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -458,6 +458,18 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * Export CopySendEndOfRow() for extensions. We want to keep
+ * CopySendEndOfRow() as a static function for
+ * optimization. CopySendEndOfRow() calls in this file may be optimized by a
+ * compiler.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
  * line termination and do common appropriate things for the end of row.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 4f4ffabf882..5c5ea6592e3 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -56,6 +56,8 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+extern void CopyToStateFlush(CopyToState cstate);
+
 /*
  * API structure for a COPY FROM format implementation. Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index 1b58b36c0a3..ce1c33a4004 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -78,6 +78,9 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;  /* lookup info for output functions */
     MemoryContext rowcontext;   /* per-row evaluation context */
     uint64      bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;         /* private space */
 } CopyToStateData;
 
 #endif                          /* COPYTO_INTERNAL_H */
-- 
2.47.2

From 53cdd9defcee3c6f0f2e2743d885cec3d16f6b9f Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:11:55 +0900
Subject: [PATCH v35 4/7] Add support for adding custom COPY FROM format

This uses the same handler for COPY TO and COPY FROM but uses different
routine. This uses CopyToRoutine for COPY TO and CopyFromRoutine for
COPY FROM. PostgreSQL calls a COPY TO/FROM handler with "is_from"
argument. It's true for COPY FROM and false for COPY TO:

    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine

This also adds a test module for custom COPY FROM handler.
---
 src/backend/commands/copy.c                   | 13 +++----
 src/backend/commands/copyfrom.c               | 36 +++++++++++++----
 src/include/catalog/pg_type.dat               |  2 +-
 src/include/commands/copyapi.h                |  2 +
 .../expected/test_copy_format.out             | 10 +++--
 .../test_copy_format/sql/test_copy_format.sql |  1 +
 .../test_copy_format/test_copy_format.c       | 39 ++++++++++++++++++-
 7 files changed, 82 insertions(+), 21 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8d94bc313eb..b4417bb6819 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -483,8 +483,8 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
  * This function checks whether the option value is a built-in format such as
  * "text" and "csv" or not. If the option value isn't a built-in format, this
  * function finds a COPY format handler that returns a CopyToRoutine (for
- * is_from == false). If no COPY format handler is found, this function
- * reports an error.
+ * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY
+ * format handler is found, this function reports an error.
  */
 static void
 ProcessCopyOptionFormat(ParseState *pstate,
@@ -518,12 +518,9 @@ ProcessCopyOptionFormat(ParseState *pstate,
     }
 
     /* custom format */
-    if (!is_from)
-    {
-        funcargtypes[0] = INTERNALOID;
-        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
-                                    funcargtypes, true);
-    }
+    funcargtypes[0] = INTERNALOID;
+    handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                funcargtypes, true);
     if (!OidIsValid(handlerOid))
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 198cee2bc48..114ea969dfa 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -129,6 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate);
 
 /* text format */
 static const CopyFromRoutine CopyFromRoutineText = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromTextOneRow,
@@ -137,6 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = {
 
 /* CSV format */
 static const CopyFromRoutine CopyFromRoutineCSV = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromCSVOneRow,
@@ -145,6 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = {
 
 /* binary format */
 static const CopyFromRoutine CopyFromRoutineBinary = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromBinaryInFunc,
     .CopyFromStart = CopyFromBinaryStart,
     .CopyFromOneRow = CopyFromBinaryOneRow,
@@ -153,15 +156,32 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 
 /* Return a COPY FROM routine for the given options */
 static const CopyFromRoutine *
-CopyFromGetRoutine(CopyFormatOptions opts)
+CopyFromGetRoutine(CopyFormatOptions *opts)
 {
-    if (opts.csv_mode)
-        return &CopyFromRoutineCSV;
-    else if (opts.binary)
-        return &CopyFromRoutineBinary;
+    if (OidIsValid(opts->handler))
+    {
+        Datum       datum;
+        Node       *routine;
 
-    /* default is text */
-    return &CopyFromRoutineText;
+        datum = OidFunctionCall1(opts->handler, BoolGetDatum(true));
+        routine = (Node *) DatumGetPointer(datum);
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%u did not return "
+                            "CopyFromRoutine struct",
+                            opts->handler)));
+        return castNode(CopyFromRoutine, routine);
+    }
+    else if (opts->csv_mode)
+        return &CopyFromRoutineCSV;
+    else if (opts->binary)
+        return &CopyFromRoutineBinary;
+    else
+        return &CopyFromRoutineText;
 }
 
 /* Implementation of the start callback for text and CSV formats */
@@ -1574,7 +1594,7 @@ BeginCopyFrom(ParseState *pstate,
     ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
     /* Set the format routine */
-    cstate->routine = CopyFromGetRoutine(cstate->opts);
+    cstate->routine = CopyFromGetRoutine(&cstate->opts);
 
     /* Process the target relation */
     cstate->rel = rel;
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 340e0cd0a8d..63b7d65f982 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -634,7 +634,7 @@
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
 { oid => '8752',
-  descr => 'pseudo-type for the result of a copy to method function',
+  descr => 'pseudo-type for the result of a copy to/from method function',
   typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
   typcategory => 'P', typinput => 'copy_handler_in',
   typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 5c5ea6592e3..895c105d8d8 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -64,6 +64,8 @@ extern void CopyToStateFlush(CopyToState cstate);
  */
 typedef struct CopyFromRoutine
 {
+    NodeTag     type;
+
     /*
      * Set input function information. This callback is called once at the
      * beginning of COPY FROM.
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index adfe7d1572a..016893e7026 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -2,9 +2,13 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
-ERROR: COPY format "test_copy_format" not recognized
-LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')...
-                                                 ^
+NOTICE: test_copy_format: is_from=true
+NOTICE: CopyFromInFunc: atttypid=21
+NOTICE: CopyFromInFunc: atttypid=23
+NOTICE: CopyFromInFunc: atttypid=20
+NOTICE: CopyFromStart: natts=3
+NOTICE: CopyFromOneRow
+NOTICE: CopyFromEnd
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 NOTICE: test_copy_format: is_from=false
 NOTICE: CopyToOutFunc: atttypid=21
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 810b3d8cedc..0dfdfa00080 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -2,4 +2,5 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+\.
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index b42d472d851..abafc668463 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -18,6 +18,40 @@
 
 PG_MODULE_MAGIC;
 
+static void
+CopyFromInFunc(CopyFromState cstate, Oid atttypid,
+               FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%d", atttypid)));
+}
+
+static void
+CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts)));
+}
+
+static bool
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+CopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .type = T_CopyFromRoutine,
+    .CopyFromInFunc = CopyFromInFunc,
+    .CopyFromStart = CopyFromStart,
+    .CopyFromOneRow = CopyFromOneRow,
+    .CopyFromEnd = CopyFromEnd,
+};
+
 static void
 CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
 {
@@ -59,5 +93,8 @@ test_copy_format(PG_FUNCTION_ARGS)
     ereport(NOTICE,
             (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
 
-    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
 }
-- 
2.47.2

From aad169e8d9701649f3451e567fc2a56ba53647c4 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:19:34 +0900
Subject: [PATCH v35 5/7] Use COPY_SOURCE_ prefix for CopySource enum values

This is for consistency with CopyDest.
---
 src/backend/commands/copyfrom.c          |  4 ++--
 src/backend/commands/copyfromparse.c     | 10 +++++-----
 src/include/commands/copyfrom_internal.h |  6 +++---
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 114ea969dfa..9b9b44aa17b 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1729,7 +1729,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;   /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;    /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1857,7 +1857,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index e8128f85e6b..17e51f02e04 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -180,7 +180,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows it can send. */
     pq_flush();
@@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
     switch (cstate->copy_src)
     {
-        case COPY_FILE:
+        case COPY_SOURCE_FILE:
             bytesread = fread(databuf, 1, maxread, cstate->copy_file);
             if (ferror(cstate->copy_file))
                 ereport(ERROR,
@@ -257,7 +257,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
             if (bytesread == 0)
                 cstate->raw_reached_eof = true;
             break;
-        case COPY_FRONTEND:
+        case COPY_SOURCE_FRONTEND:
             while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
             {
                 int         avail;
@@ -340,7 +340,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
                 bytesread += avail;
             }
             break;
-        case COPY_CALLBACK:
+        case COPY_SOURCE_CALLBACK:
             bytesread = cstate->data_source_cb(databuf, minread, maxread);
             break;
     }
@@ -1172,7 +1172,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
              * after \. up to the protocol end of copy data. (XXX maybe better
              * not to treat \. as special?)
              */
-            if (cstate->copy_src == COPY_FRONTEND)
+            if (cstate->copy_src == COPY_SOURCE_FRONTEND)
             {
                 int         inbytes;
 
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..3a306e3286e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -24,9 +24,9 @@
  */
 typedef enum CopySource
 {
-    COPY_FILE,                  /* from file (or a piped program) */
-    COPY_FRONTEND,              /* from frontend */
-    COPY_CALLBACK,              /* from callback function */
+    COPY_SOURCE_FILE,           /* from file (or a piped program) */
+    COPY_SOURCE_FRONTEND,       /* from frontend */
+    COPY_SOURCE_CALLBACK,       /* from callback function */
 } CopySource;
 
 /*
-- 
2.47.2

From 4db2a85dfec5376a63fdb65c3bb2221ba8b12540 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:21:39 +0900
Subject: [PATCH v35 6/7] Add support for implementing custom COPY FROM format as extension

* Add CopyFromStateData::opaque that can be used to keep data for custom
  COPY From format implementation
* Export CopyGetData() to get the next data as CopyFromStateGetData()
---
 src/backend/commands/copyfromparse.c     | 11 +++++++++++
 src/include/commands/copyapi.h           |  2 ++
 src/include/commands/copyfrom_internal.h |  3 +++
 3 files changed, 16 insertions(+)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 17e51f02e04..d8fd238e72b 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -739,6 +739,17 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
     return copied_bytes;
 }
 
+/*
+ * Export CopyGetData() for extensions. We want to keep CopyGetData() as a
+ * static function for optimization. CopyGetData() calls in this file may be
+ * optimized by a compiler.
+ */
+int
+CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread)
+{
+    return CopyGetData(cstate, dest, minread, maxread);
+}
+
 /*
  * This function is exposed for use by extensions that read raw fields in the
  * next line. See NextCopyFromRawFieldsInternal() for details.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 895c105d8d8..2044d8b8c4c 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -108,4 +108,6 @@ typedef struct CopyFromRoutine
     void        (*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
+extern int  CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread);
+
 #endif                          /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 3a306e3286e..af425cf5fd9 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -181,6 +181,9 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
     uint64      bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;         /* private space */
 } CopyFromStateData;
 
 extern void ReceiveCopyBegin(CopyFromState cstate);
-- 
2.47.2

From 8066477384ad660bfa5438f0b42ae5a24c3f17fb Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 27 Nov 2024 16:23:55 +0900
Subject: [PATCH v35 7/7] Add CopyFromSkipErrorRow() for custom COPY format extension

Extensions must call CopyFromSkipErrorRow() when CopyFromOneRow callback
reports an error by errsave(). CopyFromSkipErrorRow() handles "ON_ERROR
stop" and "LOG_VERBOSITY verbose" cases.
---
 src/backend/commands/copyfromparse.c          | 82 +++++++++++--------
 src/include/commands/copyapi.h                |  2 +
 .../expected/test_copy_format.out             | 47 +++++++++++
 .../test_copy_format/sql/test_copy_format.sql | 24 ++++++
 .../test_copy_format/test_copy_format.c       | 80 +++++++++++++++++-
 5 files changed, 198 insertions(+), 37 deletions(-)

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d8fd238e72b..2070f51a963 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -938,6 +938,51 @@ CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
     return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
 }
 
+/*
+ * Call this when you report an error by errsave() in your CopyFromOneRow
+ * callback. This handles "ON_ERROR stop" and "LOG_VERBOSITY verbose" cases
+ * for you.
+ */
+void
+CopyFromSkipErrorRow(CopyFromState cstate)
+{
+    Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+    cstate->num_errors++;
+
+    if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+    {
+        /*
+         * Since we emit line number and column info in the below notice
+         * message, we suppress error context information other than the
+         * relation name.
+         */
+        Assert(!cstate->relname_only);
+        cstate->relname_only = true;
+
+        if (cstate->cur_attval)
+        {
+            char       *attval;
+
+            attval = CopyLimitPrintoutLength(cstate->cur_attval);
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname,
+                           attval));
+            pfree(attval);
+        }
+        else
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname));
+
+        /* reset relname_only */
+        cstate->relname_only = false;
+    }
+}
+
 /*
  * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
* @@ -1044,42 +1089,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, (Node *) cstate->escontext, &values[m])) { - Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); - - cstate->num_errors++; - - if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) - { - /* - * Since we emit line number and column info in the below - * notice message, we suppress error context information other - * than the relation name. - */ - Assert(!cstate->relname_only); - cstate->relname_only = true; - - if (cstate->cur_attval) - { - char *attval; - - attval = CopyLimitPrintoutLength(cstate->cur_attval); - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"", - (unsigned long long) cstate->cur_lineno, - cstate->cur_attname, - attval)); - pfree(attval); - } - else - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input", - (unsigned long long) cstate->cur_lineno, - cstate->cur_attname)); - - /* reset relname_only */ - cstate->relname_only = false; - } - + CopyFromSkipErrorRow(cstate); return true; } diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 2044d8b8c4c..500ece7d5bb 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -110,4 +110,6 @@ typedef struct CopyFromRoutine extern int CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread); +extern void CopyFromSkipErrorRow(CopyFromState cstate); + #endif /* COPYAPI_H */ diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out index 016893e7026..b9a6baa85c0 100644 --- a/src/test/modules/test_copy_format/expected/test_copy_format.out +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -1,6 +1,8 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO 
public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +-- 987 is accepted. +-- 654 is a hard error because ON_ERROR is stop by default. COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); NOTICE: test_copy_format: is_from=true NOTICE: CopyFromInFunc: atttypid=21 @@ -8,7 +10,50 @@ NOTICE: CopyFromInFunc: atttypid=23 NOTICE: CopyFromInFunc: atttypid=20 NOTICE: CopyFromStart: natts=3 NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +ERROR: invalid value: "6" +CONTEXT: COPY test, line 2, column a: "6" +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: 1 row was skipped due to data type incompatibility NOTICE: CopyFromEnd +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: skipping row due to data type incompatibility at line 2 for column "a": "6" +NOTICE: CopyFromOneRow +NOTICE: 1 row was skipped due to data type incompatibility +NOTICE: CopyFromEnd +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +-- 321 is a hard error. 
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE: test_copy_format: is_from=true
+NOTICE: CopyFromInFunc: atttypid=21
+NOTICE: CopyFromInFunc: atttypid=23
+NOTICE: CopyFromInFunc: atttypid=20
+NOTICE: CopyFromStart: natts=3
+NOTICE: CopyFromOneRow
+NOTICE: CopyFromOneRow
+NOTICE: CopyFromOneRow
+ERROR: too much lines: 3
+CONTEXT: COPY test, line 3
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 NOTICE: test_copy_format: is_from=false
 NOTICE: CopyToOutFunc: atttypid=21
@@ -18,4 +63,6 @@ NOTICE: CopyToStart: natts=3
 NOTICE: CopyToOneRow: tts_nvalid=3
 NOTICE: CopyToOneRow: tts_nvalid=3
 NOTICE: CopyToOneRow: tts_nvalid=3
+NOTICE: CopyToOneRow: tts_nvalid=3
+NOTICE: CopyToOneRow: tts_nvalid=3
 NOTICE: CopyToEnd
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 0dfdfa00080..86db71bce7f 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -1,6 +1,30 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+321
 \.
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index abafc668463..96a54dab7ec 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 
 #include "commands/copyapi.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 
 PG_MODULE_MAGIC;
@@ -34,8 +35,85 @@ CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
 static bool
 CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
 {
+    int         n_attributes = list_length(cstate->attnumlist);
+    char       *line;
+    int         line_size = n_attributes + 1;   /* +1 is for new line */
+    int         read_bytes;
+
     ereport(NOTICE, (errmsg("CopyFromOneRow")));
-    return false;
+
+    cstate->cur_lineno++;
+    line = palloc(line_size);
+    read_bytes = CopyFromStateGetData(cstate, line, line_size, line_size);
+    if (read_bytes == 0)
+        return false;
+    if (read_bytes != line_size)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("one line must be %d bytes: %d",
+                        line_size, read_bytes)));
+
+    if (cstate->cur_lineno == 1)
+    {
+        /* Success */
+        TupleDesc   tupDesc = RelationGetDescr(cstate->rel);
+        ListCell   *cur;
+        int         i = 0;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int         attnum = lfirst_int(cur);
+            int         m = attnum - 1;
+            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+            if (att->atttypid == INT2OID)
+            {
+                values[i] = Int16GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT4OID)
+            {
+                values[i] = Int32GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT8OID)
+            {
+                values[i] = Int64GetDatum(line[i] - '0');
+            }
+            nulls[i] = false;
+            i++;
+        }
+    }
+    else if (cstate->cur_lineno == 2)
+    {
+        /* Soft error */
+        TupleDesc   tupDesc = RelationGetDescr(cstate->rel);
+        int         attnum = lfirst_int(list_head(cstate->attnumlist));
+        int         m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+        char        value[2];
+
+        cstate->cur_attname = NameStr(att->attname);
+        value[0] = line[0];
+        value[1] = '\0';
+        cstate->cur_attval = value;
+        errsave((Node *) cstate->escontext,
+                (
+                 errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                 errmsg("invalid value: \"%c\"", line[0])));
+        CopyFromSkipErrorRow(cstate);
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+        return true;
+    }
+    else
+    {
+        /* Hard error */
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("too much lines: %llu",
+                        (unsigned long long) cstate->cur_lineno)));
+    }
+
+    return true;
 }
 
 static void
-- 
2.47.2
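
For readers following the 0004 patch, the handler dispatch can be sketched outside PostgreSQL roughly as below. This is only an illustration under stated assumptions: DemoTag, DemoCopyToRoutine, DemoCopyFromRoutine, demo_handler(), demo_bad_handler() and demo_get_from_routine() are hypothetical stand-ins for NodeTag, CopyToRoutine, CopyFromRoutine, a copy_handler function and CopyFromGetRoutine(). None of these names exist in PostgreSQL, and the sketch returns NULL where the patch raises an ERROR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's NodeTag values. */
typedef enum DemoTag
{
    DEMO_T_INVALID = 0,
    DEMO_T_COPY_TO,
    DEMO_T_COPY_FROM
} DemoTag;

/* Each routine struct starts with its tag, like ".type = T_CopyFromRoutine". */
typedef struct DemoCopyToRoutine
{
    DemoTag     type;
    const char *name;
} DemoCopyToRoutine;

typedef struct DemoCopyFromRoutine
{
    DemoTag     type;
    const char *name;
} DemoCopyFromRoutine;

static const DemoCopyToRoutine demo_to = {.type = DEMO_T_COPY_TO, .name = "demo-to"};
static const DemoCopyFromRoutine demo_from = {.type = DEMO_T_COPY_FROM, .name = "demo-from"};

/* One handler serves both directions: is_from selects the routine. */
static const void *
demo_handler(bool is_from)
{
    if (is_from)
        return &demo_from;
    return &demo_to;
}

/* A broken handler (assumed for the demo) that ignores is_from. */
static const void *
demo_bad_handler(bool is_from)
{
    (void) is_from;
    return &demo_to;
}

/*
 * Caller side: call the handler with is_from = true, then verify the tag
 * stored in the first member before trusting the pointer.
 */
static const DemoCopyFromRoutine *
demo_get_from_routine(const void *(*handler) (bool))
{
    const void *routine;

    assert(handler != NULL);
    routine = handler(true);
    if (routine == NULL || *(const DemoTag *) routine != DEMO_T_COPY_FROM)
        return NULL;            /* the patch reports an ERROR here instead */
    return (const DemoCopyFromRoutine *) routine;
}
```

With these definitions, demo_get_from_routine(demo_handler) yields the FROM routine, while demo_get_from_routine(demo_bad_handler) yields NULL because the tag check fails. That is the reason the patch adds a NodeTag member to CopyFromRoutine.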
On Sat, Mar 1, 2025 at 10:50 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> Our 0001/0002 patches were merged into master. I've rebased
> on master. Can we discuss how to proceed rest patches?
>
> The contents of them aren't changed but I'll show a summary
> of them again:
>
> 0001-0003 are for COPY TO and 0004-0007 are for COPY FROM.
>
> For COPY TO:
>
> 0001: Add support for adding custom COPY TO format. This
> uses tablesample like handler approach. We've discussed
> other approaches such as USING+CREATE XXX approach but it
> seems that other approaches are overkill for this case.
>
> See also: https://www.postgresql.org/message-id/flat/d838025aceeb19c9ff1db702fa55cabf%40postgrespro.ru#caca2799effc859f82f40ee8bec531d8
>
> 0002: Export CopyToStateData to implement custom COPY TO
> format as extension.
>
> 0003: Export a function and add a private space to
> CopyToStateData to implement custom COPY TO format as
> extension.
>
> We may want to squash 0002 and 0003 but splitting them will
> be easy to review. Because 0002 just moves existing codes
> (with some rename) and 0003 just adds some codes. If we
> squash 0002 and 0003, moving and adding are mixed.
>
> For COPY FROM:
>
> 0004: This is COPY FROM version of 0001.
>
> 0005: 0002 has COPY_ prefix -> COPY_DEST_ prefix change for
> enum CopyDest. This is similar change for enum CopySource.
>
> 0006: This is COPY FROM version of 0003.
>
> 0007: This is for easy to implement "ON_ERROR stop" and
> "LOG_VERBOSITY verbose" in extension.
>
> We may want to squash 0005-0007 like for 0002-0003.
>
>
> Thanks,
> --
> kou

While reviewing another thread (Emitting JSON to file using COPY TO),
I found that the recently committed patches on this thread pass the
CopyFormatOptions struct directly rather than a pointer to the struct
as a function parameter of CopyToGetRoutine and CopyFromGetRoutine.

Then I took a quick look at the newly rebased patch set and
found that Sutou has already fixed this issue.

I'm wondering if we should fix it as a separate commit, as
it seems like an oversight in the previous patches?

-- 
Regards
Junwang Zhao
Junwang Zhao <zhjwpku@gmail.com> writes:
> While reviewing another thread (Emitting JSON to file using COPY TO),
> I found that the recently committed patches on this thread pass the
> CopyFormatOptions struct directly rather than a pointer to the struct
> as a function parameter of CopyToGetRoutine and CopyFromGetRoutine.
Coverity is unhappy about that too:
/srv/coverity/git/pgsql-git/postgresql/src/backend/commands/copyto.c: 177 in CopyToGetRoutine()
171         .CopyToOneRow = CopyToBinaryOneRow,
172         .CopyToEnd = CopyToBinaryEnd,
173     };
174
175     /* Return a COPY TO routine for the given options */
176     static const CopyToRoutine *
>>>     CID 1643911:  Performance inefficiencies  (PASS_BY_VALUE)
>>>     Passing parameter opts of type "CopyFormatOptions" (size 184 bytes) by value, which exceeds the low threshold of 128 bytes.
177     CopyToGetRoutine(CopyFormatOptions opts)
178     {
179         if (opts.csv_mode)
180             return &CopyToRoutineCSV;
(and likewise for CopyFromGetRoutine).  I realize that these
functions aren't called often enough for performance to be an
overriding concern, but it still seems like poor style.
> Then I took a quick look at the newly rebased patch set and
> found Sutou has already fixed this issue.
+1, except I'd suggest declaring the parameters as
"const CopyFormatOptions *opts".
            regards, tom lane
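
The difference between the two signatures discussed above can be sketched in isolation as follows. This is a minimal illustration, not PostgreSQL code: DemoFormatOptions, DemoRoutine, demo_get_routine_by_value() and demo_get_routine() are hypothetical names, and the padding member only exists to make the struct about as large as the 184-byte CopyFormatOptions from the Coverity report (the exact size depends on the platform).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical options struct. The padding member only makes it large,
 * in the spirit of the CopyFormatOptions size that Coverity reported.
 */
typedef struct DemoFormatOptions
{
    bool        binary;
    bool        csv_mode;
    char        padding[182];
} DemoFormatOptions;

/* Hypothetical stand-in for CopyToRoutine / CopyFromRoutine. */
typedef struct DemoRoutine
{
    const char *name;
} DemoRoutine;

static const DemoRoutine demo_text = {"text"};
static const DemoRoutine demo_csv = {"csv"};
static const DemoRoutine demo_binary = {"binary"};

/* By value: each call copies the whole struct into the callee. */
static const DemoRoutine *
demo_get_routine_by_value(DemoFormatOptions opts)
{
    if (opts.csv_mode)
        return &demo_csv;
    else if (opts.binary)
        return &demo_binary;
    return &demo_text;          /* default is text */
}

/*
 * By const pointer: only an address is passed, and the compiler rejects
 * accidental writes to the caller's options through opts.
 */
static const DemoRoutine *
demo_get_routine(const DemoFormatOptions *opts)
{
    assert(opts != NULL);
    if (opts->csv_mode)
        return &demo_csv;
    else if (opts->binary)
        return &demo_binary;
    return &demo_text;          /* default is text */
}
```

Both functions select the same routine for the same options. The pointer version avoids the copy and, with const, documents that the lookup does not change the options.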
			
Hi,
In <3191030.1740932840@sss.pgh.pa.us>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Sun, 02 Mar 2025 11:27:20 -0500,
  Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> While reviewing another thread (Emitting JSON to file using COPY TO),
>> I found that the recently committed patches on this thread pass the
>> CopyFormatOptions struct directly rather than a pointer to the struct
>> as a function parameter of CopyToGetRoutine and CopyFromGetRoutine.
> 
> Coverity is unhappy about that too:
> 
> /srv/coverity/git/pgsql-git/postgresql/src/backend/commands/copyto.c: 177 in CopyToGetRoutine()
> 171         .CopyToOneRow = CopyToBinaryOneRow,
> 172         .CopyToEnd = CopyToBinaryEnd,
> 173     };
> 174     
> 175     /* Return a COPY TO routine for the given options */
> 176     static const CopyToRoutine *
>>>>     CID 1643911:  Performance inefficiencies  (PASS_BY_VALUE)
>>>>     Passing parameter opts of type "CopyFormatOptions" (size 184 bytes) by value, which exceeds the low threshold of 128 bytes.
> 177     CopyToGetRoutine(CopyFormatOptions opts)
> 178     {
> 179         if (opts.csv_mode)
> 180             return &CopyToRoutineCSV;
> 
> (and likewise for CopyFromGetRoutine).  I realize that these
> functions aren't called often enough for performance to be an
> overriding concern, but it still seems like poor style.
> 
>> Then I took a quick look at the newly rebased patch set and
>> found Sutou has already fixed this issue.
> 
> +1, except I'd suggest declaring the parameters as
> "const CopyFormatOptions *opts".
Thanks for pointing this out (and sorry for missing this in
our reviews...)!
How about the attached patch?
I'll rebase the v35 patch set after this is fixed.
Thanks,
-- 
kou
From f21b48c7dd0b141c561e9c8a2c9f1d0e28aabfae Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 3 Mar 2025 09:13:37 +0900
Subject: [PATCH] Use const pointer for CopyFormatOptions for
 Copy{To,From}GetRoutine()
We don't need to copy CopyFormatOptions here.
Reported-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAEG8a3L6YCpPksTQMzjD_CvwDEhW3D_t=5md9BvvdOs5k+TA=Q@mail.gmail.com
---
 src/backend/commands/copyfrom.c | 8 ++++----
 src/backend/commands/copyto.c   | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 198cee2bc48..bcf66f0adf8 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -153,11 +153,11 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 
 /* Return a COPY FROM routine for the given options */
 static const CopyFromRoutine *
-CopyFromGetRoutine(CopyFormatOptions opts)
+CopyFromGetRoutine(const CopyFormatOptions *opts)
 {
-    if (opts.csv_mode)
+    if (opts->csv_mode)
         return &CopyFromRoutineCSV;
-    else if (opts.binary)
+    else if (opts->binary)
         return &CopyFromRoutineBinary;
 
     /* default is text */
@@ -1574,7 +1574,7 @@ BeginCopyFrom(ParseState *pstate,
     ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
     /* Set the format routine */
-    cstate->routine = CopyFromGetRoutine(cstate->opts);
+    cstate->routine = CopyFromGetRoutine(&cstate->opts);
 
     /* Process the target relation */
     cstate->rel = rel;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 721d29f8e53..84a3f3879a8 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -174,11 +174,11 @@ static const CopyToRoutine CopyToRoutineBinary = {
 
 /* Return a COPY TO routine for the given options */
 static const CopyToRoutine *
-CopyToGetRoutine(CopyFormatOptions opts)
+CopyToGetRoutine(const CopyFormatOptions *opts)
 {
-    if (opts.csv_mode)
+    if (opts->csv_mode)
         return &CopyToRoutineCSV;
-    else if (opts.binary)
+    else if (opts->binary)
         return &CopyToRoutineBinary;
 
     /* default is text */
@@ -700,7 +700,7 @@ BeginCopyTo(ParseState *pstate,
     ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
 
     /* Set format routine */
-    cstate->routine = CopyToGetRoutine(cstate->opts);
+    cstate->routine = CopyToGetRoutine(&cstate->opts);
 
     /* Process the source/target relation or query */
     if (rel)
-- 
2.47.2
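As a rough illustration of why the routine is looked up once and then stored in cstate->routine, here is a minimal standalone sketch of the same callback-table pattern. It is not PostgreSQL code: DemoOptions, DemoRoutine, demo_get_routine() and the row formatters are hypothetical names, and rows are just arrays of int to keep the example small.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, cut-down options struct (stand-in for CopyFormatOptions). */
typedef struct DemoOptions
{
    bool        csv_mode;
} DemoOptions;

/* Hypothetical callback table (stand-in for CopyToRoutine). */
typedef struct DemoRoutine
{
    const char *name;
    /* Format one row into buf; returns the number of characters written. */
    int         (*one_row) (char *buf, size_t buflen,
                            const int *values, int nvalues);
} DemoRoutine;

/* Shared helper: join the values with the given delimiter. */
static int
demo_join(char *buf, size_t buflen, const int *values, int nvalues,
          const char *delim)
{
    size_t      used = 0;

    if (buflen == 0)
        return 0;
    buf[0] = '\0';
    for (int i = 0; i < nvalues; i++)
    {
        int         n = snprintf(buf + used, buflen - used, "%s%d",
                                 i > 0 ? delim : "", values[i]);

        if (n < 0 || (size_t) n >= buflen - used)
        {
            buf[used] = '\0';   /* drop the partially written field */
            break;
        }
        used += (size_t) n;
    }
    return (int) used;
}

static int
demo_text_one_row(char *buf, size_t buflen, const int *values, int nvalues)
{
    return demo_join(buf, buflen, values, nvalues, "\t");
}

static int
demo_csv_one_row(char *buf, size_t buflen, const int *values, int nvalues)
{
    return demo_join(buf, buflen, values, nvalues, ",");
}

static const DemoRoutine DemoRoutineText = {
    .name = "text",
    .one_row = demo_text_one_row,
};

static const DemoRoutine DemoRoutineCSV = {
    .name = "csv",
    .one_row = demo_csv_one_row,
};

/* Pick the callback table once; the options are only read, never copied. */
static const DemoRoutine *
demo_get_routine(const DemoOptions *opts)
{
    if (opts->csv_mode)
        return &DemoRoutineCSV;

    /* default is text */
    return &DemoRoutineText;
}
```

The caller keeps the returned pointer and invokes routine->one_row() for each row, so the format decision is made once per COPY rather than once per row.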
			
		On Mon, Mar 3, 2025 at 8:19 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <3191030.1740932840@sss.pgh.pa.us>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Sun, 02 Mar 2025 11:27:20 -0500,
>   Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> >> While review another thread (Emitting JSON to file using COPY TO),
> >> I found the recently committed patches on this thread pass the
> >> CopyFormatOptions struct directly rather a pointer of the struct
> >> as a function parameter of CopyToGetRoutine and CopyFromGetRoutine.
> >
> > Coverity is unhappy about that too:
> >
> > /srv/coverity/git/pgsql-git/postgresql/src/backend/commands/copyto.c: 177 in CopyToGetRoutine()
> > 171           .CopyToOneRow = CopyToBinaryOneRow,
> > 172           .CopyToEnd = CopyToBinaryEnd,
> > 173     };
> > 174
> > 175     /* Return a COPY TO routine for the given options */
> > 176     static const CopyToRoutine *
> > >>>     CID 1643911:  Performance inefficiencies  (PASS_BY_VALUE)
> > >>>     Passing parameter opts of type "CopyFormatOptions" (size 184 bytes) by value, which exceeds the low threshold of 128 bytes.
> > 177     CopyToGetRoutine(CopyFormatOptions opts)
> > 178     {
> > 179           if (opts.csv_mode)
> > 180                   return &CopyToRoutineCSV;
> >
> > (and likewise for CopyFromGetRoutine).  I realize that these
> > functions aren't called often enough for performance to be an
> > overriding concern, but it still seems like poor style.
> >
> >> Then I took a quick look at the newly rebased patch set and
> >> found Sutou has already fixed this issue.
> >
> > +1, except I'd suggest declaring the parameters as
> > "const CopyFormatOptions *opts".
>
> Thanks for pointing this out (and sorry for missing this in
> our reviews...)!
>
> How about the attached patch?
Looking good, thanks
>
> I'll rebase the v35 patch set after this is fixed.
>
>
> Thanks,
> --
> kou
--
Regards
Junwang Zhao
			
		On Sun, Mar 2, 2025 at 4:19 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <3191030.1740932840@sss.pgh.pa.us>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Sun, 02 Mar 2025 11:27:20 -0500,
>   Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> >> While review another thread (Emitting JSON to file using COPY TO),
> >> I found the recently committed patches on this thread pass the
> >> CopyFormatOptions struct directly rather a pointer of the struct
> >> as a function parameter of CopyToGetRoutine and CopyFromGetRoutine.
> >
> > Coverity is unhappy about that too:
> >
> > /srv/coverity/git/pgsql-git/postgresql/src/backend/commands/copyto.c: 177 in CopyToGetRoutine()
> > 171           .CopyToOneRow = CopyToBinaryOneRow,
> > 172           .CopyToEnd = CopyToBinaryEnd,
> > 173     };
> > 174
> > 175     /* Return a COPY TO routine for the given options */
> > 176     static const CopyToRoutine *
> > >>>     CID 1643911:  Performance inefficiencies  (PASS_BY_VALUE)
> > >>>     Passing parameter opts of type "CopyFormatOptions" (size 184 bytes) by value, which exceeds the low threshold of 128 bytes.
> > 177     CopyToGetRoutine(CopyFormatOptions opts)
> > 178     {
> > 179           if (opts.csv_mode)
> > 180                   return &CopyToRoutineCSV;
> >
> > (and likewise for CopyFromGetRoutine).  I realize that these
> > functions aren't called often enough for performance to be an
> > overriding concern, but it still seems like poor style.
> >
> >> Then I took a quick look at the newly rebased patch set and
> >> found Sutou has already fixed this issue.
> >
> > +1, except I'd suggest declaring the parameters as
> > "const CopyFormatOptions *opts".
>
> Thanks for pointing this out (and sorry for missing this in
> our reviews...)!
>
> How about the attached patch?
>
> I'll rebase the v35 patch set after this is fixed.
Thank you for reporting the issue and making the patch.
I agree with the fix and the patch looks good to me. I've updated the
commit message and am going to push, barring any objections.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
Hi,
In <CAD21AoAwOP7p6LgmkPGqPuJ5KbJPPQsSZsFzwCDguwzr9F677Q@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 3 Mar 2025 11:06:39 -0800,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I agree with the fix and the patch looks good to me. I've updated the
> commit message and am going to push, barring any objections.
Thanks!
I've rebased the patch set. Here is a summary again:
> 0001-0003 are for COPY TO and 0004-0007 are for COPY FROM.
>
> For COPY TO:
>
> 0001: Add support for adding custom COPY TO format. This
> uses tablesample like handler approach. We've discussed
> other approaches such as USING+CREATE XXX approach but it
> seems that other approaches are overkill for this case.
>
> See also: https://www.postgresql.org/message-id/flat/d838025aceeb19c9ff1db702fa55cabf%40postgrespro.ru#caca2799effc859f82f40ee8bec531d8
>
> 0002: Export CopyToStateData to implement custom COPY TO
> format as extension.
>
> 0003: Export a function and add a private space to
> CopyToStateData to implement custom COPY TO format as
> extension.
>
> We may want to squash 0002 and 0003 but splitting them will
> be easy to review. Because 0002 just moves existing codes
> (with some rename) and 0003 just adds some codes. If we
> squash 0002 and 0003, moving and adding are mixed.
>
> For COPY FROM:
>
> 0004: This is COPY FROM version of 0001.
>
> 0005: 0002 has COPY_ prefix -> COPY_DEST_ prefix change for
> enum CopyDest. This is similar change for enum CopySource.
>
> 0006: This is COPY FROM version of 0003.
>
> 0007: This is for easy to implement "ON_ERROR stop" and
> "LOG_VERBOSITY verbose" in extension.
>
> We may want to squash 0005-0007 like for 0002-0003.
Thanks,
-- 
kou
From 479c601915b30e4f67e5ed047c6fbf3e35702ec6 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 12:19:15 +0900
Subject: [PATCH v36 1/7] Add support for adding custom COPY TO format
This uses the handler approach like tablesample.
The approach creates an internal function that returns an internal struct.
In this case, a COPY TO handler returns a CopyToRoutine.
This also add a test module for custom COPY TO handler.
---
 src/backend/commands/copy.c                   | 79 ++++++++++++++++---
 src/backend/commands/copyto.c                 | 28 ++++++-
 src/backend/nodes/Makefile                    |  1 +
 src/backend/nodes/gen_node_support.pl         |  2 +
 src/backend/utils/adt/pseudotypes.c           |  1 +
 src/include/catalog/pg_proc.dat               |  6 ++
 src/include/catalog/pg_type.dat               |  6 ++
 src/include/commands/copy.h                   |  1 +
 src/include/commands/copyapi.h                |  2 +
 src/include/nodes/meson.build                 |  1 +
 src/test/modules/Makefile                     |  1 +
 src/test/modules/meson.build                  |  1 +
 src/test/modules/test_copy_format/.gitignore  |  4 +
 src/test/modules/test_copy_format/Makefile    | 23 ++++++
 .../expected/test_copy_format.out             | 17 ++++
 src/test/modules/test_copy_format/meson.build | 33 ++++++++
 .../test_copy_format/sql/test_copy_format.sql |  5 ++
 .../test_copy_format--1.0.sql                 |  8 ++
 .../test_copy_format/test_copy_format.c       | 63 +++++++++++++++
 .../test_copy_format/test_copy_format.control |  4 +
 20 files changed, 269 insertions(+), 17 deletions(-)
 mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..8d94bc313eb 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,6 +32,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -476,6 +477,70 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
     return COPY_LOG_VERBOSITY_DEFAULT;  /* keep compiler quiet */
 }
 
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false). If no COPY format handler is found, this function
+ * reports an error.
+ */
+static void
+ProcessCopyOptionFormat(ParseState *pstate,
+                        CopyFormatOptions *opts_out,
+                        bool is_from,
+                        DefElem *defel)
+{
+    char       *format;
+    Oid         funcargtypes[1];
+    Oid         handlerOid = InvalidOid;
+
+    format = defGetString(defel);
+
+    opts_out->csv_mode = false;
+    opts_out->binary = false;
+    /* built-in formats */
+    if (strcmp(format, "text") == 0)
+    {
+        /* "csv_mode == false && binary == false" means "text" */
+        return;
+    }
+    else if (strcmp(format, "csv") == 0)
+    {
+        opts_out->csv_mode = true;
+        return;
+    }
+    else if (strcmp(format, "binary") == 0)
+    {
+        opts_out->binary = true;
+        return;
+    }
+
+    /* custom format */
+    if (!is_from)
+    {
+        funcargtypes[0] = INTERNALOID;
+        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                    funcargtypes, true);
+    }
+    if (!OidIsValid(handlerOid))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY format \"%s\" not recognized", format),
+                 parser_errposition(pstate, defel->location)));
+
+    /* check that handler has correct return type */
+    if (get_func_rettype(handlerOid) != COPY_HANDLEROID)
+        ereport(ERROR,
+                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                 errmsg("function %s must return type %s",
+                        format, "copy_handler"),
+                 parser_errposition(pstate, defel->location)));
+
+    opts_out->handler = handlerOid;
+}
+
 /*
  * Process the statement option list for COPY.
  *
@@ -519,22 +584,10 @@ ProcessCopyOptions(ParseState *pstate,
 
         if (strcmp(defel->defname, "format") == 0)
         {
-            char       *fmt = defGetString(defel);
-
             if (format_specified)
                 errorConflictingDefElem(defel, pstate);
             format_specified = true;
-            if (strcmp(fmt, "text") == 0)
-                 /* default format */ ;
-            else if (strcmp(fmt, "csv") == 0)
-                opts_out->csv_mode = true;
-            else if (strcmp(fmt, "binary") == 0)
-                opts_out->binary = true;
-            else
-                ereport(ERROR,
-                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                         errmsg("COPY format \"%s\" not recognized", fmt),
-                         parser_errposition(pstate, defel->location)));
+            ProcessCopyOptionFormat(pstate, opts_out, is_from, defel);
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..fce8501dc30 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -150,6 +150,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
 
 /* text format */
 static const CopyToRoutine CopyToRoutineText = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToTextOneRow,
@@ -158,6 +159,7 @@ static const CopyToRoutine CopyToRoutineText = {
 
 /* CSV format */
 static const CopyToRoutine CopyToRoutineCSV = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToCSVOneRow,
@@ -166,6 +168,7 @@ static const CopyToRoutine CopyToRoutineCSV = {
 
 /* binary format */
 static const CopyToRoutine CopyToRoutineBinary = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToBinaryStart,
     .CopyToOutFunc = CopyToBinaryOutFunc,
     .CopyToOneRow = CopyToBinaryOneRow,
@@ -176,13 +179,30 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(const CopyFormatOptions *opts)
 {
-    if (opts->csv_mode)
+    if (OidIsValid(opts->handler))
+    {
+        Datum       datum;
+        Node       *routine;
+
+        datum = OidFunctionCall1(opts->handler, BoolGetDatum(false));
+        routine = (Node *) DatumGetPointer(datum);
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%u did not return "
+                            "CopyToRoutine struct",
+                            opts->handler)));
+        return castNode(CopyToRoutine, routine);
+    }
+    else if (opts->csv_mode)
         return &CopyToRoutineCSV;
     else if (opts->binary)
         return &CopyToRoutineBinary;
-
-    /* default is text */
-    return &CopyToRoutineText;
+    else
+        return &CopyToRoutineText;
 }
 
 /* Implementation of the start callback for text and CSV formats */
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 77ddb9ca53f..dc6c1087361 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -50,6 +50,7 @@ node_headers = \
     access/sdir.h \
     access/tableam.h \
     access/tsmapi.h \
+    commands/copyapi.h \
     commands/event_trigger.h \
     commands/trigger.h \
     executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
old mode 100644
new mode 100755
index 1a657f7e0ae..fb90635a245
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -62,6 +62,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -86,6 +87,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index 317a1f2b282..f2ebc21ca56 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cd9422d0bac..9e7737168c4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7809,6 +7809,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 6dca77e0a22..340e0cd0a8d 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -633,6 +633,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a copy to method function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   descr => 'pseudo-type for the result of a table AM handler function',
   typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..332628d67cc 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;   /* verbosity of logged messages */
     int64       reject_limit;   /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
+    Oid         handler;        /* handler function for custom format routine */
 } CopyFormatOptions;
 
 /* These are private in commands/copy[from|to].c */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 2a2d2f9876b..4f4ffabf882 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -22,6 +22,8 @@
  */
 typedef struct CopyToRoutine
 {
+    NodeTag     type;
+
     /*
      * Set output function information. This callback is called once at the
      * beginning of COPY TO.
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index d1ca24dd32f..96e70e7f38b 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -12,6 +12,7 @@ node_support_input_i = [
   'access/sdir.h',
   'access/tableam.h',
   'access/tsmapi.h',
+  'commands/copyapi.h',
   'commands/event_trigger.h',
   'commands/trigger.h',
   'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 4e4be3fa511..c9da440eed0 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -16,6 +16,7 @@ SUBDIRS = \
           spgist_name_ops \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 2b057451473..d33bbbd4092 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -15,6 +15,7 @@ subdir('spgist_name_ops')
 subdir('ssl_passphrase_callback')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 00000000000..8497f91624d
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 00000000000..adfe7d1572a
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,17 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+ERROR:  COPY format "test_copy_format" not recognized
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')...
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: atttypid=21
+NOTICE:  CopyToOutFunc: atttypid=23
+NOTICE:  CopyToOutFunc: atttypid=20
+NOTICE:  CopyToStart: natts=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 00000000000..a45a2e0a039
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+  'test_copy_format.control',
+  'test_copy_format--1.0.sql',
+)
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'test_copy_format',
+    ],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 00000000000..810b3d8cedc
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,5 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 00000000000..d24ea03ce99
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,8 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+CREATE FUNCTION test_copy_format(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..b42d472d851
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,63 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *      Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *      src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+
+PG_MODULE_MAGIC;
+
+static void
+CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%d", atttypid)));
+}
+
+static void
+CopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts)));
+}
+
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", slot->tts_nvalid)));
+}
+
+static void
+CopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .type = T_CopyToRoutine,
+    .CopyToOutFunc = CopyToOutFunc,
+    .CopyToStart = CopyToStart,
+    .CopyToOneRow = CopyToOneRow,
+    .CopyToEnd = CopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    ereport(NOTICE,
+            (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 00000000000..f05a6362358
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.47.2
From 8768ecf89603767a3a3f9f96d569e5bc8c8f0c81 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 13:58:33 +0900
Subject: [PATCH v36 2/7] Export CopyToStateData as private data
It's for custom COPY TO format handlers implemented as extension.
This just moves codes. This doesn't change codes except CopyDest enum
values. CopyDest/CopyFrom enum values such as COPY_FILE are conflicted
each other. So COPY_DEST_ prefix instead of COPY_ prefix is used for
CopyDest enum values. For example, COPY_FILE in CopyDest is renamed to
COPY_DEST_FILE.
Note that this isn't enough to implement custom COPY TO format
handlers as extension. We'll do the followings in a subsequent commit:
1. Add an opaque space for custom COPY TO format handler
2. Export CopySendEndOfRow() to flush buffer
---
 src/backend/commands/copyto.c          | 78 +++---------------------
 src/include/commands/copy.h            |  2 +-
 src/include/commands/copyto_internal.h | 83 ++++++++++++++++++++++++++
 3 files changed, 93 insertions(+), 70 deletions(-)
 create mode 100644 src/include/commands/copyto_internal.h
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index fce8501dc30..99c2f2dd699 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -20,6 +20,7 @@
 
 #include "access/tableam.h"
 #include "commands/copyapi.h"
+#include "commands/copyto_internal.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -36,67 +37,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                  /* to file (or a piped program) */
-    COPY_FRONTEND,              /* to frontend */
-    COPY_CALLBACK,              /* to callback function */
-} CopyDest;
-
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-    /* format-specific routines */
-    const CopyToRoutine *routine;
-
-    /* low-level state data */
-    CopyDest    copy_dest;      /* type of copy source/destination */
-    FILE       *copy_file;      /* used if copy_dest == COPY_FILE */
-    StringInfo  fe_msgbuf;      /* used for all dests during COPY TO */
-
-    int         file_encoding;  /* file or remote side's character encoding */
-    bool        need_transcoding;   /* file encoding diff from server? */
-    bool        encoding_embeds_ascii;  /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;      /* executable query to copy from */
-    List       *attnumlist;     /* integer list of attnums to copy */
-    char       *filename;       /* filename, or NULL for STDOUT */
-    bool        is_program;     /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;  /* per-copy execution context */
-
-    FmgrInfo   *out_functions;  /* lookup info for output functions */
-    MemoryContext rowcontext;   /* per-row evaluation context */
-    uint64      bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -421,7 +361,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -468,7 +408,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -502,11 +442,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -527,7 +467,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -535,7 +475,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -920,12 +860,12 @@ BeginCopyTo(ParseState *pstate,
     /* See Multibyte encoding comment above */
     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;  /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 332628d67cc..6df1f8a3b9b 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -90,7 +90,7 @@ typedef struct CopyFormatOptions
     Oid         handler;        /* handler function for custom format routine */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
+/* These are private in commands/copy[from|to]_internal.h */
 typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
new file mode 100644
index 00000000000..1b58b36c0a3
--- /dev/null
+++ b/src/include/commands/copyto_internal.h
@@ -0,0 +1,83 @@
+/*------------------------------------------------------------------------- + * + * copyto_internal.h + * Internal definitions for COPY TO command. + * + * + * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/commands/copyto_internal.h + * + *------------------------------------------------------------------------- + */ +#ifndef COPYTO_INTERNAL_H +#define COPYTO_INTERNAL_H + +#include "commands/copy.h" +#include "executor/execdesc.h" +#include "executor/tuptable.h" +#include "nodes/execnodes.h" + +/* + * Represents the different dest cases we need to worry about at + * the bottom level + */ +typedef enum CopyDest +{ + COPY_DEST_FILE, /* to file (or a piped program) */ + COPY_DEST_FRONTEND, /* to frontend */ + COPY_DEST_CALLBACK, /* to callback function */ +} CopyDest; + +/* + * This struct contains all the state variables used throughout a COPY TO + * operation. + * + * Multi-byte encodings: all supported client-side encodings encode multi-byte + * characters by having the first byte's high bit set. Subsequent bytes of the + * character can have the high bit not set. When scanning data in such an + * encoding to look for a match to a single-byte (ie ASCII) character, we must + * use the full pg_encoding_mblen() machinery to skip over multibyte + * characters, else we might find a false match to a trailing byte. In + * supported server encodings, there is no possibility of a false match, and + * it's faster to make useless comparisons to trailing bytes than it is to + * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true + * when we have to do it the hard way. 
+ */ +typedef struct CopyToStateData +{ + /* format-specific routines */ + const struct CopyToRoutine *routine; + + /* low-level state data */ + CopyDest copy_dest; /* type of copy source/destination */ + FILE *copy_file; /* used if copy_dest == COPY_FILE */ + StringInfo fe_msgbuf; /* used for all dests during COPY TO */ + + int file_encoding; /* file or remote side's character encoding */ + bool need_transcoding; /* file encoding diff from server? */ + bool encoding_embeds_ascii; /* ASCII can be non-first byte? */ + + /* parameters from the COPY command */ + Relation rel; /* relation to copy to */ + QueryDesc *queryDesc; /* executable query to copy from */ + List *attnumlist; /* integer list of attnums to copy */ + char *filename; /* filename, or NULL for STDOUT */ + bool is_program; /* is 'filename' a program to popen? */ + copy_data_dest_cb data_dest_cb; /* function for writing data */ + + CopyFormatOptions opts; + Node *whereClause; /* WHERE condition (or NULL) */ + + /* + * Working state + */ + MemoryContext copycontext; /* per-copy execution context */ + + FmgrInfo *out_functions; /* lookup info for output functions */ + MemoryContext rowcontext; /* per-row evaluation context */ + uint64 bytes_processed; /* number of bytes processed so far */ +} CopyToStateData; + +#endif /* COPYTO_INTERNAL_H */ -- 2.47.2 From 427ec09b5cb438e7884d04027acdc207e9d4c80e Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 14:01:18 +0900 Subject: [PATCH v36 3/7] Add support for implementing custom COPY TO format as extension * Add CopyToStateData::opaque that can be used to keep data for custom COPY TO format implementation * Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf as CopyToStateFlush() --- src/backend/commands/copyto.c | 12 ++++++++++++ src/include/commands/copyapi.h | 2 ++ src/include/commands/copyto_internal.h | 3 +++ 3 files changed, 17 insertions(+) diff --git a/src/backend/commands/copyto.c 
b/src/backend/commands/copyto.c index 99c2f2dd699..f5ed3efbace 100644 --- a/src/backend/commands/copyto.c +++ b/src/backend/commands/copyto.c @@ -458,6 +458,18 @@ CopySendEndOfRow(CopyToState cstate) resetStringInfo(fe_msgbuf); } +/* + * Export CopySendEndOfRow() for extensions. We want to keep + * CopySendEndOfRow() as a static function for + * optimization. CopySendEndOfRow() calls in this file may be optimized by a + * compiler. + */ +void +CopyToStateFlush(CopyToState cstate) +{ + CopySendEndOfRow(cstate); +} + /* * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the * line termination and do common appropriate things for the end of row. diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 4f4ffabf882..5c5ea6592e3 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -56,6 +56,8 @@ typedef struct CopyToRoutine void (*CopyToEnd) (CopyToState cstate); } CopyToRoutine; +extern void CopyToStateFlush(CopyToState cstate); + /* * API structure for a COPY FROM format implementation. Note this must be * allocated in a server-lifetime manner, typically as a static const struct. 
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h index 1b58b36c0a3..ce1c33a4004 100644 --- a/src/include/commands/copyto_internal.h +++ b/src/include/commands/copyto_internal.h @@ -78,6 +78,9 @@ typedef struct CopyToStateData FmgrInfo *out_functions; /* lookup info for output functions */ MemoryContext rowcontext; /* per-row evaluation context */ uint64 bytes_processed; /* number of bytes processed so far */ + + /* For custom format implementation */ + void *opaque; /* private space */ } CopyToStateData; #endif /* COPYTO_INTERNAL_H */ -- 2.47.2 From fb9fb349e1d08a1adaf07658ccc46e8ea1439a80 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 14:11:55 +0900 Subject: [PATCH v36 4/7] Add support for adding custom COPY FROM format This uses the same handler for COPY TO and COPY FROM but uses different routines. This uses CopyToRoutine for COPY TO and CopyFromRoutine for COPY FROM. PostgreSQL calls a COPY TO/FROM handler with "is_from" argument. It's true for COPY FROM and false for COPY TO: copy_handler(true) returns CopyFromRoutine copy_handler(false) returns CopyToRoutine This also adds a test module for custom COPY FROM handler. --- src/backend/commands/copy.c | 13 +++---- src/backend/commands/copyfrom.c | 28 +++++++++++-- src/include/catalog/pg_type.dat | 2 +- src/include/commands/copyapi.h | 2 + .../expected/test_copy_format.out | 10 +++-- .../test_copy_format/sql/test_copy_format.sql | 1 + .../test_copy_format/test_copy_format.c | 39 ++++++++++++++++++- 7 files changed, 78 insertions(+), 17 deletions(-) diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 8d94bc313eb..b4417bb6819 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -483,8 +483,8 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate) * This function checks whether the option value is a built-in format such as * "text" and "csv" or not. 
If the option value isn't a built-in format, this * function finds a COPY format handler that returns a CopyToRoutine (for - * is_from == false). If no COPY format handler is found, this function - * reports an error. + * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY + * format handler is found, this function reports an error. */ static void ProcessCopyOptionFormat(ParseState *pstate, @@ -518,12 +518,9 @@ ProcessCopyOptionFormat(ParseState *pstate, } /* custom format */ - if (!is_from) - { - funcargtypes[0] = INTERNALOID; - handlerOid = LookupFuncName(list_make1(makeString(format)), 1, - funcargtypes, true); - } + funcargtypes[0] = INTERNALOID; + handlerOid = LookupFuncName(list_make1(makeString(format)), 1, + funcargtypes, true); if (!OidIsValid(handlerOid)) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index bcf66f0adf8..0809766f910 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -129,6 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate); /* text format */ static const CopyFromRoutine CopyFromRoutineText = { + .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromTextLikeInFunc, .CopyFromStart = CopyFromTextLikeStart, .CopyFromOneRow = CopyFromTextOneRow, @@ -137,6 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = { /* CSV format */ static const CopyFromRoutine CopyFromRoutineCSV = { + .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromTextLikeInFunc, .CopyFromStart = CopyFromTextLikeStart, .CopyFromOneRow = CopyFromCSVOneRow, @@ -145,6 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = { /* binary format */ static const CopyFromRoutine CopyFromRoutineBinary = { + .type = T_CopyFromRoutine, .CopyFromInFunc = CopyFromBinaryInFunc, .CopyFromStart = CopyFromBinaryStart, .CopyFromOneRow = CopyFromBinaryOneRow, @@ -155,13 +158,30 @@ static const CopyFromRoutine CopyFromRoutineBinary = 
{ static const CopyFromRoutine * CopyFromGetRoutine(const CopyFormatOptions *opts) { - if (opts->csv_mode) + if (OidIsValid(opts->handler)) + { + Datum datum; + Node *routine; + + datum = OidFunctionCall1(opts->handler, BoolGetDatum(true)); + routine = (Node *) DatumGetPointer(datum); + if (routine == NULL || !IsA(routine, CopyFromRoutine)) + ereport( + ERROR, + (errcode( + ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY handler function " + "%u did not return " + "CopyFromRoutine struct", + opts->handler))); + return castNode(CopyFromRoutine, routine); + } + else if (opts->csv_mode) return &CopyFromRoutineCSV; else if (opts->binary) return &CopyFromRoutineBinary; - - /* default is text */ - return &CopyFromRoutineText; + else + return &CopyFromRoutineText; } /* Implementation of the start callback for text and CSV formats */ diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat index 340e0cd0a8d..63b7d65f982 100644 --- a/src/include/catalog/pg_type.dat +++ b/src/include/catalog/pg_type.dat @@ -634,7 +634,7 @@ typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-', typalign => 'i' }, { oid => '8752', - descr => 'pseudo-type for the result of a copy to method function', + descr => 'pseudo-type for the result of a copy to/from method function', typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p', typcategory => 'P', typinput => 'copy_handler_in', typoutput => 'copy_handler_out', typreceive => '-', typsend => '-', diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 5c5ea6592e3..895c105d8d8 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -64,6 +64,8 @@ extern void CopyToStateFlush(CopyToState cstate); */ typedef struct CopyFromRoutine { + NodeTag type; + /* * Set input function information. This callback is called once at the * beginning of COPY FROM. 
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out index adfe7d1572a..016893e7026 100644 --- a/src/test/modules/test_copy_format/expected/test_copy_format.out +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -2,9 +2,13 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); -ERROR: COPY format "test_copy_format" not recognized -LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')... - ^ +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromEnd COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); NOTICE: test_copy_format: is_from=false NOTICE: CopyToOutFunc: atttypid=21 diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql index 810b3d8cedc..0dfdfa00080 100644 --- a/src/test/modules/test_copy_format/sql/test_copy_format.sql +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -2,4 +2,5 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); +\. 
COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c index b42d472d851..abafc668463 100644 --- a/src/test/modules/test_copy_format/test_copy_format.c +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -18,6 +18,40 @@ PG_MODULE_MAGIC; +static void +CopyFromInFunc(CopyFromState cstate, Oid atttypid, + FmgrInfo *finfo, Oid *typioparam) +{ + ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%d", atttypid))); +} + +static void +CopyFromStart(CopyFromState cstate, TupleDesc tupDesc) +{ + ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts))); +} + +static bool +CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) +{ + ereport(NOTICE, (errmsg("CopyFromOneRow"))); + return false; +} + +static void +CopyFromEnd(CopyFromState cstate) +{ + ereport(NOTICE, (errmsg("CopyFromEnd"))); +} + +static const CopyFromRoutine CopyFromRoutineTestCopyFormat = { + .type = T_CopyFromRoutine, + .CopyFromInFunc = CopyFromInFunc, + .CopyFromStart = CopyFromStart, + .CopyFromOneRow = CopyFromOneRow, + .CopyFromEnd = CopyFromEnd, +}; + static void CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo) { @@ -59,5 +93,8 @@ test_copy_format(PG_FUNCTION_ARGS) ereport(NOTICE, (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false"))); - PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat); + if (is_from) + PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat); + else + PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat); } -- 2.47.2 From dcf1598a31557af0ade5758430884489a7c8e610 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 14:19:34 +0900 Subject: [PATCH v36 5/7] Use COPY_SOURCE_ prefix for CopySource enum values This is for consistency with CopyDest. 
--- src/backend/commands/copyfrom.c | 4 ++-- src/backend/commands/copyfromparse.c | 10 +++++----- src/include/commands/copyfrom_internal.h | 6 +++--- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 0809766f910..76662e04260 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1729,7 +1729,7 @@ BeginCopyFrom(ParseState *pstate, pg_encoding_to_char(GetDatabaseEncoding())))); } - cstate->copy_src = COPY_FILE; /* default */ + cstate->copy_src = COPY_SOURCE_FILE; /* default */ cstate->whereClause = whereClause; @@ -1857,7 +1857,7 @@ BeginCopyFrom(ParseState *pstate, if (data_source_cb) { progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK; - cstate->copy_src = COPY_CALLBACK; + cstate->copy_src = COPY_SOURCE_CALLBACK; cstate->data_source_cb = data_source_cb; } else if (pipe) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index e8128f85e6b..17e51f02e04 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -180,7 +180,7 @@ ReceiveCopyBegin(CopyFromState cstate) for (i = 0; i < natts; i++) pq_sendint16(&buf, format); /* per-column formats */ pq_endmessage(&buf); - cstate->copy_src = COPY_FRONTEND; + cstate->copy_src = COPY_SOURCE_FRONTEND; cstate->fe_msgbuf = makeStringInfo(); /* We *must* flush here to ensure FE knows it can send. 
*/ pq_flush(); @@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) switch (cstate->copy_src) { - case COPY_FILE: + case COPY_SOURCE_FILE: bytesread = fread(databuf, 1, maxread, cstate->copy_file); if (ferror(cstate->copy_file)) ereport(ERROR, @@ -257,7 +257,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) if (bytesread == 0) cstate->raw_reached_eof = true; break; - case COPY_FRONTEND: + case COPY_SOURCE_FRONTEND: while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof) { int avail; @@ -340,7 +340,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread) bytesread += avail; } break; - case COPY_CALLBACK: + case COPY_SOURCE_CALLBACK: bytesread = cstate->data_source_cb(databuf, minread, maxread); break; } @@ -1172,7 +1172,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv) * after \. up to the protocol end of copy data. (XXX maybe better * not to treat \. as special?) */ - if (cstate->copy_src == COPY_FRONTEND) + if (cstate->copy_src == COPY_SOURCE_FRONTEND) { int inbytes; diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index c8b22af22d8..3a306e3286e 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -24,9 +24,9 @@ */ typedef enum CopySource { - COPY_FILE, /* from file (or a piped program) */ - COPY_FRONTEND, /* from frontend */ - COPY_CALLBACK, /* from callback function */ + COPY_SOURCE_FILE, /* from file (or a piped program) */ + COPY_SOURCE_FRONTEND, /* from frontend */ + COPY_SOURCE_CALLBACK, /* from callback function */ } CopySource; /* -- 2.47.2 From d8657b0d0d295c82ec2807a9767bb7fbe18c85fa Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Mon, 25 Nov 2024 14:21:39 +0900 Subject: [PATCH v36 6/7] Add support for implementing custom COPY FROM format as extension * Add CopyFromStateData::opaque that can be used to keep data for custom 
COPY From format implementation * Export CopyGetData() to get the next data as CopyFromStateGetData() --- src/backend/commands/copyfromparse.c | 11 +++++++++++ src/include/commands/copyapi.h | 2 ++ src/include/commands/copyfrom_internal.h | 3 +++ 3 files changed, 16 insertions(+) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 17e51f02e04..d8fd238e72b 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -739,6 +739,17 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes) return copied_bytes; } +/* + * Export CopyGetData() for extensions. We want to keep CopyGetData() as a + * static function for optimization. CopyGetData() calls in this file may be + * optimized by a compiler. + */ +int +CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread) +{ + return CopyGetData(cstate, dest, minread, maxread); +} + /* * This function is exposed for use by extensions that read raw fields in the * next line. See NextCopyFromRawFieldsInternal() for details. 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 895c105d8d8..2044d8b8c4c 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -108,4 +108,6 @@ typedef struct CopyFromRoutine void (*CopyFromEnd) (CopyFromState cstate); } CopyFromRoutine; +extern int CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread); + #endif /* COPYAPI_H */ diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h index 3a306e3286e..af425cf5fd9 100644 --- a/src/include/commands/copyfrom_internal.h +++ b/src/include/commands/copyfrom_internal.h @@ -181,6 +181,9 @@ typedef struct CopyFromStateData #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index) uint64 bytes_processed; /* number of bytes processed so far */ + + /* For custom format implementation */ + void *opaque; /* private space */ } CopyFromStateData; extern void ReceiveCopyBegin(CopyFromState cstate); -- 2.47.2 From ddd4b9f09dfe7860195adf82dbaab5ee46e91cf0 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei <kou@clear-code.com> Date: Wed, 27 Nov 2024 16:23:55 +0900 Subject: [PATCH v36 7/7] Add CopyFromSkipErrorRow() for custom COPY format extension Extensions must call CopyFromSkipErrorRow() when CopyFromOneRow callback reports an error by errsave(). CopyFromSkipErrorRow() handles "ON_ERROR stop" and "LOG_VERBOSITY verbose" cases. 
--- src/backend/commands/copyfromparse.c | 82 +++++++++++-------- src/include/commands/copyapi.h | 2 + .../expected/test_copy_format.out | 47 +++++++++++ .../test_copy_format/sql/test_copy_format.sql | 24 ++++++ .../test_copy_format/test_copy_format.c | 80 +++++++++++++++++- 5 files changed, 198 insertions(+), 37 deletions(-) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index d8fd238e72b..2070f51a963 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c @@ -938,6 +938,51 @@ CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true); } +/* + * Call this when you report an error by errsave() in your CopyFromOneRow + * callback. This handles "ON_ERROR stop" and "LOG_VERBOSITY verbose" cases + * for you. + */ +void +CopyFromSkipErrorRow(CopyFromState cstate) +{ + Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); + + cstate->num_errors++; + + if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) + { + /* + * Since we emit line number and column info in the below notice + * message, we suppress error context information other than the + * relation name. + */ + Assert(!cstate->relname_only); + cstate->relname_only = true; + + if (cstate->cur_attval) + { + char *attval; + + attval = CopyLimitPrintoutLength(cstate->cur_attval); + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"", + (unsigned long long) cstate->cur_lineno, + cstate->cur_attname, + attval)); + pfree(attval); + } + else + ereport(NOTICE, + errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input", + (unsigned long long) cstate->cur_lineno, + cstate->cur_attname)); + + /* reset relname_only */ + cstate->relname_only = false; + } +} + /* * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow(). 
* @@ -1044,42 +1089,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, (Node *) cstate->escontext, &values[m])) { - Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP); - - cstate->num_errors++; - - if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE) - { - /* - * Since we emit line number and column info in the below - * notice message, we suppress error context information other - * than the relation name. - */ - Assert(!cstate->relname_only); - cstate->relname_only = true; - - if (cstate->cur_attval) - { - char *attval; - - attval = CopyLimitPrintoutLength(cstate->cur_attval); - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"", - (unsigned long long) cstate->cur_lineno, - cstate->cur_attname, - attval)); - pfree(attval); - } - else - ereport(NOTICE, - errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input", - (unsigned long long) cstate->cur_lineno, - cstate->cur_attname)); - - /* reset relname_only */ - cstate->relname_only = false; - } - + CopyFromSkipErrorRow(cstate); return true; } diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 2044d8b8c4c..500ece7d5bb 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -110,4 +110,6 @@ typedef struct CopyFromRoutine extern int CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread); +extern void CopyFromSkipErrorRow(CopyFromState cstate); + #endif /* COPYAPI_H */ diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out index 016893e7026..b9a6baa85c0 100644 --- a/src/test/modules/test_copy_format/expected/test_copy_format.out +++ b/src/test/modules/test_copy_format/expected/test_copy_format.out @@ -1,6 +1,8 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO 
public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +-- 987 is accepted. +-- 654 is a hard error because ON_ERROR is stop by default. COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); NOTICE: test_copy_format: is_from=true NOTICE: CopyFromInFunc: atttypid=21 @@ -8,7 +10,50 @@ NOTICE: CopyFromInFunc: atttypid=23 NOTICE: CopyFromInFunc: atttypid=20 NOTICE: CopyFromStart: natts=3 NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +ERROR: invalid value: "6" +CONTEXT: COPY test, line 2, column a: "6" +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: 1 row was skipped due to data type incompatibility NOTICE: CopyFromEnd +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: skipping row due to data type incompatibility at line 2 for column "a": "6" +NOTICE: CopyFromOneRow +NOTICE: 1 row was skipped due to data type incompatibility +NOTICE: CopyFromEnd +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +-- 321 is a hard error. 
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +NOTICE: test_copy_format: is_from=true +NOTICE: CopyFromInFunc: atttypid=21 +NOTICE: CopyFromInFunc: atttypid=23 +NOTICE: CopyFromInFunc: atttypid=20 +NOTICE: CopyFromStart: natts=3 +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +NOTICE: CopyFromOneRow +ERROR: too much lines: 3 +CONTEXT: COPY test, line 3 COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); NOTICE: test_copy_format: is_from=false NOTICE: CopyToOutFunc: atttypid=21 @@ -18,4 +63,6 @@ NOTICE: CopyToStart: natts=3 NOTICE: CopyToOneRow: tts_nvalid=3 NOTICE: CopyToOneRow: tts_nvalid=3 NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 +NOTICE: CopyToOneRow: tts_nvalid=3 NOTICE: CopyToEnd diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql index 0dfdfa00080..86db71bce7f 100644 --- a/src/test/modules/test_copy_format/sql/test_copy_format.sql +++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql @@ -1,6 +1,30 @@ CREATE EXTENSION test_copy_format; CREATE TABLE public.test (a smallint, b integer, c bigint); INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789); +-- 987 is accepted. +-- 654 is a hard error because ON_ERROR is stop by default. COPY public.test FROM stdin WITH (FORMAT 'test_copy_format'); +987 +654 +\. +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +987 +654 +\. +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose); +987 +654 +\. +-- 987 is accepted. +-- 654 is a soft error because ON_ERROR is ignore. +-- 321 is a hard error. +COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore); +987 +654 +321 \. 
COPY public.test TO stdout WITH (FORMAT 'test_copy_format'); diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c index abafc668463..96a54dab7ec 100644 --- a/src/test/modules/test_copy_format/test_copy_format.c +++ b/src/test/modules/test_copy_format/test_copy_format.c @@ -14,6 +14,7 @@ #include "postgres.h" #include "commands/copyapi.h" +#include "commands/copyfrom_internal.h" #include "commands/defrem.h" PG_MODULE_MAGIC; @@ -34,8 +35,85 @@ CopyFromStart(CopyFromState cstate, TupleDesc tupDesc) static bool CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls) { + int n_attributes = list_length(cstate->attnumlist); + char *line; + int line_size = n_attributes + 1; /* +1 is for new line */ + int read_bytes; + ereport(NOTICE, (errmsg("CopyFromOneRow"))); - return false; + + cstate->cur_lineno++; + line = palloc(line_size); + read_bytes = CopyFromStateGetData(cstate, line, line_size, line_size); + if (read_bytes == 0) + return false; + if (read_bytes != line_size) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("one line must be %d bytes: %d", + line_size, read_bytes))); + + if (cstate->cur_lineno == 1) + { + /* Success */ + TupleDesc tupDesc = RelationGetDescr(cstate->rel); + ListCell *cur; + int i = 0; + + foreach(cur, cstate->attnumlist) + { + int attnum = lfirst_int(cur); + int m = attnum - 1; + Form_pg_attribute att = TupleDescAttr(tupDesc, m); + + if (att->atttypid == INT2OID) + { + values[i] = Int16GetDatum(line[i] - '0'); + } + else if (att->atttypid == INT4OID) + { + values[i] = Int32GetDatum(line[i] - '0'); + } + else if (att->atttypid == INT8OID) + { + values[i] = Int64GetDatum(line[i] - '0'); + } + nulls[i] = false; + i++; + } + } + else if (cstate->cur_lineno == 2) + { + /* Soft error */ + TupleDesc tupDesc = RelationGetDescr(cstate->rel); + int attnum = lfirst_int(list_head(cstate->attnumlist)); + int m = attnum - 1; + 
Form_pg_attribute att = TupleDescAttr(tupDesc, m); + char value[2]; + + cstate->cur_attname = NameStr(att->attname); + value[0] = line[0]; + value[1] = '\0'; + cstate->cur_attval = value; + errsave((Node *) cstate->escontext, + ( + errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid value: \"%c\"", line[0]))); + CopyFromSkipErrorRow(cstate); + cstate->cur_attname = NULL; + cstate->cur_attval = NULL; + return true; + } + else + { + /* Hard error */ + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), + errmsg("too much lines: %llu", + (unsigned long long) cstate->cur_lineno))); + } + + return true; } static void -- 2.47.2
On Tue, Mar 4, 2025 at 4:06 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoAwOP7p6LgmkPGqPuJ5KbJPPQsSZsFzwCDguwzr9F677Q@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 3 Mar 2025 11:06:39 -0800,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > I agree with the fix and the patch looks good to me. I've updated the
> > commit message and am going to push, barring any objections.
>
> Thanks!
>
> I've rebased the patch set. Here is a summary again:
Thank you for updating the patches. Here are some review comments on
the 0001 patch:
+   if (strcmp(format, "text") == 0)
+   {
+       /* "csv_mode == false && binary == false" means "text" */
+       return;
+   }
+   else if (strcmp(format, "csv") == 0)
+   {
+       opts_out->csv_mode = true;
+       return;
+   }
+   else if (strcmp(format, "binary") == 0)
+   {
+       opts_out->binary = true;
+       return;
+   }
+
+   /* custom format */
+   if (!is_from)
+   {
+       funcargtypes[0] = INTERNALOID;
+       handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                   funcargtypes, true);
+   }
+   if (!OidIsValid(handlerOid))
+       ereport(ERROR,
+               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                errmsg("COPY format \"%s\" not recognized", format),
+                parser_errposition(pstate, defel->location)));
I think that built-in formats also need to have their handler
functions. This seems to be a conventional way for customizable
features such as tablesample and access methods, and we can simplify
this function.
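To make the suggestion above concrete, here is a minimal, self-contained sketch of the idea. It is not the patch itself and does not use PostgreSQL headers: DemoRoutine, DemoHandler, demo_handlers and demo_lookup_routine() are hypothetical names invented for this illustration, and the small table stands in for the catalog lookup that the patch does with LookupFuncName(). The point is only that once "text", "csv" and "binary" have handlers like any custom format, the lookup needs no per-format branches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a node tag; the patch uses NodeTag / T_CopyToRoutine. */
typedef enum DemoTag
{
    DEMO_TAG_INVALID = 0,
    DEMO_TAG_COPY_TO_ROUTINE
} DemoTag;

/* Stand-in for CopyToRoutine: a tagged struct describing one format. */
typedef struct DemoRoutine
{
    DemoTag     type;
    const char *description;
} DemoRoutine;

/* Stand-in for a handler function: returns the routine for a format. */
typedef const DemoRoutine *(*DemoHandler) (int is_from);

static const DemoRoutine demo_text = {DEMO_TAG_COPY_TO_ROUTINE, "text"};
static const DemoRoutine demo_csv = {DEMO_TAG_COPY_TO_ROUTINE, "csv"};
static const DemoRoutine demo_binary = {DEMO_TAG_COPY_TO_ROUTINE, "binary"};

static const DemoRoutine *
demo_text_handler(int is_from)
{
    (void) is_from;
    return &demo_text;
}

static const DemoRoutine *
demo_csv_handler(int is_from)
{
    (void) is_from;
    return &demo_csv;
}

static const DemoRoutine *
demo_binary_handler(int is_from)
{
    (void) is_from;
    return &demo_binary;
}

/* Hypothetical registry; built-in formats are ordinary entries. */
static const struct
{
    const char *name;
    DemoHandler handler;
}           demo_handlers[] = {
    {"text", demo_text_handler},
    {"csv", demo_csv_handler},
    {"binary", demo_binary_handler},
};

/*
 * Look up a format by name, call its handler, and verify the tag of the
 * returned struct.  Returns NULL if the format is unknown or the handler
 * returned something that is not a routine.
 */
const DemoRoutine *
demo_lookup_routine(const char *format, int is_from)
{
    size_t      i;

    assert(format != NULL);
    for (i = 0; i < sizeof(demo_handlers) / sizeof(demo_handlers[0]); i++)
    {
        if (strcmp(demo_handlers[i].name, format) == 0)
        {
            const DemoRoutine *routine = demo_handlers[i].handler(is_from);

            if (routine == NULL || routine->type != DEMO_TAG_COPY_TO_ROUTINE)
                return NULL;
            return routine;
        }
    }
    return NULL;
}
```

In this model, demo_lookup_routine("csv", 0) returns the csv routine through the same path a custom format would take, and an unregistered name such as "arrow" returns NULL, which corresponds to the "not recognized" error in the patch.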
---
I think we need to update the documentation to describe how users can
define the handler functions and what each callback function is
responsible for.
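As a possible starting point for that documentation, the following self-contained sketch models the order in which the COPY TO callbacks fire, as it appears in the NOTICE output of the test module in the 0001 patch: the output-function callback once per column, the start callback once, the one-row callback once per row, and the end callback once. Everything here is a stand-in: DemoCopyToRoutine, DemoLog and demo_copy_to() are hypothetical names, and the real callbacks take CopyToState, TupleDesc and TupleTableSlot arguments rather than plain integers.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_LOG_MAX 64

/* Records which callbacks ran, one character per call. */
typedef struct DemoLog
{
    char        events[DEMO_LOG_MAX];
    size_t      len;
} DemoLog;

static void
demo_log(DemoLog *log, char event)
{
    assert(log->len + 1 < DEMO_LOG_MAX);
    log->events[log->len++] = event;
    log->events[log->len] = '\0';
}

/* Stand-in for CopyToRoutine with simplified callback signatures. */
typedef struct DemoCopyToRoutine
{
    void        (*out_func) (DemoLog *log, int column);
    void        (*start) (DemoLog *log, int natts);
    void        (*one_row) (DemoLog *log, int row);
    void        (*end) (DemoLog *log);
} DemoCopyToRoutine;

static void
demo_out_func(DemoLog *log, int column)
{
    (void) column;
    demo_log(log, 'O');         /* per-column setup */
}

static void
demo_start(DemoLog *log, int natts)
{
    (void) natts;
    demo_log(log, 'S');         /* once, before any row */
}

static void
demo_one_row(DemoLog *log, int row)
{
    (void) row;
    demo_log(log, 'R');         /* once per row */
}

static void
demo_end(DemoLog *log)
{
    demo_log(log, 'E');         /* once, after the last row */
}

const DemoCopyToRoutine demo_routine = {
    .out_func = demo_out_func,
    .start = demo_start,
    .one_row = demo_one_row,
    .end = demo_end,
};

/* Drive the callbacks in the order seen in the test module's output. */
void
demo_copy_to(const DemoCopyToRoutine *routine, DemoLog *log,
             int natts, int nrows)
{
    int         i;

    log->len = 0;
    log->events[0] = '\0';
    for (i = 0; i < natts; i++)
        routine->out_func(log, i);
    routine->start(log, natts);
    for (i = 0; i < nrows; i++)
        routine->one_row(log, i);
    routine->end(log);
}
```

Calling demo_copy_to(&demo_routine, &log, 3, 3) leaves "OOOSRRRE" in log.events, mirroring the three-column, three-row table used by the regression test.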
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
Hi,
In <CAD21AoDU=bYRDDY8MzCXAfg4h9XTeTBdM-wVJaO1t4UcseCpuA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 17 Mar 2025 13:50:03 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I think that built-in formats also need to have their handler
> functions. This seems to be a conventional way for customizable
> features such as tablesample and access methods, and we can simplify
> this function.
OK. 0008 in the attached v37 patch set does it.
> I think we need to update the documentation to describe how users can
> define the handler functions and what each callback function is
> responsible for.
I agree, but we haven't finalized the public APIs yet. Can
we defer it until after the public APIs are finalized? (The
proposed public APIs are in 0003, 0006 and 0007.)
Also, could someone help with (or, if possible, take over)
writing the documentation for this feature? I'm not good at
writing documentation in English... 0009 in the attached v37
patch set has a draft of it. It's based on the existing
documents in doc/src/sgml/ and *.h.
0001-0007 are unchanged from the v36 patch set.
Thanks,
-- 
kou
From bd21411860e1ac8f751b77658bc7f1978a110d5f Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 12:19:15 +0900
Subject: [PATCH v37 1/9] Add support for adding custom COPY TO format
This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a COPY TO handler returns a CopyToRoutine.
This also adds a test module for a custom COPY TO handler.
---
 src/backend/commands/copy.c                   | 79 ++++++++++++++++---
 src/backend/commands/copyto.c                 | 28 ++++++-
 src/backend/nodes/Makefile                    |  1 +
 src/backend/nodes/gen_node_support.pl         |  2 +
 src/backend/utils/adt/pseudotypes.c           |  1 +
 src/include/catalog/pg_proc.dat               |  6 ++
 src/include/catalog/pg_type.dat               |  6 ++
 src/include/commands/copy.h                   |  1 +
 src/include/commands/copyapi.h                |  2 +
 src/include/nodes/meson.build                 |  1 +
 src/test/modules/Makefile                     |  1 +
 src/test/modules/meson.build                  |  1 +
 src/test/modules/test_copy_format/.gitignore  |  4 +
 src/test/modules/test_copy_format/Makefile    | 23 ++++++
 .../expected/test_copy_format.out             | 17 ++++
 src/test/modules/test_copy_format/meson.build | 33 ++++++++
 .../test_copy_format/sql/test_copy_format.sql |  5 ++
 .../test_copy_format--1.0.sql                 |  8 ++
 .../test_copy_format/test_copy_format.c       | 63 +++++++++++++++
 .../test_copy_format/test_copy_format.control |  4 +
 20 files changed, 269 insertions(+), 17 deletions(-)
 mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..8d94bc313eb 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,6 +32,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -476,6 +477,70 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
     return COPY_LOG_VERBOSITY_DEFAULT;    /* keep compiler quiet */
 }
 
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false). If no COPY format handler is found, this function
+ * reports an error.
+ */
+static void
+ProcessCopyOptionFormat(ParseState *pstate,
+                        CopyFormatOptions *opts_out,
+                        bool is_from,
+                        DefElem *defel)
+{
+    char       *format;
+    Oid            funcargtypes[1];
+    Oid            handlerOid = InvalidOid;
+
+    format = defGetString(defel);
+
+    opts_out->csv_mode = false;
+    opts_out->binary = false;
+    /* built-in formats */
+    if (strcmp(format, "text") == 0)
+    {
+        /* "csv_mode == false && binary == false" means "text" */
+        return;
+    }
+    else if (strcmp(format, "csv") == 0)
+    {
+        opts_out->csv_mode = true;
+        return;
+    }
+    else if (strcmp(format, "binary") == 0)
+    {
+        opts_out->binary = true;
+        return;
+    }
+
+    /* custom format */
+    if (!is_from)
+    {
+        funcargtypes[0] = INTERNALOID;
+        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                    funcargtypes, true);
+    }
+    if (!OidIsValid(handlerOid))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY format \"%s\" not recognized", format),
+                 parser_errposition(pstate, defel->location)));
+
+    /* check that handler has correct return type */
+    if (get_func_rettype(handlerOid) != COPY_HANDLEROID)
+        ereport(ERROR,
+                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                 errmsg("function %s must return type %s",
+                        format, "copy_handler"),
+                 parser_errposition(pstate, defel->location)));
+
+    opts_out->handler = handlerOid;
+}
+
 /*
  * Process the statement option list for COPY.
  *
@@ -519,22 +584,10 @@ ProcessCopyOptions(ParseState *pstate,
 
         if (strcmp(defel->defname, "format") == 0)
         {
-            char       *fmt = defGetString(defel);
-
             if (format_specified)
                 errorConflictingDefElem(defel, pstate);
             format_specified = true;
-            if (strcmp(fmt, "text") == 0)
-                 /* default format */ ;
-            else if (strcmp(fmt, "csv") == 0)
-                opts_out->csv_mode = true;
-            else if (strcmp(fmt, "binary") == 0)
-                opts_out->binary = true;
-            else
-                ereport(ERROR,
-                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                         errmsg("COPY format \"%s\" not recognized", fmt),
-                         parser_errposition(pstate, defel->location)));
+            ProcessCopyOptionFormat(pstate, opts_out, is_from, defel);
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..fce8501dc30 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -150,6 +150,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
 
 /* text format */
 static const CopyToRoutine CopyToRoutineText = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToTextOneRow,
@@ -158,6 +159,7 @@ static const CopyToRoutine CopyToRoutineText = {
 
 /* CSV format */
 static const CopyToRoutine CopyToRoutineCSV = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToCSVOneRow,
@@ -166,6 +168,7 @@ static const CopyToRoutine CopyToRoutineCSV = {
 
 /* binary format */
 static const CopyToRoutine CopyToRoutineBinary = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToBinaryStart,
     .CopyToOutFunc = CopyToBinaryOutFunc,
     .CopyToOneRow = CopyToBinaryOneRow,
@@ -176,13 +179,30 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(const CopyFormatOptions *opts)
 {
-    if (opts->csv_mode)
+    if (OidIsValid(opts->handler))
+    {
+        Datum        datum;
+        Node       *routine;
+
+        datum = OidFunctionCall1(opts->handler, BoolGetDatum(false));
+        routine = (Node *) DatumGetPointer(datum);
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%u did not return "
+                            "CopyToRoutine struct",
+                            opts->handler)));
+        return castNode(CopyToRoutine, routine);
+    }
+    else if (opts->csv_mode)
         return &CopyToRoutineCSV;
     else if (opts->binary)
         return &CopyToRoutineBinary;
-
-    /* default is text */
-    return &CopyToRoutineText;
+    else
+        return &CopyToRoutineText;
 }
 
 /* Implementation of the start callback for text and CSV formats */
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 77ddb9ca53f..dc6c1087361 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -50,6 +50,7 @@ node_headers = \
     access/sdir.h \
     access/tableam.h \
     access/tsmapi.h \
+    commands/copyapi.h \
     commands/event_trigger.h \
     commands/trigger.h \
     executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
old mode 100644
new mode 100755
index 1a657f7e0ae..fb90635a245
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -62,6 +62,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -86,6 +87,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index 317a1f2b282..f2ebc21ca56 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 890822eaf79..7c2a510fa3f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7838,6 +7838,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 6dca77e0a22..340e0cd0a8d 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -633,6 +633,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a copy to method function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   descr => 'pseudo-type for the result of a table AM handler function',
   typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..332628d67cc 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
+    Oid            handler;        /* handler function for custom format routine */
 } CopyFormatOptions;
 
 /* These are private in commands/copy[from|to].c */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 2a2d2f9876b..4f4ffabf882 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -22,6 +22,8 @@
  */
 typedef struct CopyToRoutine
 {
+    NodeTag        type;
+
     /*
      * Set output function information. This callback is called once at the
      * beginning of COPY TO.
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index d1ca24dd32f..96e70e7f38b 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -12,6 +12,7 @@ node_support_input_i = [
   'access/sdir.h',
   'access/tableam.h',
   'access/tsmapi.h',
+  'commands/copyapi.h',
   'commands/event_trigger.h',
   'commands/trigger.h',
   'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 4e4be3fa511..c9da440eed0 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -16,6 +16,7 @@ SUBDIRS = \
           spgist_name_ops \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 2b057451473..d33bbbd4092 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -15,6 +15,7 @@ subdir('spgist_name_ops')
 subdir('ssl_passphrase_callback')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 00000000000..8497f91624d
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 00000000000..adfe7d1572a
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,17 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+ERROR:  COPY format "test_copy_format" not recognized
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')...
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: atttypid=21
+NOTICE:  CopyToOutFunc: atttypid=23
+NOTICE:  CopyToOutFunc: atttypid=20
+NOTICE:  CopyToStart: natts=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 00000000000..a45a2e0a039
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+  'test_copy_format.control',
+  'test_copy_format--1.0.sql',
+)
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'test_copy_format',
+    ],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 00000000000..810b3d8cedc
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,5 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 00000000000..d24ea03ce99
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,8 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+CREATE FUNCTION test_copy_format(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..b42d472d851
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,63 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+
+PG_MODULE_MAGIC;
+
+static void
+CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%d", atttypid)));
+}
+
+static void
+CopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts)));
+}
+
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", slot->tts_nvalid)));
+}
+
+static void
+CopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .type = T_CopyToRoutine,
+    .CopyToOutFunc = CopyToOutFunc,
+    .CopyToStart = CopyToStart,
+    .CopyToOneRow = CopyToOneRow,
+    .CopyToEnd = CopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    ereport(NOTICE,
+            (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 00000000000..f05a6362358
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.47.2
From e59ca937314580133c4590350d9f9d67bb417b2e Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 13:58:33 +0900
Subject: [PATCH v37 2/9] Export CopyToStateData as private data
It's for custom COPY TO format handlers implemented as extensions.
This just moves code. It doesn't change any code except the CopyDest
enum values. CopyDest/CopyFrom enum values such as COPY_FILE conflict
with each other, so the COPY_DEST_ prefix is used instead of the COPY_
prefix for CopyDest enum values. For example, COPY_FILE in CopyDest is
renamed to COPY_DEST_FILE.
Note that this isn't enough to implement custom COPY TO format
handlers as extensions. We'll do the following in a subsequent commit:
1. Add an opaque space for custom COPY TO format handler
2. Export CopySendEndOfRow() to flush buffer
---
 src/backend/commands/copyto.c          | 78 +++---------------------
 src/include/commands/copy.h            |  2 +-
 src/include/commands/copyto_internal.h | 83 ++++++++++++++++++++++++++
 3 files changed, 93 insertions(+), 70 deletions(-)
 create mode 100644 src/include/commands/copyto_internal.h
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index fce8501dc30..99c2f2dd699 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -20,6 +20,7 @@
 
 #include "access/tableam.h"
 #include "commands/copyapi.h"
+#include "commands/copyto_internal.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -36,67 +37,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                    /* to file (or a piped program) */
-    COPY_FRONTEND,                /* to frontend */
-    COPY_CALLBACK,                /* to callback function */
-} CopyDest;
-
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-    /* format-specific routines */
-    const CopyToRoutine *routine;
-
-    /* low-level state data */
-    CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
-
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
-    MemoryContext rowcontext;    /* per-row evaluation context */
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -421,7 +361,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -468,7 +408,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -502,11 +442,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -527,7 +467,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -535,7 +475,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -920,12 +860,12 @@ BeginCopyTo(ParseState *pstate,
     /* See Multibyte encoding comment above */
     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 332628d67cc..6df1f8a3b9b 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -90,7 +90,7 @@ typedef struct CopyFormatOptions
     Oid            handler;        /* handler function for custom format routine */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
+/* These are private in commands/copy[from|to]_internal.h */
 typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
new file mode 100644
index 00000000000..1b58b36c0a3
--- /dev/null
+++ b/src/include/commands/copyto_internal.h
@@ -0,0 +1,83 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyto_internal.h
+ *      Internal definitions for COPY TO command.
+ *
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyto_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYTO_INTERNAL_H
+#define COPYTO_INTERNAL_H
+
+#include "commands/copy.h"
+#include "executor/execdesc.h"
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+    COPY_DEST_FILE,                /* to file (or a piped program) */
+    COPY_DEST_FRONTEND,            /* to frontend */
+    COPY_DEST_CALLBACK,            /* to callback function */
+} CopyDest;
+
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+    /* format-specific routines */
+    const struct CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;        /* type of copy source/destination */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
+    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
+
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+    MemoryContext rowcontext;    /* per-row evaluation context */
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
+#endif                            /* COPYTO_INTERNAL_H */
-- 
2.47.2
From 40bade7212e8894220d7901fe9bb2a2eb07572cd Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:01:18 +0900
Subject: [PATCH v37 3/9] Add support for implementing custom COPY TO format as
 extension
* Add CopyToStateData::opaque that can be used to keep data for custom
  COPY TO format implementation
* Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf
  as CopyToStateFlush()
---
 src/backend/commands/copyto.c          | 12 ++++++++++++
 src/include/commands/copyapi.h         |  2 ++
 src/include/commands/copyto_internal.h |  3 +++
 3 files changed, 17 insertions(+)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 99c2f2dd699..f5ed3efbace 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -458,6 +458,18 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * Export CopySendEndOfRow() for extensions. CopySendEndOfRow() itself
+ * stays static so that the compiler can still inline or otherwise
+ * optimize the calls to it within this file; extensions go through this
+ * thin wrapper instead.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
  * line termination and do common appropriate things for the end of row.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 4f4ffabf882..5c5ea6592e3 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -56,6 +56,8 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+extern void CopyToStateFlush(CopyToState cstate);
+
 /*
  * API structure for a COPY FROM format implementation. Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index 1b58b36c0a3..ce1c33a4004 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -78,6 +78,9 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;    /* lookup info for output functions */
     MemoryContext rowcontext;    /* per-row evaluation context */
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyToStateData;
 
 #endif                            /* COPYTO_INTERNAL_H */
-- 
2.47.2
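The two additions in the patch above (the `opaque` pointer and the exported flush wrapper) can be sketched as a small standalone model. This is not PostgreSQL code: `ModelCopyToState`, `ModelFormatPrivate`, `model_flush()`, `model_start()` and `model_one_row()` are hypothetical names invented for this sketch, and the fixed-size buffer only stands in for `fe_msgbuf`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CopyToStateData; only the parts needed here. */
typedef struct ModelCopyToState
{
    char        buf[64];        /* plays the role of fe_msgbuf */
    size_t      len;            /* bytes currently buffered */
    size_t      bytes_flushed;  /* total bytes handed to the destination */
    void       *opaque;         /* private space for the format */
} ModelCopyToState;

/* Private per-COPY data that only the format implementation understands. */
typedef struct ModelFormatPrivate
{
    int         rows_written;
} ModelFormatPrivate;

/* Plays the role of CopyToStateFlush(): send the buffer, then reset it. */
static void
model_flush(ModelCopyToState *st)
{
    st->bytes_flushed += st->len;
    st->len = 0;
}

/* Start callback: attach the private data to the state. */
static void
model_start(ModelCopyToState *st, ModelFormatPrivate *priv)
{
    priv->rows_written = 0;
    st->opaque = priv;
}

/* Row callback: buffer one row, update the private data, then flush. */
static void
model_one_row(ModelCopyToState *st, const char *row)
{
    ModelFormatPrivate *priv = st->opaque;
    size_t      n = strlen(row);

    assert(st->len + n <= sizeof(st->buf));
    memcpy(st->buf + st->len, row, n);
    st->len += n;
    priv->rows_written++;
    model_flush(st);
}
```

In this model the core state never inspects `opaque`; it only carries the pointer between callbacks, which is presumably the intent of the "private space" field in the patch.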
From 7025e1ca17b8c1693e94560c499b814b45a7cc45 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:11:55 +0900
Subject: [PATCH v37 4/9] Add support for adding custom COPY FROM format
This uses the same handler for COPY TO and COPY FROM but returns a
different routine for each. It returns CopyToRoutine for COPY TO and
CopyFromRoutine for COPY FROM. PostgreSQL calls a COPY TO/FROM handler
with an "is_from" argument. It's true for COPY FROM and false for COPY
TO:
    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine
This also adds a test module for a custom COPY FROM handler.
---
 src/backend/commands/copy.c                   | 13 +++----
 src/backend/commands/copyfrom.c               | 28 +++++++++++--
 src/include/catalog/pg_type.dat               |  2 +-
 src/include/commands/copyapi.h                |  2 +
 .../expected/test_copy_format.out             | 10 +++--
 .../test_copy_format/sql/test_copy_format.sql |  1 +
 .../test_copy_format/test_copy_format.c       | 39 ++++++++++++++++++-
 7 files changed, 78 insertions(+), 17 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8d94bc313eb..b4417bb6819 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -483,8 +483,8 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
  * This function checks whether the option value is a built-in format such as
  * "text" and "csv" or not. If the option value isn't a built-in format, this
  * function finds a COPY format handler that returns a CopyToRoutine (for
- * is_from == false). If no COPY format handler is found, this function
- * reports an error.
+ * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY
+ * format handler is found, this function reports an error.
  */
 static void
 ProcessCopyOptionFormat(ParseState *pstate,
@@ -518,12 +518,9 @@ ProcessCopyOptionFormat(ParseState *pstate,
     }
 
     /* custom format */
-    if (!is_from)
-    {
-        funcargtypes[0] = INTERNALOID;
-        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
-                                    funcargtypes, true);
-    }
+    funcargtypes[0] = INTERNALOID;
+    handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                funcargtypes, true);
     if (!OidIsValid(handlerOid))
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index bcf66f0adf8..0809766f910 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -129,6 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate);
 
 /* text format */
 static const CopyFromRoutine CopyFromRoutineText = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromTextOneRow,
@@ -137,6 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = {
 
 /* CSV format */
 static const CopyFromRoutine CopyFromRoutineCSV = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromCSVOneRow,
@@ -145,6 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = {
 
 /* binary format */
 static const CopyFromRoutine CopyFromRoutineBinary = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromBinaryInFunc,
     .CopyFromStart = CopyFromBinaryStart,
     .CopyFromOneRow = CopyFromBinaryOneRow,
@@ -155,13 +158,30 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(const CopyFormatOptions *opts)
 {
-    if (opts->csv_mode)
+    if (OidIsValid(opts->handler))
+    {
+        Datum        datum;
+        Node       *routine;
+
+        datum = OidFunctionCall1(opts->handler, BoolGetDatum(true));
+        routine = (Node *) DatumGetPointer(datum);
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%u did not return "
+                            "CopyFromRoutine struct",
+                            opts->handler)));
+        return castNode(CopyFromRoutine, routine);
+    }
+    else if (opts->csv_mode)
         return &CopyFromRoutineCSV;
     else if (opts->binary)
         return &CopyFromRoutineBinary;
-
-    /* default is text */
-    return &CopyFromRoutineText;
+    else
+        return &CopyFromRoutineText;
 }
 
 /* Implementation of the start callback for text and CSV formats */
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 340e0cd0a8d..63b7d65f982 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -634,7 +634,7 @@
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
 { oid => '8752',
-  descr => 'pseudo-type for the result of a copy to method function',
+  descr => 'pseudo-type for the result of a copy to/from method function',
   typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
   typcategory => 'P', typinput => 'copy_handler_in',
   typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 5c5ea6592e3..895c105d8d8 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -64,6 +64,8 @@ extern void CopyToStateFlush(CopyToState cstate);
  */
 typedef struct CopyFromRoutine
 {
+    NodeTag        type;
+
     /*
      * Set input function information. This callback is called once at the
      * beginning of COPY FROM.
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index adfe7d1572a..016893e7026 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -2,9 +2,13 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
-ERROR:  COPY format "test_copy_format" not recognized
-LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')...
-                                          ^
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=false
 NOTICE:  CopyToOutFunc: atttypid=21
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 810b3d8cedc..0dfdfa00080 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -2,4 +2,5 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+\.
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index b42d472d851..abafc668463 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -18,6 +18,40 @@
 
 PG_MODULE_MAGIC;
 
+static void
+CopyFromInFunc(CopyFromState cstate, Oid atttypid,
+               FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%d", atttypid)));
+}
+
+static void
+CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts)));
+}
+
+static bool
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+CopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .type = T_CopyFromRoutine,
+    .CopyFromInFunc = CopyFromInFunc,
+    .CopyFromStart = CopyFromStart,
+    .CopyFromOneRow = CopyFromOneRow,
+    .CopyFromEnd = CopyFromEnd,
+};
+
 static void
 CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
 {
@@ -59,5 +93,8 @@ test_copy_format(PG_FUNCTION_ARGS)
     ereport(NOTICE,
             (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
 
-    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
 }
-- 
2.47.2
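The dispatch in the patch above (one handler, selected by `is_from`, with the caller checking the returned node tag) can be sketched as a standalone model. It does not use PostgreSQL headers: the `Model*` types, the `T_Model*` tags and the `model_*` functions are hypothetical stand-ins for `NodeTag`, `IsA()` and `CopyFromGetRoutine()`, and the checker returns NULL where the real code would `ereport(ERROR)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for PostgreSQL's NodeTag values. */
typedef enum ModelNodeTag
{
    T_ModelInvalid = 0,
    T_ModelCopyToRoutine,
    T_ModelCopyFromRoutine
} ModelNodeTag;

/* Every routine struct starts with a tag, so a generic pointer can be checked. */
typedef struct ModelNode
{
    ModelNodeTag type;
} ModelNode;

typedef struct ModelCopyToRoutine
{
    ModelNodeTag type;
    const char *name;
} ModelCopyToRoutine;

typedef struct ModelCopyFromRoutine
{
    ModelNodeTag type;
    const char *name;
} ModelCopyFromRoutine;

static const ModelCopyToRoutine model_to_routine = {
    .type = T_ModelCopyToRoutine, .name = "to"
};
static const ModelCopyFromRoutine model_from_routine = {
    .type = T_ModelCopyFromRoutine, .name = "from"
};

/* One handler serves both directions: true means COPY FROM. */
static const void *
model_copy_handler(bool is_from)
{
    if (is_from)
        return &model_from_routine;
    return &model_to_routine;
}

/* A faulty handler that ignores is_from, to exercise the tag check. */
static const void *
model_broken_handler(bool is_from)
{
    (void) is_from;
    return &model_to_routine;
}

/* Caller side: ask for the COPY FROM routine and verify the tag. */
static const ModelCopyFromRoutine *
model_get_from_routine(const void *(*handler) (bool))
{
    const ModelNode *node = handler(true);

    if (node == NULL || node->type != T_ModelCopyFromRoutine)
        return NULL;            /* the real code reports an error here */
    return (const ModelCopyFromRoutine *) node;
}
```

The tag check is what lets the caller reject a handler that returns the wrong kind of routine, which is why the patch adds a `NodeTag` field to `CopyFromRoutine`.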
From 82e7ebbdd3323b05706cce573d8d293c3b559f2c Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:19:34 +0900
Subject: [PATCH v37 5/9] Use COPY_SOURCE_ prefix for CopySource enum values
This is for consistency with CopyDest.
---
 src/backend/commands/copyfrom.c          |  4 ++--
 src/backend/commands/copyfromparse.c     | 10 +++++-----
 src/include/commands/copyfrom_internal.h |  6 +++---
 3 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 0809766f910..76662e04260 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1729,7 +1729,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;    /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;    /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1857,7 +1857,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index e8128f85e6b..17e51f02e04 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -180,7 +180,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows it can send. */
     pq_flush();
@@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
     switch (cstate->copy_src)
     {
-        case COPY_FILE:
+        case COPY_SOURCE_FILE:
             bytesread = fread(databuf, 1, maxread, cstate->copy_file);
             if (ferror(cstate->copy_file))
                 ereport(ERROR,
@@ -257,7 +257,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
             if (bytesread == 0)
                 cstate->raw_reached_eof = true;
             break;
-        case COPY_FRONTEND:
+        case COPY_SOURCE_FRONTEND:
             while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
             {
                 int            avail;
@@ -340,7 +340,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
                 bytesread += avail;
             }
             break;
-        case COPY_CALLBACK:
+        case COPY_SOURCE_CALLBACK:
             bytesread = cstate->data_source_cb(databuf, minread, maxread);
             break;
     }
@@ -1172,7 +1172,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
          * after \. up to the protocol end of copy data.  (XXX maybe better
          * not to treat \. as special?)
          */
-        if (cstate->copy_src == COPY_FRONTEND)
+        if (cstate->copy_src == COPY_SOURCE_FRONTEND)
         {
             int            inbytes;
 
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..3a306e3286e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -24,9 +24,9 @@
  */
 typedef enum CopySource
 {
-    COPY_FILE,                    /* from file (or a piped program) */
-    COPY_FRONTEND,                /* from frontend */
-    COPY_CALLBACK,                /* from callback function */
+    COPY_SOURCE_FILE,            /* from file (or a piped program) */
+    COPY_SOURCE_FRONTEND,        /* from frontend */
+    COPY_SOURCE_CALLBACK,        /* from callback function */
 } CopySource;
 
 /*
-- 
2.47.2
From 97a71244245787f895fcb4f05ee0b212e78c863c Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:21:39 +0900
Subject: [PATCH v37 6/9] Add support for implementing custom COPY FROM format
 as extension
* Add CopyFromStateData::opaque that can be used to keep data for
  custom COPY FROM format implementation
* Export CopyGetData() to get the next data as
  CopyFromStateGetData()
---
 src/backend/commands/copyfromparse.c     | 11 +++++++++++
 src/include/commands/copyapi.h           |  2 ++
 src/include/commands/copyfrom_internal.h |  3 +++
 3 files changed, 16 insertions(+)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 17e51f02e04..d8fd238e72b 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -739,6 +739,17 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
     return copied_bytes;
 }
 
+/*
+ * Export CopyGetData() for extensions. CopyGetData() itself stays static
+ * so that the compiler can still inline or otherwise optimize the calls to
+ * it within this file.
+ */
+int
+CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread)
+{
+    return CopyGetData(cstate, dest, minread, maxread);
+}
+
 /*
  * This function is exposed for use by extensions that read raw fields in the
  * next line. See NextCopyFromRawFieldsInternal() for details.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 895c105d8d8..2044d8b8c4c 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -108,4 +108,6 @@ typedef struct CopyFromRoutine
     void        (*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
+extern int    CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread);
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 3a306e3286e..af425cf5fd9 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -181,6 +181,9 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyFromStateData;
 
 extern void ReceiveCopyBegin(CopyFromState cstate);
-- 
2.47.2
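A custom format would typically call the exported reader in a loop, one record at a time. The following standalone sketch models fixed-size record reading of the kind the test module in this series performs. `ModelSource`, `model_get_data()` and `model_read_fixed_line()` are hypothetical names for this sketch only; the in-memory source replaces the file, frontend and callback sources of the real `CopyGetData()`, and the separate minread argument is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory data source. */
typedef struct ModelSource
{
    const char *data;
    size_t      len;
    size_t      pos;
} ModelSource;

/* Copy up to maxread bytes into dest; returns bytes copied, 0 at end of data. */
static int
model_get_data(ModelSource *src, void *dest, int maxread)
{
    size_t      avail = src->len - src->pos;
    size_t      n = avail < (size_t) maxread ? avail : (size_t) maxread;

    memcpy(dest, src->data + src->pos, n);
    src->pos += n;
    return (int) n;
}

/*
 * Read one fixed-size line.
 * Returns 1 for a complete line, 0 at end of data, -1 for a short line.
 */
static int
model_read_fixed_line(ModelSource *src, char *line, int line_size)
{
    int         n = model_get_data(src, line, line_size);

    if (n == 0)
        return 0;
    if (n != line_size)
        return -1;
    return 1;
}
```

The three return values correspond to the three outcomes a row callback has to distinguish: a usable row, a clean end of input, and a malformed record that should be reported as an error.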
From e6bf46e4006b25c0de110e14b8c2d57247f48b42 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 27 Nov 2024 16:23:55 +0900
Subject: [PATCH v37 7/9] Add CopyFromSkipErrorRow() for custom COPY format
 extension
Extensions must call CopyFromSkipErrorRow() when CopyFromOneRow
callback reports an error by errsave(). CopyFromSkipErrorRow() handles
"ON_ERROR stop" and "LOG_VERBOSITY verbose" cases.
---
 src/backend/commands/copyfromparse.c          | 82 +++++++++++--------
 src/include/commands/copyapi.h                |  2 +
 .../expected/test_copy_format.out             | 47 +++++++++++
 .../test_copy_format/sql/test_copy_format.sql | 24 ++++++
 .../test_copy_format/test_copy_format.c       | 80 +++++++++++++++++-
 5 files changed, 198 insertions(+), 37 deletions(-)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d8fd238e72b..2070f51a963 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -938,6 +938,51 @@ CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
     return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
 }
 
+/*
+ * Call this when you report an error by errsave() in your CopyFromOneRow
+ * callback. This handles "ON_ERROR stop" and "LOG_VERBOSITY verbose" cases
+ * for you.
+ */
+void
+CopyFromSkipErrorRow(CopyFromState cstate)
+{
+    Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+    cstate->num_errors++;
+
+    if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+    {
+        /*
+         * Since we emit line number and column info in the below notice
+         * message, we suppress error context information other than the
+         * relation name.
+         */
+        Assert(!cstate->relname_only);
+        cstate->relname_only = true;
+
+        if (cstate->cur_attval)
+        {
+            char       *attval;
+
+            attval = CopyLimitPrintoutLength(cstate->cur_attval);
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname,
+                           attval));
+            pfree(attval);
+        }
+        else
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname));
+
+        /* reset relname_only */
+        cstate->relname_only = false;
+    }
+}
+
 /*
  * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
  *
@@ -1044,42 +1089,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
                                         (Node *) cstate->escontext,
                                         &values[m]))
         {
-            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-            cstate->num_errors++;
-
-            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-            {
-                /*
-                 * Since we emit line number and column info in the below
-                 * notice message, we suppress error context information other
-                 * than the relation name.
-                 */
-                Assert(!cstate->relname_only);
-                cstate->relname_only = true;
-
-                if (cstate->cur_attval)
-                {
-                    char       *attval;
-
-                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-                                   (unsigned long long) cstate->cur_lineno,
-                                   cstate->cur_attname,
-                                   attval));
-                    pfree(attval);
-                }
-                else
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-                                   (unsigned long long) cstate->cur_lineno,
-                                   cstate->cur_attname));
-
-                /* reset relname_only */
-                cstate->relname_only = false;
-            }
-
+            CopyFromSkipErrorRow(cstate);
             return true;
         }
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 2044d8b8c4c..500ece7d5bb 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -110,4 +110,6 @@ typedef struct CopyFromRoutine
 
 extern int    CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread);
 
+extern void CopyFromSkipErrorRow(CopyFromState cstate);
+
 #endif                            /* COPYAPI_H */
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index 016893e7026..b9a6baa85c0 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -1,6 +1,8 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=true
 NOTICE:  CopyFromInFunc: atttypid=21
@@ -8,7 +10,50 @@ NOTICE:  CopyFromInFunc: atttypid=23
 NOTICE:  CopyFromInFunc: atttypid=20
 NOTICE:  CopyFromStart: natts=3
 NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  invalid value: "6"
+CONTEXT:  COPY test, line 2, column a: "6"
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
 NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  skipping row due to data type incompatibility at line 2 for column "a": "6"
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
+NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  too much lines: 3
+CONTEXT:  COPY test, line 3
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=false
 NOTICE:  CopyToOutFunc: atttypid=21
@@ -18,4 +63,6 @@ NOTICE:  CopyToStart: natts=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 0dfdfa00080..86db71bce7f 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -1,6 +1,30 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+321
 \.
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index abafc668463..96a54dab7ec 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 
 #include "commands/copyapi.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 
 PG_MODULE_MAGIC;
@@ -34,8 +35,85 @@ CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
 static bool
 CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
 {
+    int            n_attributes = list_length(cstate->attnumlist);
+    char       *line;
+    int            line_size = n_attributes + 1;    /* +1 is for new line */
+    int            read_bytes;
+
     ereport(NOTICE, (errmsg("CopyFromOneRow")));
-    return false;
+
+    cstate->cur_lineno++;
+    line = palloc(line_size);
+    read_bytes = CopyFromStateGetData(cstate, line, line_size, line_size);
+    if (read_bytes == 0)
+        return false;
+    if (read_bytes != line_size)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("one line must be %d bytes: %d",
+                        line_size, read_bytes)));
+
+    if (cstate->cur_lineno == 1)
+    {
+        /* Success */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        ListCell   *cur;
+        int            i = 0;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            int            m = attnum - 1;
+            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+            if (att->atttypid == INT2OID)
+            {
+                values[i] = Int16GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT4OID)
+            {
+                values[i] = Int32GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT8OID)
+            {
+                values[i] = Int64GetDatum(line[i] - '0');
+            }
+            nulls[i] = false;
+            i++;
+        }
+    }
+    else if (cstate->cur_lineno == 2)
+    {
+        /* Soft error */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        int            attnum = lfirst_int(list_head(cstate->attnumlist));
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+        char        value[2];
+
+        cstate->cur_attname = NameStr(att->attname);
+        value[0] = line[0];
+        value[1] = '\0';
+        cstate->cur_attval = value;
+        errsave((Node *) cstate->escontext,
+                (
+                 errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                 errmsg("invalid value: \"%c\"", line[0])));
+        CopyFromSkipErrorRow(cstate);
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+        return true;
+    }
+    else
+    {
+        /* Hard error */
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("too many lines: %llu",
+                        (unsigned long long) cstate->cur_lineno)));
+    }
+
+    return true;
 }
 
 static void
-- 
2.47.2
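
The toy format read by CopyFromOneRow() in the test module above uses fixed-width lines: one ASCII digit per column plus a trailing newline. Below is a minimal standalone sketch of that decoding step. It is a model, not PostgreSQL code: parse_digit_line() is a hypothetical name, and the digit and newline validation are assumptions added for illustration (the patch itself only checks the byte count).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Standalone model (not PostgreSQL code): decode one line of the toy
 * test format. A line holds one ASCII digit per column followed by a
 * newline, so its length must be n_columns + 1.
 *
 * Returns true and fills values[0..n_columns-1] on success. Returns
 * false if the length is wrong, the newline is missing, or a column is
 * not a digit (the last two checks are additions of this sketch).
 */
bool
parse_digit_line(const char *line, size_t line_len, size_t n_columns,
                 int64_t *values)
{
    assert(line != NULL && values != NULL);

    if (line_len != n_columns + 1 || line[n_columns] != '\n')
        return false;

    for (size_t i = 0; i < n_columns; i++)
    {
        if (line[i] < '0' || line[i] > '9')
            return false;
        /* Same conversion as the patch: digit character to integer. */
        values[i] = line[i] - '0';
    }
    return true;
}
```

In the real callback, a short read is reported with ereport(ERROR) rather than a false return; false is reserved for end of input.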
From 9a4dfa53cdd1683c9386644886654f3f7ee3e49a Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 18 Mar 2025 19:09:09 +0900
Subject: [PATCH v37 8/9] Use copy handlers for built-in formats
This adds copy handlers for text, csv and binary. We can simplify
Copy{To,From}GetRoutine() by this. We'll be able to remove
CopyFormatOptions::{binary,csv_mode} when we add more callbacks to
Copy{To,From}Routine and move format specific routines to
Copy{To,From}Routine::*.
---
 src/backend/commands/copy.c              | 48 ++++++++++++++++++------
 src/backend/commands/copyfrom.c          | 48 +++++++++++-------------
 src/backend/commands/copyto.c            | 48 +++++++++++-------------
 src/include/catalog/pg_proc.dat          | 11 ++++++
 src/include/commands/copyfrom_internal.h |  6 ++-
 src/include/commands/copyto_internal.h   |  6 ++-
 6 files changed, 102 insertions(+), 65 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b4417bb6819..24bd2547e3b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -22,7 +22,9 @@
 #include "access/table.h"
 #include "access/xact.h"
 #include "catalog/pg_authid.h"
-#include "commands/copy.h"
+#include "commands/copyapi.h"
+#include "commands/copyto_internal.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
@@ -500,24 +502,15 @@ ProcessCopyOptionFormat(ParseState *pstate,
 
     opts_out->csv_mode = false;
     opts_out->binary = false;
-    /* built-in formats */
-    if (strcmp(format, "text") == 0)
-    {
-        /* "csv_mode == false && binary == false" means "text" */
-        return;
-    }
-    else if (strcmp(format, "csv") == 0)
+    if (strcmp(format, "csv") == 0)
     {
         opts_out->csv_mode = true;
-        return;
     }
     else if (strcmp(format, "binary") == 0)
     {
         opts_out->binary = true;
-        return;
     }
 
-    /* custom format */
     funcargtypes[0] = INTERNALOID;
     handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
                                 funcargtypes, true);
@@ -1067,3 +1060,36 @@ CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist)
 
     return attnums;
 }
+
+Datum
+copy_text_handler(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineText);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineText);
+}
+
+Datum
+copy_csv_handler(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineCSV);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineCSV);
+}
+
+Datum
+copy_binary_handler(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineBinary);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineBinary);
+}
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 76662e04260..2677f2ac1bc 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -45,6 +45,7 @@
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/portal.h"
@@ -128,7 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate);
  */
 
 /* text format */
-static const CopyFromRoutine CopyFromRoutineText = {
+const CopyFromRoutine CopyFromRoutineText = {
     .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
@@ -137,7 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = {
 };
 
 /* CSV format */
-static const CopyFromRoutine CopyFromRoutineCSV = {
+const CopyFromRoutine CopyFromRoutineCSV = {
     .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
@@ -146,7 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = {
 };
 
 /* binary format */
-static const CopyFromRoutine CopyFromRoutineBinary = {
+const CopyFromRoutine CopyFromRoutineBinary = {
     .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromBinaryInFunc,
     .CopyFromStart = CopyFromBinaryStart,
@@ -158,30 +159,25 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(const CopyFormatOptions *opts)
 {
-    if (OidIsValid(opts->handler))
-    {
-        Datum        datum;
-        Node       *routine;
+    Oid            handler = opts->handler;
+    Datum        datum;
+    Node       *routine;
 
-        datum = OidFunctionCall1(opts->handler, BoolGetDatum(true));
-        routine = (Node *) DatumGetPointer(datum);
-        if (routine == NULL || !IsA(routine, CopyFromRoutine))
-            ereport(
-                    ERROR,
-                    (errcode(
-                             ERRCODE_INVALID_PARAMETER_VALUE),
-                     errmsg("COPY handler function "
-                            "%u did not return "
-                            "CopyFromRoutine struct",
-                            opts->handler)));
-        return castNode(CopyFromRoutine, routine);
-    }
-    else if (opts->csv_mode)
-        return &CopyFromRoutineCSV;
-    else if (opts->binary)
-        return &CopyFromRoutineBinary;
-    else
-        return &CopyFromRoutineText;
+    if (!OidIsValid(handler))
+        handler = F_TEXT_INTERNAL;
+
+    datum = OidFunctionCall1(handler, BoolGetDatum(true));
+    routine = (Node *) DatumGetPointer(datum);
+    if (routine == NULL || !IsA(routine, CopyFromRoutine))
+        ereport(
+                ERROR,
+                (errcode(
+                         ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY handler function "
+                        "%u did not return "
+                        "CopyFromRoutine struct",
+                        opts->handler)));
+    return castNode(CopyFromRoutine, routine);
 }
 
 /* Implementation of the start callback for text and CSV formats */
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f5ed3efbace..757d24736e3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -32,6 +32,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -89,7 +90,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
  */
 
 /* text format */
-static const CopyToRoutine CopyToRoutineText = {
+const CopyToRoutine CopyToRoutineText = {
     .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
@@ -98,7 +99,7 @@ static const CopyToRoutine CopyToRoutineText = {
 };
 
 /* CSV format */
-static const CopyToRoutine CopyToRoutineCSV = {
+const CopyToRoutine CopyToRoutineCSV = {
     .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
@@ -107,7 +108,7 @@ static const CopyToRoutine CopyToRoutineCSV = {
 };
 
 /* binary format */
-static const CopyToRoutine CopyToRoutineBinary = {
+const CopyToRoutine CopyToRoutineBinary = {
     .type = T_CopyToRoutine,
     .CopyToStart = CopyToBinaryStart,
     .CopyToOutFunc = CopyToBinaryOutFunc,
@@ -119,30 +120,25 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(const CopyFormatOptions *opts)
 {
-    if (OidIsValid(opts->handler))
-    {
-        Datum        datum;
-        Node       *routine;
+    Oid            handler = opts->handler;
+    Datum        datum;
+    Node       *routine;
 
-        datum = OidFunctionCall1(opts->handler, BoolGetDatum(false));
-        routine = (Node *) DatumGetPointer(datum);
-        if (routine == NULL || !IsA(routine, CopyToRoutine))
-            ereport(
-                    ERROR,
-                    (errcode(
-                             ERRCODE_INVALID_PARAMETER_VALUE),
-                     errmsg("COPY handler function "
-                            "%u did not return "
-                            "CopyToRoutine struct",
-                            opts->handler)));
-        return castNode(CopyToRoutine, routine);
-    }
-    else if (opts->csv_mode)
-        return &CopyToRoutineCSV;
-    else if (opts->binary)
-        return &CopyToRoutineBinary;
-    else
-        return &CopyToRoutineText;
+    if (!OidIsValid(handler))
+        handler = F_TEXT_INTERNAL;
+
+    datum = OidFunctionCall1(handler, BoolGetDatum(false));
+    routine = (Node *) DatumGetPointer(datum);
+    if (routine == NULL || !IsA(routine, CopyToRoutine))
+        ereport(
+                ERROR,
+                (errcode(
+                         ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY handler function "
+                        "%u did not return "
+                        "CopyToRoutine struct",
+                        opts->handler)));
+    return castNode(CopyToRoutine, routine);
 }
 
 /* Implementation of the start callback for text and CSV formats */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7c2a510fa3f..0737eb73c9a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12485,4 +12485,15 @@
   proargtypes => 'int4',
   prosrc => 'gist_stratnum_common' },
 
+# COPY handlers
+{ oid => '8100', descr => 'text COPY FORMAT handler',
+  proname => 'text', provolatile => 'i', prorettype => 'copy_handler',
+  proargtypes => 'internal', prosrc => 'copy_text_handler' },
+{ oid => '8101', descr => 'csv COPY FORMAT handler',
+  proname => 'csv', provolatile => 'i', prorettype => 'copy_handler',
+  proargtypes => 'internal', prosrc => 'copy_csv_handler' },
+{ oid => '8102', descr => 'binary COPY FORMAT handler',
+  proname => 'binary', provolatile => 'i', prorettype => 'copy_handler',
+  proargtypes => 'internal', prosrc => 'copy_binary_handler' },
+
 ]
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index af425cf5fd9..abeccf85c1c 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -14,7 +14,7 @@
 #ifndef COPYFROM_INTERNAL_H
 #define COPYFROM_INTERNAL_H
 
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
@@ -197,4 +197,8 @@ extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
 extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
                                  Datum *values, bool *nulls);
 
+extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineText;
+extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineCSV;
+extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineBinary;
+
 #endif                            /* COPYFROM_INTERNAL_H */
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index ce1c33a4004..85412660f7f 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -14,7 +14,7 @@
 #ifndef COPYTO_INTERNAL_H
 #define COPYTO_INTERNAL_H
 
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "executor/execdesc.h"
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
@@ -83,4 +83,8 @@ typedef struct CopyToStateData
     void       *opaque;            /* private space */
 } CopyToStateData;
 
+extern PGDLLIMPORT const CopyToRoutine CopyToRoutineText;
+extern PGDLLIMPORT const CopyToRoutine CopyToRoutineCSV;
+extern PGDLLIMPORT const CopyToRoutine CopyToRoutineBinary;
+
 #endif                            /* COPYTO_INTERNAL_H */
-- 
2.47.2
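
The lookup that patch 8/9 gives Copy{To,From}GetRoutine() can be modeled in plain C: fall back to the text handler when no handler is set, call the handler with the direction flag, then verify that the returned struct carries the expected tag. The sketch below is a standalone model under stated assumptions, not the patch's code: Routine, RoutineTag, Handler, get_routine() and broken_handler() are hypothetical stand-ins, and a tag mismatch returns NULL here where the patch raises an ERROR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the node tags that IsA() checks in the patch. */
typedef enum RoutineTag
{
    TAG_COPY_FROM_ROUTINE,
    TAG_COPY_TO_ROUTINE
} RoutineTag;

/* Hypothetical, simplified routine struct: a tag plus a format name. */
typedef struct Routine
{
    RoutineTag  tag;
    const char *format;
} Routine;

/* A handler maps the COPY direction to a routine, like copy_*_handler(). */
typedef const Routine *(*Handler) (bool is_from);

const Routine text_from = {TAG_COPY_FROM_ROUTINE, "text"};
const Routine text_to = {TAG_COPY_TO_ROUTINE, "text"};
const Routine csv_from = {TAG_COPY_FROM_ROUTINE, "csv"};
const Routine csv_to = {TAG_COPY_TO_ROUTINE, "csv"};

const Routine *
text_handler(bool is_from)
{
    return is_from ? &text_from : &text_to;
}

const Routine *
csv_handler(bool is_from)
{
    return is_from ? &csv_from : &csv_to;
}

/* A misbehaving handler that ignores the direction flag. */
const Routine *
broken_handler(bool is_from)
{
    (void) is_from;
    return &text_to;
}

/*
 * Resolve the routine for one COPY direction. A NULL handler models an
 * invalid handler OID and falls back to the text format. Returns NULL
 * when the handler yields nothing or a routine for the wrong direction.
 */
const Routine *
get_routine(Handler handler, bool is_from)
{
    RoutineTag  expected = is_from ? TAG_COPY_FROM_ROUTINE : TAG_COPY_TO_ROUTINE;
    const Routine *routine;

    if (handler == NULL)
        handler = text_handler;
    assert(handler != NULL);

    routine = handler(is_from);
    if (routine == NULL || routine->tag != expected)
        return NULL;
    return routine;
}
```

With this shape the built-in formats no longer need special-case branches; they go through the same call and the same validation as a custom format.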
From deb16adfcdba9ae9cac0d1c4e55b9cf35ba2a179 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 19 Mar 2025 11:46:34 +0900
Subject: [PATCH v37 9/9] Add document how to write a COPY handler
This is WIP because we haven't decided our API yet.
---
 doc/src/sgml/copy-handler.sgml | 375 +++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml     |   1 +
 doc/src/sgml/postgres.sgml     |   1 +
 src/include/commands/copyapi.h |   9 +-
 4 files changed, 382 insertions(+), 4 deletions(-)
 create mode 100644 doc/src/sgml/copy-handler.sgml
diff --git a/doc/src/sgml/copy-handler.sgml b/doc/src/sgml/copy-handler.sgml
new file mode 100644
index 00000000000..f602debae61
--- /dev/null
+++ b/doc/src/sgml/copy-handler.sgml
@@ -0,0 +1,375 @@
+<!-- doc/src/sgml/copy-handler.sgml -->
+
+<chapter id="copy-handler">
+ <title>Writing a Copy Handler</title>
+
+ <indexterm zone="copy-handler">
+  <primary><literal>COPY</literal> handler</primary>
+ </indexterm>
+
+ <para>
+  <productname>PostgreSQL</productname> supports
+  custom <link linkend="sql-copy"><literal>COPY</literal></link>
+  handlers. The <literal>COPY</literal> handlers can use different copy format
+  instead of built-in <literal>text</literal>, <literal>csv</literal>
+  and <literal>binary</literal>.
+ </para>
+
+ <para>
+  At the SQL level, a <literal>COPY</literal> handler is represented by a single SQL
+  function, typically implemented in C, having the signature
+<programlisting>
+format_name(internal) RETURNS copy_handler
+</programlisting>
+  The name of the function is the same name appearing in
+  the <literal>FORMAT</literal> option. The <type>internal</type> argument is
+  a dummy that simply serves to prevent this function from being called
+  directly from an SQL command. The real argument is <literal>bool
+  is_from</literal>. If the handler is used by <literal>COPY FROM</literal>,
+  it's <literal>true</literal>. If the handler is used by <literal>COPY
+  TO</literal>, it's <literal>false</literal>.
+ </para>
+
+ <para>
+  The function must return <type>CopyFromRoutine *</type> when
+  the <literal>is_from</literal> argument is <literal>true</literal>.
+  The function must return <type>CopyToRoutine *</type> when
+  the <literal>is_from</literal> argument is <literal>false</literal>.
+ </para>
+
+ <para>
+  The <type>CopyFromRoutine</type> and <type>CopyToRoutine</type> struct types
+  are declared in <filename>src/include/commands/copyapi.h</filename>,
+  which see for additional details.
+ </para>
+
+ <sect1 id="copy-handler-from">
+  <title>Copy From Handler</title>
+
+  <para>
+   The <literal>COPY</literal> handler function for <literal>COPY
+   FROM</literal> returns a <type>CopyFromRoutine</type> struct containing
+   pointers to the functions described below. All functions are required.
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyFromInFunc(CopyFromState cstate,
+               Oid atttypid,
+               FmgrInfo *finfo,
+               Oid *typioparam);
+</programlisting>
+
+   This sets input function information for the
+   given <literal>atttypid</literal> attribute. This function is called once
+   at the beginning of <literal>COPY FROM</literal>. If
+   this <literal>COPY</literal> handler doesn't use any input functions, this
+   function doesn't need to do anything.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Oid atttypid</literal></term>
+     <listitem>
+      <para>
+       This is the OID of data type used by the relation's attribute.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>FmgrInfo *finfo</literal></term>
+     <listitem>
+      <para>
+       This can be optionally filled to provide the catalog information of
+       the input function.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Oid *typioparam</literal></term>
+     <listitem>
+      <para>
+       This can be optionally filled to define the OID of the type to
+       pass to the input function.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyFromStart(CopyFromState cstate,
+              TupleDesc tupDesc);
+</programlisting>
+
+   This starts a <literal>COPY FROM</literal>. This function is called once at
+   the beginning of <literal>COPY FROM</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>TupleDesc tupDesc</literal></term>
+     <listitem>
+      <para>
+       This is the tuple descriptor of the relation where the data needs to be
+       copied. This can be used for any initialization steps required by a
+       format.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+bool
+CopyFromOneRow(CopyFromState cstate,
+               ExprContext *econtext,
+               Datum *values,
+               bool *nulls);
+</programlisting>
+
+   This reads one row from the source and fills <literal>values</literal>
+   and <literal>nulls</literal>. If there is one or more tuples to be read,
+   this must return <literal>true</literal>. If there are no more tuples to
+   read, this must return <literal>false</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>ExprContext *econtext</literal></term>
+     <listitem>
+      <para>
+       This is used to evaluate default expression for each column that is
+       either not read from the file or is using
+       the <literal>DEFAULT</literal> option of <literal>COPY
+       FROM</literal>. It is <literal>NULL</literal> if no default values are used.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Datum *values</literal></term>
+     <listitem>
+      <para>
+       This is an output variable to store read tuples.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>bool *nulls</literal></term>
+     <listitem>
+      <para>
+       This is an output variable to store whether the read columns
+       are <literal>NULL</literal> or not.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyFromEnd(CopyFromState cstate);
+</programlisting>
+
+   This ends a <literal>COPY FROM</literal>. This function is called once at
+   the end of <literal>COPY FROM</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+   TODO: Add CopyFromStateGetData() and CopyFromSkipErrorRow()?
+  </para>
+ </sect1>
+
+ <sect1 id="copy-handler-to">
+  <title>Copy To Handler</title>
+
+  <para>
+   The <literal>COPY</literal> handler function for <literal>COPY
+   TO</literal> returns a <type>CopyToRoutine</type> struct containing
+   pointers to the functions described below. All functions are required.
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToOutFunc(CopyToState cstate,
+              Oid atttypid,
+              FmgrInfo *finfo);
+</programlisting>
+
+   This sets output function information for the
+   given <literal>atttypid</literal> attribute. This function is called once
+   at the beginning of <literal>COPY TO</literal>. If
+   this <literal>COPY</literal> handler doesn't use any output functions, this
+   function doesn't need to do anything.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Oid atttypid</literal></term>
+     <listitem>
+      <para>
+       This is the OID of data type used by the relation's attribute.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>FmgrInfo *finfo</literal></term>
+     <listitem>
+      <para>
+       This can be optionally filled to provide the catalog information of
+       the output function.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToStart(CopyToState cstate,
+            TupleDesc tupDesc);
+</programlisting>
+
+   This starts a <literal>COPY TO</literal>. This function is called once at
+   the beginning of <literal>COPY TO</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>TupleDesc tupDesc</literal></term>
+     <listitem>
+      <para>
+       This is the tuple descriptor of the relation from which the data is read.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToOneRow(CopyToState cstate,
+             TupleTableSlot *slot);
+</programlisting>
+
+   This writes one row stored in <literal>slot</literal> to the destination.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>TupleTableSlot *slot</literal></term>
+     <listitem>
+      <para>
+       This is the slot that holds the row to be written.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToEnd(CopyToState cstate);
+</programlisting>
+
+   This ends a <literal>COPY TO</literal>. This function is called once at
+   the end of <literal>COPY TO</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+   TODO: Add CopyToStateFlush()?
+  </para>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 25fb99cee69..1fd6d32d5ec 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -107,6 +107,7 @@
 <!ENTITY storage    SYSTEM "storage.sgml">
 <!ENTITY transaction     SYSTEM "xact.sgml">
 <!ENTITY tablesample-method SYSTEM "tablesample-method.sgml">
+<!ENTITY copy-handler SYSTEM "copy-handler.sgml">
 <!ENTITY wal-for-extensions SYSTEM "wal-for-extensions.sgml">
 <!ENTITY generic-wal SYSTEM "generic-wal.sgml">
 <!ENTITY custom-rmgr SYSTEM "custom-rmgr.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index af476c82fcc..8ba319ae2df 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -254,6 +254,7 @@ break is not needed in a wider output rendering.
   &plhandler;
   &fdwhandler;
   &tablesample-method;
+  &copy-handler;
   &custom-scan;
   &geqo;
   &tableam;
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 500ece7d5bb..24710cb667a 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -28,10 +28,10 @@ typedef struct CopyToRoutine
      * Set output function information. This callback is called once at the
      * beginning of COPY TO.
      *
+     * 'atttypid' is the OID of data type used by the relation's attribute.
+     *
      * 'finfo' can be optionally filled to provide the catalog information of
      * the output function.
-     *
-     * 'atttypid' is the OID of data type used by the relation's attribute.
      */
     void        (*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
                                   FmgrInfo *finfo);
@@ -70,12 +70,13 @@ typedef struct CopyFromRoutine
      * Set input function information. This callback is called once at the
      * beginning of COPY FROM.
      *
+     * 'atttypid' is the OID of data type used by the relation's attribute.
+     *
      * 'finfo' can be optionally filled to provide the catalog information of
      * the input function.
      *
      * 'typioparam' can be optionally filled to define the OID of the type to
-     * pass to the input function.'atttypid' is the OID of data type used by
-     * the relation's attribute.
+     * pass to the input function.
      */
     void        (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
                                    FmgrInfo *finfo, Oid *typioparam);
-- 
2.47.2
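
The callback order described in the draft documentation (start once, read rows until the row callback returns false, end once) can be sketched as a small driver loop. This is a standalone model under stated assumptions, not the COPY FROM executor: ToyState, ToyFromRoutine and toy_copy_from() are hypothetical names, and each row is reduced to a single int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-operation state, standing in for CopyFromState. */
typedef struct ToyState
{
    const int  *rows;       /* source data, one int per row */
    size_t      n_rows;     /* number of rows in the source */
    size_t      next;       /* index of the next row to read */
    int         started;    /* how many times start() ran */
    int         ended;      /* how many times end() ran */
} ToyState;

/* Hypothetical callback table, standing in for CopyFromRoutine. */
typedef struct ToyFromRoutine
{
    void        (*start) (ToyState *state);
    bool        (*one_row) (ToyState *state, int *value);
    void        (*end) (ToyState *state);
} ToyFromRoutine;

void
toy_start(ToyState *state)
{
    state->next = 0;
    state->started++;
}

/* Returns true while a row was read, false once the source is exhausted. */
bool
toy_one_row(ToyState *state, int *value)
{
    if (state->next >= state->n_rows)
        return false;
    *value = state->rows[state->next++];
    return true;
}

void
toy_end(ToyState *state)
{
    state->ended++;
}

const ToyFromRoutine toy_routine = {toy_start, toy_one_row, toy_end};

/*
 * Drive one operation through the callbacks. Returns the number of rows
 * read and stores the sum of their values in *sum.
 */
size_t
toy_copy_from(const ToyFromRoutine *routine, ToyState *state, int *sum)
{
    size_t      n = 0;
    int         value;

    assert(routine != NULL && state != NULL && sum != NULL);

    *sum = 0;
    routine->start(state);
    while (routine->one_row(state, &value))
    {
        *sum += value;
        n++;
    }
    routine->end(state);
    return n;
}
```

The false return from the row callback is the only end-of-input signal in this model, which mirrors the contract the draft gives for CopyFromOneRow().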
			
Re: Make COPY format extendable: Extract COPY TO format implementations
From: "David G. Johnston"
On Tue, Mar 18, 2025 at 7:56 PM Sutou Kouhei <kou@clear-code.com> wrote:
> And could someone help (take over if possible) writing a
> document for this feature? I'm not good at writing a
> document in English... 0009 in the attached v37 patch set
> has a draft of it. It's based on existing documents in
> doc/src/sgml/ and *.h.
I haven't touched the innards of the structs aside from changing programlisting to synopsis and redoing the two section opening paragraphs so they integrate better with the chapter opening.
The rest I kinda went to town on...
David J.
diff --git a/doc/src/sgml/copy-handler.sgml b/doc/src/sgml/copy-handler.sgml
index f602debae6..9d2897a104 100644
--- a/doc/src/sgml/copy-handler.sgml
+++ b/doc/src/sgml/copy-handler.sgml
@@ -10,56 +10,72 @@
<para>
<productname>PostgreSQL</productname> supports
custom <link linkend="sql-copy"><literal>COPY</literal></link>
- handlers. The <literal>COPY</literal> handlers can use different copy format
- instead of built-in <literal>text</literal>, <literal>csv</literal>
- and <literal>binary</literal>.
+ handlers, which add <replaceable>format_name</replaceable> options
+ to the <literal>FORMAT</literal> clause.
</para>
 
<para>
- At the SQL level, a table sampling method is represented by a single SQL
- function, typically implemented in C, having the signature
-<programlisting>
-format_name(internal) RETURNS copy_handler
-</programlisting>
- The name of the function is the same name appearing in
- the <literal>FORMAT</literal> option. The <type>internal</type> argument is
- a dummy that simply serves to prevent this function from being called
- directly from an SQL command. The real argument is <literal>bool
- is_from</literal>. If the handler is used by <literal>COPY FROM</literal>,
- it's <literal>true</literal>. If the handler is used by <literal>COPY
- FROM</literal>, it's <literal>false</literal>.
+ At the SQL level, a copy handler method is represented by a single SQL
+ function (see <xref linkend="sql-createfunction"/>), typically implemented in
+ C, having the signature
+<synopsis>
+<replaceable>format_name</replaceable>(internal) RETURNS <literal>copy_handler</literal>
+</synopsis>
+ The function's name is then accepted as a valid <replaceable>format_name</replaceable>.
+ The return pseudo-type <literal>copy_handler</literal> informs the system that
+ this function needs to be registered as a copy handler.
+ The <type>internal</type> argument is a dummy that prevents
+ this function from being called directly from an SQL command. Because the
+ handler implementation must be immutable for the lifetime of the server, this
+ SQL function's volatility should be marked immutable. The <literal>link_symbol</literal>
+ for this function is the name of the implementation function, described next.
</para>
 
<para>
- The function must return <type>CopyFromRoutine *</type> when
- the <literal>is_from</literal> argument is <literal>true</literal>.
- The function must return <type>CopyToRoutine *</type> when
- the <literal>is_from</literal> argument is <literal>false</literal>.
+ The implementation function signature expected for the function named
+ in the <literal>link_symbol</literal> is:
+<synopsis>
+Datum
+<replaceable>copy_format_handler</replaceable>(PG_FUNCTION_ARGS)
+</synopsis>
+ The convention for the name is to replace the word
+ <replaceable>format</replaceable> in the placeholder above with the value given
+ to <replaceable>format_name</replaceable> in the SQL function.
+ The first argument is a <type>boolean</type>. If <literal>true</literal>, the handler
+ must provide a pointer to its implementation of <literal>COPY FROM</literal>
+ (a <type>CopyFromRoutine *</type>). If <literal>false</literal>, the handler
+ must provide a pointer to its implementation of <literal>COPY TO</literal>
+ (a <type>CopyToRoutine *</type>). These structs are declared in
+ <filename>src/include/commands/copyapi.h</filename>.
</para>
 
<para>
- The <type>CopyFromRoutine</type> and <type>CopyToRoutine</type> struct types
- are declared in <filename>src/include/commands/copyapi.h</filename>,
- which see for additional details.
+ The structs hold pointers to implementation functions for
+ initializing, starting, processing rows, and ending a copy operation.
+ The specific structures vary a bit between <literal>COPY FROM</literal> and
+ <literal>COPY TO</literal>, so the next two sections describe each
+ in detail.
</para>
 
<sect1 id="copy-handler-from">
<title>Copy From Handler</title>
 
<para>
- The <literal>COPY</literal> handler function for <literal>COPY
- FROM</literal> returns a <type>CopyFromRoutine</type> struct containing
- pointers to the functions described below. All functions are required.
+ The opening to this chapter describes how the executor will call the
+ main handler function with, in this case,
+ a <type>boolean</type> <literal>true</literal>, and expect to receive a
+ <type>CopyFromRoutine *</type> <type>Datum</type>. This section describes
+ the components of the <type>CopyFromRoutine</type> struct.
</para>
 
<para>
-<programlisting>
+<synopsis>
void
CopyFromInFunc(CopyFromState cstate,
Oid atttypid,
FmgrInfo *finfo,
Oid *typioparam);
-</programlisting>
+</synopsis>
 
This sets input function information for the
given <literal>atttypid</literal> attribute. This function is called once
@@ -110,11 +126,11 @@ CopyFromInFunc(CopyFromState cstate,
</para>
 
<para>
-<programlisting>
+<synopsis>
void
CopyFromStart(CopyFromState cstate,
TupleDesc tupDesc);
-</programlisting>
+</synopsis>
 
This starts a <literal>COPY FROM</literal>. This function is called once at
the beginning of <literal>COPY FROM</literal>.
@@ -144,13 +160,13 @@ CopyFromStart(CopyFromState cstate,
</para>
 
<para>
-<programlisting>
+<synopsis>
bool
CopyFromOneRow(CopyFromState cstate,
ExprContext *econtext,
Datum *values,
bool *nulls);
-</programlisting>
+</synopsis>
 
This reads one row from the source and fill <literal>values</literal>
and <literal>nulls</literal>. If there is one or more tuples to be read,
@@ -202,10 +218,10 @@ CopyFromOneRow(CopyFromState cstate,
</para>
 
<para>
-<programlisting>
+<synopsis>
void
CopyFromEnd(CopyFromState cstate);
-</programlisting>
+</synopsis>
 
This ends a <literal>COPY FROM</literal>. This function is called once at
the end of <literal>COPY FROM</literal>.
@@ -232,18 +248,20 @@ CopyFromEnd(CopyFromState cstate);
<title>Copy To Handler</title>
 
<para>
- The <literal>COPY</literal> handler function for <literal>COPY
- TO</literal> returns a <type>CopyToRoutine</type> struct containing
- pointers to the functions described below. All functions are required.
+ The opening to this chapter describes how the executor will call the
+ main handler function with, in this case,
+ a <type>boolean</type> <literal>false</literal>, and expect to receive a
+ <type>CopyToRoutine *</type> <type>Datum</type>. This section describes
+ the components of the <type>CopyToRoutine</type> struct.
</para>
 
<para>
-<programlisting>
+<synopsis>
void
CopyToOutFunc(CopyToState cstate,
Oid atttypid,
FmgrInfo *finfo);
-</programlisting>
+</synopsis>
 
This sets output function information for the
given <literal>atttypid</literal> attribute. This function is called once
@@ -284,11 +302,11 @@ CopyToOutFunc(CopyToState cstate,
</para>
 
<para>
-<programlisting>
+<synopsis>
void
CopyToStart(CopyToState cstate,
TupleDesc tupDesc);
-</programlisting>
+</synopsis>
 
This starts a <literal>COPY TO</literal>. This function is called once at
the beginning of <literal>COPY TO</literal>.
@@ -316,11 +334,11 @@ CopyToStart(CopyToState cstate,
</para>
 
<para>
-<programlisting>
+<synopsis>
bool
CopyToOneRow(CopyToState cstate,
TupleTableSlot *slot);
-</programlisting>
+</synopsis>
 
This writes one row stored in <literal>slot</literal> to the destination.
 
@@ -347,10 +365,10 @@ CopyToOneRow(CopyToState cstate,
</para>
 
<para>
-<programlisting>
+<synopsis>
void
CopyToEnd(CopyToState cstate);
-</programlisting>
+</synopsis>
 
This ends a <literal>COPY TO</literal>. This function is called once at
the end of <literal>COPY TO</literal>.
index f602debae6..9d2897a104 100644
--- a/doc/src/sgml/copy-handler.sgml
+++ b/doc/src/sgml/copy-handler.sgml
@@ -10,56 +10,72 @@
<para>
<productname>PostgreSQL</productname> supports
custom <link linkend="sql-copy"><literal>COPY</literal></link>
- handlers. The <literal>COPY</literal> handlers can use different copy format
- instead of built-in <literal>text</literal>, <literal>csv</literal>
- and <literal>binary</literal>.
+ handlers, adding additional <replaceable>format_name</replaceable> options
+ to the <literal>FORMAT</literal> clause.
</para>
<para>
- At the SQL level, a table sampling method is represented by a single SQL
- function, typically implemented in C, having the signature
-<programlisting>
-format_name(internal) RETURNS copy_handler
-</programlisting>
- The name of the function is the same name appearing in
- the <literal>FORMAT</literal> option. The <type>internal</type> argument is
- a dummy that simply serves to prevent this function from being called
- directly from an SQL command. The real argument is <literal>bool
- is_from</literal>. If the handler is used by <literal>COPY FROM</literal>,
- it's <literal>true</literal>. If the handler is used by <literal>COPY
- FROM</literal>, it's <literal>false</literal>.
+ At the SQL level, a copy handler method is represented by a single SQL
+ function (see <xref linkend="sql-createfunction"/>), typically implemented in
+ C, having the signature
+<synopsis>
+<replaceable>format_name</replaceable>(internal) RETURNS <literal>copy_handler</literal>
+</synopsis>
+ The function's name is then accepted as a valid <replaceable>format_name</replaceable>.
+ The return pseudo-type <literal>copy_handler</literal> informs the system that
+ this function needs to be registered as a copy handler.
+ The <type>internal</type> argument is a dummy that prevents
+ this function from being called directly from an SQL command. As the
+ handler implementation must be server-lifetime immutable, this SQL function's
+ volatility should be marked immutable. The <literal>link_symbol</literal>
+ for this function is the name of the implementation function, described next.
</para>
<para>
- The function must return <type>CopyFromRoutine *</type> when
- the <literal>is_from</literal> argument is <literal>true</literal>.
- The function must return <type>CopyToRoutine *</type> when
- the <literal>is_from</literal> argument is <literal>false</literal>.
+ The implementation function signature expected for the function named
+ in the <literal>link_symbol</literal> is:
+<synopsis>
+Datum
+<replaceable>copy_format_handler</replaceable>(PG_FUNCTION_ARGS)
+</synopsis>
+ The convention for the name is to replace the word
+ <replaceable>format</replaceable> in the placeholder above with the value given
+ to <replaceable>format_name</replaceable> in the SQL function.
+ The first argument is a <type>boolean</type> that indicates whether the handler
+ must provide a pointer to its implementation for <literal>COPY FROM</literal>
+ (a <type>CopyFromRoutine *</type>). If <literal>false</literal>, the handler
+ must provide a pointer to its implementation of <literal>COPY TO</literal>
+ (a <type>CopyToRoutine *</type>). These structs are declared in
+ <filename>src/include/commands/copyapi.h</filename>.
</para>
<para>
- The <type>CopyFromRoutine</type> and <type>CopyToRoutine</type> struct types
- are declared in <filename>src/include/commands/copyapi.h</filename>,
- which see for additional details.
+ The structs hold pointers to implementation functions for
+ initializing, starting, processing rows, and ending a copy operation.
+ The specific structures vary a bit between <literal>COPY FROM</literal> and
+ <literal>COPY TO</literal> so the next two sections describe each
+ in detail.
</para>
<sect1 id="copy-handler-from">
<title>Copy From Handler</title>
<para>
- The <literal>COPY</literal> handler function for <literal>COPY
- FROM</literal> returns a <type>CopyFromRoutine</type> struct containing
- pointers to the functions described below. All functions are required.
+ The opening to this chapter describes how the executor will call the
+ main handler function with, in this case,
+ a <type>boolean</type> <literal>true</literal>, and expect to receive a
+ <type>CopyFromRoutine *</type> <type>Datum</type>. This section describes
+ the components of the <type>CopyFromRoutine</type> struct.
</para>
<para>
-<programlisting>
+<synopsis>
void
CopyFromInFunc(CopyFromState cstate,
Oid atttypid,
FmgrInfo *finfo,
Oid *typioparam);
-</programlisting>
+</synopsis>
This sets input function information for the
given <literal>atttypid</literal> attribute. This function is called once
@@ -110,11 +126,11 @@ CopyFromInFunc(CopyFromState cstate,
</para>
<para>
-<programlisting>
+<synopsis>
void
CopyFromStart(CopyFromState cstate,
TupleDesc tupDesc);
-</programlisting>
+</synopsis>
This starts a <literal>COPY FROM</literal>. This function is called once at
the beginning of <literal>COPY FROM</literal>.
@@ -144,13 +160,13 @@ CopyFromStart(CopyFromState cstate,
</para>
<para>
-<programlisting>
+<synopsis>
bool
CopyFromOneRow(CopyFromState cstate,
ExprContext *econtext,
Datum *values,
bool *nulls);
-</programlisting>
+</synopsis>
This reads one row from the source and fill <literal>values</literal>
and <literal>nulls</literal>. If there is one or more tuples to be read,
@@ -202,10 +218,10 @@ CopyFromOneRow(CopyFromState cstate,
</para>
<para>
-<programlisting>
+<synopsis>
void
CopyFromEnd(CopyFromState cstate);
-</programlisting>
+</synopsis>
This ends a <literal>COPY FROM</literal>. This function is called once at
the end of <literal>COPY FROM</literal>.
@@ -232,18 +248,20 @@ CopyFromEnd(CopyFromState cstate);
<title>Copy To Handler</title>
<para>
- The <literal>COPY</literal> handler function for <literal>COPY
- TO</literal> returns a <type>CopyToRoutine</type> struct containing
- pointers to the functions described below. All functions are required.
+ The opening to this chapter describes how the executor will call the
+ main handler function with, in this case,
+ a <type>boolean</type> <literal>false</literal>, and expect to receive a
+ <type>CopyToRoutine *</type> <type>Datum</type>. This section describes
+ the components of the <type>CopyToRoutine</type> struct.
</para>
<para>
-<programlisting>
+<synopsis>
void
CopyToOutFunc(CopyToState cstate,
Oid atttypid,
FmgrInfo *finfo);
-</programlisting>
+</synopsis>
This sets output function information for the
given <literal>atttypid</literal> attribute. This function is called once
@@ -284,11 +302,11 @@ CopyToOutFunc(CopyToState cstate,
</para>
<para>
-<programlisting>
+<synopsis>
void
CopyToStart(CopyToState cstate,
TupleDesc tupDesc);
-</programlisting>
+</synopsis>
This starts a <literal>COPY TO</literal>. This function is called once at
the beginning of <literal>COPY TO</literal>.
@@ -316,11 +334,11 @@ CopyToStart(CopyToState cstate,
</para>
<para>
-<programlisting>
+<synopsis>
bool
CopyToOneRow(CopyToState cstate,
TupleTableSlot *slot);
-</programlisting>
+</synopsis>
This writes one row stored in <literal>slot</literal> to the destination.
@@ -347,10 +365,10 @@ CopyToOneRow(CopyToState cstate,
</para>
<para>
-<programlisting>
+<synopsis>
void
CopyToEnd(CopyToState cstate);
-</programlisting>
+</synopsis>
This ends a <literal>COPY TO</literal>. This function is called once at
the end of <literal>COPY TO</literal>.
Hi,
In <CAKFQuwaMAFMHqxDXR=SxA0mDjdmntrwxZd2w=nSruLNFH-OzLw@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 19 Mar 2025 17:49:49 -0700,
  "David G. Johnston" <david.g.johnston@gmail.com> wrote:
>> And could someone help (take over if possible) writing a
>> document for this feature? I'm not good at writing a
>> document in English... 0009 in the attached v37 patch set
>> has a draft of it. It's based on existing documents in
>> doc/src/sgml/ and *.h.
>>
>>
> I haven't touched the innards of the structs aside from changing
> programlisting to synopsis.  And redoing the two section opening paragraphs
> to better integrate with the content in the chapter opening.
> 
> The rest I kinda went to town on...
Thanks!!! It's very helpful!!!
I've applied your patch. Only 0009 has changed.
Thanks,
-- 
kou
From a9aebf329a5388173b78b20922e19904ce833e9c Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 12:19:15 +0900
Subject: [PATCH v38 1/9] Add support for adding custom COPY TO format
This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a COPY TO handler returns a CopyToRoutine.
This also adds a test module for a custom COPY TO handler.
---
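The handler approach described in the commit message can be sketched outside the server as follows. This is only an illustrative model, not the patch's code: every Mock* type, the registry array, and lookup_custom_copy_to() are hypothetical stand-ins for NodeTag, CopyToRoutine, the pg_proc lookup, and the checks done in ProcessCopyOptionFormat() and CopyToGetRoutine(). It shows the two ideas the patch relies on: built-in names are resolved first and a custom handler is only consulted for COPY TO, and the pointer a handler returns is trusted only after its leading tag has been checked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PostgreSQL's NodeTag. */
typedef enum MockNodeTag
{
    T_MockInvalid = 0,
    T_MockCopyToRoutine
} MockNodeTag;

/* Hypothetical stand-in for CopyToRoutine: a tag followed by callbacks. */
typedef struct MockCopyToRoutine
{
    MockNodeTag type;
    int         (*one_row) (int value);    /* placeholder callback */
} MockCopyToRoutine;

/* A handler receives is_from and returns an untyped pointer, like a Datum. */
typedef const void *(*MockHandler) (bool is_from);

typedef struct MockFormatEntry
{
    const char *name;
    MockHandler handler;
} MockFormatEntry;

static int
double_value(int value)
{
    return value * 2;
}

static const MockCopyToRoutine good_routine = {
    .type = T_MockCopyToRoutine,
    .one_row = double_value,
};

/* Well-behaved handler: returns a correctly tagged routine. */
static const void *
good_handler(bool is_from)
{
    (void) is_from;
    return &good_routine;
}

/* Misbehaving handler: returns an object that is not a routine. */
static const MockNodeTag bad_node = T_MockInvalid;

static const void *
bad_handler(bool is_from)
{
    (void) is_from;
    return &bad_node;
}

/* Stand-in for looking up a function by name in the catalog. */
static const MockFormatEntry registry[] = {
    {"good_format", good_handler},
    {"bad_format", bad_handler},
};

static bool
is_builtin_format(const char *format)
{
    return strcmp(format, "text") == 0 ||
        strcmp(format, "csv") == 0 ||
        strcmp(format, "binary") == 0;
}

/*
 * Return the custom COPY TO routine for "format", or NULL when the format
 * is built-in, the direction is COPY FROM, no handler is registered, or the
 * handler's result does not carry the expected tag.
 */
static const MockCopyToRoutine *
lookup_custom_copy_to(const char *format, bool is_from)
{
    if (is_builtin_format(format) || is_from)
        return NULL;

    for (size_t i = 0; i < sizeof(registry) / sizeof(registry[0]); i++)
    {
        if (strcmp(registry[i].name, format) == 0)
        {
            /* The tag is the first member, so it can be read before casting. */
            const MockNodeTag *tag = registry[i].handler(false);

            if (tag == NULL || *tag != T_MockCopyToRoutine)
                return NULL;
            return (const MockCopyToRoutine *) tag;
        }
    }
    return NULL;
}
```

In this model, calling lookup_custom_copy_to("good_format", false) yields the routine, while a built-in name, the COPY FROM direction, an unregistered name, or a handler returning a wrongly tagged object all yield NULL. The real code reports an error in the unrecognized and wrongly tagged cases instead of returning NULL.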
 src/backend/commands/copy.c                   | 79 ++++++++++++++++---
 src/backend/commands/copyto.c                 | 28 ++++++-
 src/backend/nodes/Makefile                    |  1 +
 src/backend/nodes/gen_node_support.pl         |  2 +
 src/backend/utils/adt/pseudotypes.c           |  1 +
 src/include/catalog/pg_proc.dat               |  6 ++
 src/include/catalog/pg_type.dat               |  6 ++
 src/include/commands/copy.h                   |  1 +
 src/include/commands/copyapi.h                |  2 +
 src/include/nodes/meson.build                 |  1 +
 src/test/modules/Makefile                     |  1 +
 src/test/modules/meson.build                  |  1 +
 src/test/modules/test_copy_format/.gitignore  |  4 +
 src/test/modules/test_copy_format/Makefile    | 23 ++++++
 .../expected/test_copy_format.out             | 17 ++++
 src/test/modules/test_copy_format/meson.build | 33 ++++++++
 .../test_copy_format/sql/test_copy_format.sql |  5 ++
 .../test_copy_format--1.0.sql                 |  8 ++
 .../test_copy_format/test_copy_format.c       | 63 +++++++++++++++
 .../test_copy_format/test_copy_format.control |  4 +
 20 files changed, 269 insertions(+), 17 deletions(-)
 mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..8d94bc313eb 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,6 +32,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
@@ -476,6 +477,70 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
     return COPY_LOG_VERBOSITY_DEFAULT;    /* keep compiler quiet */
 }
 
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false). If no COPY format handler is found, this function
+ * reports an error.
+ */
+static void
+ProcessCopyOptionFormat(ParseState *pstate,
+                        CopyFormatOptions *opts_out,
+                        bool is_from,
+                        DefElem *defel)
+{
+    char       *format;
+    Oid            funcargtypes[1];
+    Oid            handlerOid = InvalidOid;
+
+    format = defGetString(defel);
+
+    opts_out->csv_mode = false;
+    opts_out->binary = false;
+    /* built-in formats */
+    if (strcmp(format, "text") == 0)
+    {
+        /* "csv_mode == false && binary == false" means "text" */
+        return;
+    }
+    else if (strcmp(format, "csv") == 0)
+    {
+        opts_out->csv_mode = true;
+        return;
+    }
+    else if (strcmp(format, "binary") == 0)
+    {
+        opts_out->binary = true;
+        return;
+    }
+
+    /* custom format */
+    if (!is_from)
+    {
+        funcargtypes[0] = INTERNALOID;
+        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                    funcargtypes, true);
+    }
+    if (!OidIsValid(handlerOid))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY format \"%s\" not recognized", format),
+                 parser_errposition(pstate, defel->location)));
+
+    /* check that handler has correct return type */
+    if (get_func_rettype(handlerOid) != COPY_HANDLEROID)
+        ereport(ERROR,
+                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                 errmsg("function %s must return type %s",
+                        format, "copy_handler"),
+                 parser_errposition(pstate, defel->location)));
+
+    opts_out->handler = handlerOid;
+}
+
 /*
  * Process the statement option list for COPY.
  *
@@ -519,22 +584,10 @@ ProcessCopyOptions(ParseState *pstate,
 
         if (strcmp(defel->defname, "format") == 0)
         {
-            char       *fmt = defGetString(defel);
-
             if (format_specified)
                 errorConflictingDefElem(defel, pstate);
             format_specified = true;
-            if (strcmp(fmt, "text") == 0)
-                 /* default format */ ;
-            else if (strcmp(fmt, "csv") == 0)
-                opts_out->csv_mode = true;
-            else if (strcmp(fmt, "binary") == 0)
-                opts_out->binary = true;
-            else
-                ereport(ERROR,
-                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                         errmsg("COPY format \"%s\" not recognized", fmt),
-                         parser_errposition(pstate, defel->location)));
+            ProcessCopyOptionFormat(pstate, opts_out, is_from, defel);
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..fce8501dc30 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -150,6 +150,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
 
 /* text format */
 static const CopyToRoutine CopyToRoutineText = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToTextOneRow,
@@ -158,6 +159,7 @@ static const CopyToRoutine CopyToRoutineText = {
 
 /* CSV format */
 static const CopyToRoutine CopyToRoutineCSV = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToCSVOneRow,
@@ -166,6 +168,7 @@ static const CopyToRoutine CopyToRoutineCSV = {
 
 /* binary format */
 static const CopyToRoutine CopyToRoutineBinary = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToBinaryStart,
     .CopyToOutFunc = CopyToBinaryOutFunc,
     .CopyToOneRow = CopyToBinaryOneRow,
@@ -176,13 +179,30 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(const CopyFormatOptions *opts)
 {
-    if (opts->csv_mode)
+    if (OidIsValid(opts->handler))
+    {
+        Datum        datum;
+        Node       *routine;
+
+        datum = OidFunctionCall1(opts->handler, BoolGetDatum(false));
+        routine = (Node *) DatumGetPointer(datum);
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%u did not return "
+                            "CopyToRoutine struct",
+                            opts->handler)));
+        return castNode(CopyToRoutine, routine);
+    }
+    else if (opts->csv_mode)
         return &CopyToRoutineCSV;
     else if (opts->binary)
         return &CopyToRoutineBinary;
-
-    /* default is text */
-    return &CopyToRoutineText;
+    else
+        return &CopyToRoutineText;
 }
 
 /* Implementation of the start callback for text and CSV formats */
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 77ddb9ca53f..dc6c1087361 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -50,6 +50,7 @@ node_headers = \
     access/sdir.h \
     access/tableam.h \
     access/tsmapi.h \
+    commands/copyapi.h \
     commands/event_trigger.h \
     commands/trigger.h \
     executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
old mode 100644
new mode 100755
index 7e3f335ac09..29b7180c8ee
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -62,6 +62,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -86,6 +87,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index 317a1f2b282..f2ebc21ca56 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 890822eaf79..7c2a510fa3f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7838,6 +7838,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 6dca77e0a22..340e0cd0a8d 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -633,6 +633,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a copy to method function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   descr => 'pseudo-type for the result of a table AM handler function',
   typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..332628d67cc 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,6 +87,7 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
+    Oid            handler;        /* handler function for custom format routine */
 } CopyFormatOptions;
 
 /* These are private in commands/copy[from|to].c */
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 2a2d2f9876b..4f4ffabf882 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -22,6 +22,8 @@
  */
 typedef struct CopyToRoutine
 {
+    NodeTag        type;
+
     /*
      * Set output function information. This callback is called once at the
      * beginning of COPY TO.
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index d1ca24dd32f..96e70e7f38b 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -12,6 +12,7 @@ node_support_input_i = [
   'access/sdir.h',
   'access/tableam.h',
   'access/tsmapi.h',
+  'commands/copyapi.h',
   'commands/event_trigger.h',
   'commands/trigger.h',
   'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 4e4be3fa511..c9da440eed0 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -16,6 +16,7 @@ SUBDIRS = \
           spgist_name_ops \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 2b057451473..d33bbbd4092 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -15,6 +15,7 @@ subdir('spgist_name_ops')
 subdir('ssl_passphrase_callback')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 00000000000..8497f91624d
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 00000000000..adfe7d1572a
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,17 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+ERROR:  COPY format "test_copy_format" not recognized
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')...
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: atttypid=21
+NOTICE:  CopyToOutFunc: atttypid=23
+NOTICE:  CopyToOutFunc: atttypid=20
+NOTICE:  CopyToStart: natts=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 00000000000..a45a2e0a039
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+  'test_copy_format.control',
+  'test_copy_format--1.0.sql',
+)
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'test_copy_format',
+    ],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 00000000000..810b3d8cedc
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,5 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 00000000000..d24ea03ce99
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,8 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+CREATE FUNCTION test_copy_format(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME' LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..b42d472d851
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,63 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+
+PG_MODULE_MAGIC;
+
+static void
+CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    ereport(NOTICE, (errmsg("CopyToOutFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: natts=%d", tupDesc->natts)));
+}
+
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%d", slot->tts_nvalid)));
+}
+
+static void
+CopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .type = T_CopyToRoutine,
+    .CopyToOutFunc = CopyToOutFunc,
+    .CopyToStart = CopyToStart,
+    .CopyToOneRow = CopyToOneRow,
+    .CopyToEnd = CopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    ereport(NOTICE,
+            (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 00000000000..f05a6362358
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.47.2
From b1a433daaf8296248c22c34f4a27a9db27633506 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 13:58:33 +0900
Subject: [PATCH v38 2/9] Export CopyToStateData as private data
This is for custom COPY TO format handlers implemented as extensions.
This only moves code; nothing changes except the CopyDest enum values.
The CopyDest and CopySource enums both use names such as COPY_FILE, so
they conflict with each other. CopyDest enum values therefore use the
COPY_DEST_ prefix instead of the COPY_ prefix. For example, COPY_FILE
in CopyDest is renamed to COPY_DEST_FILE.
Note that this isn't enough to implement custom COPY TO format
handlers as extensions. We'll do the following in a subsequent commit:
1. Add an opaque space for custom COPY TO format handlers
2. Export CopySendEndOfRow() to flush the buffer
---
 src/backend/commands/copyto.c          | 78 +++---------------------
 src/include/commands/copy.h            |  2 +-
 src/include/commands/copyto_internal.h | 83 ++++++++++++++++++++++++++
 3 files changed, 93 insertions(+), 70 deletions(-)
 create mode 100644 src/include/commands/copyto_internal.h
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index fce8501dc30..99c2f2dd699 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -20,6 +20,7 @@
 
 #include "access/tableam.h"
 #include "commands/copyapi.h"
+#include "commands/copyto_internal.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -36,67 +37,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                    /* to file (or a piped program) */
-    COPY_FRONTEND,                /* to frontend */
-    COPY_CALLBACK,                /* to callback function */
-} CopyDest;
-
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-    /* format-specific routines */
-    const CopyToRoutine *routine;
-
-    /* low-level state data */
-    CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
-
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
-    MemoryContext rowcontext;    /* per-row evaluation context */
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -421,7 +361,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -468,7 +408,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -502,11 +442,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -527,7 +467,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -535,7 +475,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -920,12 +860,12 @@ BeginCopyTo(ParseState *pstate,
     /* See Multibyte encoding comment above */
     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 332628d67cc..6df1f8a3b9b 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -90,7 +90,7 @@ typedef struct CopyFormatOptions
     Oid            handler;        /* handler function for custom format routine */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
+/* These are private in commands/copy[from|to]_internal.h */
 typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
new file mode 100644
index 00000000000..1b58b36c0a3
--- /dev/null
+++ b/src/include/commands/copyto_internal.h
@@ -0,0 +1,83 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyto_internal.h
+ *      Internal definitions for COPY TO command.
+ *
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyto_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYTO_INTERNAL_H
+#define COPYTO_INTERNAL_H
+
+#include "commands/copy.h"
+#include "executor/execdesc.h"
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+    COPY_DEST_FILE,                /* to file (or a piped program) */
+    COPY_DEST_FRONTEND,            /* to frontend */
+    COPY_DEST_CALLBACK,            /* to callback function */
+} CopyDest;
+
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+    /* format-specific routines */
+    const struct CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;        /* type of copy source/destination */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
+    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
+
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+    MemoryContext rowcontext;    /* per-row evaluation context */
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
+#endif                            /* COPYTO_INTERNAL_H */
-- 
2.47.2
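To illustrate why the prefix is needed: C enumerators share one namespace, so two enums that both define COPY_FILE cannot be visible in the same translation unit. The following is only a standalone sketch, not code from the patch. The enum values mirror the names used in patches 2/9 and 5/9, while copy_dest_name() and copy_source_name() are hypothetical helpers added for illustration.

```c
#include <stdio.h>

/* COPY TO destinations, prefixed as in patch 2/9. */
typedef enum CopyDest
{
    COPY_DEST_FILE,             /* to file (or a piped program) */
    COPY_DEST_FRONTEND,         /* to frontend */
    COPY_DEST_CALLBACK,         /* to callback function */
} CopyDest;

/* COPY FROM sources, prefixed as in patch 5/9. */
typedef enum CopySource
{
    COPY_SOURCE_FILE,           /* from file (or a piped program) */
    COPY_SOURCE_FRONTEND,       /* from frontend */
    COPY_SOURCE_CALLBACK,       /* from callback function */
} CopySource;

/*
 * Hypothetical helpers (not part of the patch). Both enums can now be used
 * side by side; with the old COPY_FILE/COPY_FRONTEND/COPY_CALLBACK names in
 * both enums, this file would fail to compile with redeclaration errors.
 */
static const char *
copy_dest_name(CopyDest dest)
{
    switch (dest)
    {
        case COPY_DEST_FILE:
            return "file";
        case COPY_DEST_FRONTEND:
            return "frontend";
        case COPY_DEST_CALLBACK:
            return "callback";
    }
    return "unknown";
}

static const char *
copy_source_name(CopySource source)
{
    switch (source)
    {
        case COPY_SOURCE_FILE:
            return "file";
        case COPY_SOURCE_FRONTEND:
            return "frontend";
        case COPY_SOURCE_CALLBACK:
            return "callback";
    }
    return "unknown";
}
```

An extension that includes both copyto_internal.h and copyfrom_internal.h is in the same situation as this file, which is presumably why both enums get a distinct prefix.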
From d205c641dc9913d7a8e0e81a7b614f9e79d13390 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:01:18 +0900
Subject: [PATCH v38 3/9] Add support for implementing custom COPY TO format as
 extension
* Add CopyToStateData::opaque that can be used to keep data for custom
  COPY TO format implementation
* Export CopySendEndOfRow() to flush data in CopyToStateData::fe_msgbuf
  as CopyToStateFlush()
---
 src/backend/commands/copyto.c          | 12 ++++++++++++
 src/include/commands/copyapi.h         |  2 ++
 src/include/commands/copyto_internal.h |  3 +++
 3 files changed, 17 insertions(+)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 99c2f2dd699..f5ed3efbace 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -458,6 +458,18 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * Export CopySendEndOfRow() for extensions. We want to keep
+ * CopySendEndOfRow() as a static function for
+ * optimization. CopySendEndOfRow() calls in this file may be optimized by a
+ * compiler.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
  * line termination and do common appropriate things for the end of row.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 4f4ffabf882..5c5ea6592e3 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -56,6 +56,8 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+extern void CopyToStateFlush(CopyToState cstate);
+
 /*
  * API structure for a COPY FROM format implementation. Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index 1b58b36c0a3..ce1c33a4004 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -78,6 +78,9 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;    /* lookup info for output functions */
     MemoryContext rowcontext;    /* per-row evaluation context */
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyToStateData;
 
 #endif                            /* COPYTO_INTERNAL_H */
-- 
2.47.2
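A rough sketch of how a custom format might combine the opaque pointer with the flush function. This is a standalone mock, not PostgreSQL code: MockCopyToState stands in for CopyToStateData, mock_flush() stands in for CopyToStateFlush(), and RowCounter plus the my_*() callbacks are hypothetical names chosen for illustration. A real extension would allocate with palloc() in the COPY memory context rather than calloc().

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for CopyToStateData, reduced to what this sketch needs. */
typedef struct MockCopyToState
{
    char        buf[256];       /* plays the role of fe_msgbuf */
    size_t      len;            /* bytes currently buffered */
    size_t      flushed_bytes;  /* total bytes "sent" so far */
    void       *opaque;         /* private space for the format */
} MockCopyToState;

/* Stand-in for CopyToStateFlush(): "send" the buffer, then reset it. */
static void
mock_flush(MockCopyToState *cstate)
{
    cstate->flushed_bytes += cstate->len;
    cstate->len = 0;
}

/* Format-private state kept in cstate->opaque (hypothetical). */
typedef struct RowCounter
{
    int         nrows;
} RowCounter;

/* Start callback: allocate the private state once per COPY. */
static void
my_start(MockCopyToState *cstate)
{
    RowCounter *counter = calloc(1, sizeof(RowCounter));

    if (counter == NULL)
        abort();
    cstate->opaque = counter;
}

/* Per-row callback: fill the buffer, update private state, flush. */
static void
my_one_row(MockCopyToState *cstate, const char *row)
{
    RowCounter *counter = cstate->opaque;
    size_t      n = strlen(row);

    if (n > sizeof(cstate->buf))
        n = sizeof(cstate->buf);    /* truncate; enough for a sketch */
    memcpy(cstate->buf, row, n);
    cstate->len = n;
    counter->nrows++;
    mock_flush(cstate);
}

/* End callback: release the private state and report the row count. */
static int
my_end(MockCopyToState *cstate)
{
    RowCounter *counter = cstate->opaque;
    int         nrows = counter->nrows;

    free(counter);
    cstate->opaque = NULL;
    return nrows;
}
```

The point of the opaque field is that the core code never looks inside it, so each format can keep whatever per-COPY state it needs without changing CopyToStateData again.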
From 60527d56abfe49aeb114236c14001baf71b38ba5 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:11:55 +0900
Subject: [PATCH v38 4/9] Add support for adding custom COPY FROM format
This uses the same handler for COPY TO and COPY FROM but a different
routine for each: CopyToRoutine for COPY TO and CopyFromRoutine for
COPY FROM. PostgreSQL calls a COPY TO/FROM handler with an "is_from"
argument. It's true for COPY FROM and false for COPY TO:
    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine
This also extends the test module to cover a custom COPY FROM handler.
---
 src/backend/commands/copy.c                   | 13 +++----
 src/backend/commands/copyfrom.c               | 28 +++++++++++--
 src/include/catalog/pg_type.dat               |  2 +-
 src/include/commands/copyapi.h                |  2 +
 .../expected/test_copy_format.out             | 10 +++--
 .../test_copy_format/sql/test_copy_format.sql |  1 +
 .../test_copy_format/test_copy_format.c       | 39 ++++++++++++++++++-
 7 files changed, 78 insertions(+), 17 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8d94bc313eb..b4417bb6819 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -483,8 +483,8 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
  * This function checks whether the option value is a built-in format such as
  * "text" and "csv" or not. If the option value isn't a built-in format, this
  * function finds a COPY format handler that returns a CopyToRoutine (for
- * is_from == false). If no COPY format handler is found, this function
- * reports an error.
+ * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY
+ * format handler is found, this function reports an error.
  */
 static void
 ProcessCopyOptionFormat(ParseState *pstate,
@@ -518,12 +518,9 @@ ProcessCopyOptionFormat(ParseState *pstate,
     }
 
     /* custom format */
-    if (!is_from)
-    {
-        funcargtypes[0] = INTERNALOID;
-        handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
-                                    funcargtypes, true);
-    }
+    funcargtypes[0] = INTERNALOID;
+    handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
+                                funcargtypes, true);
     if (!OidIsValid(handlerOid))
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index bcf66f0adf8..0809766f910 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -129,6 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate);
 
 /* text format */
 static const CopyFromRoutine CopyFromRoutineText = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromTextOneRow,
@@ -137,6 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = {
 
 /* CSV format */
 static const CopyFromRoutine CopyFromRoutineCSV = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromCSVOneRow,
@@ -145,6 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = {
 
 /* binary format */
 static const CopyFromRoutine CopyFromRoutineBinary = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromBinaryInFunc,
     .CopyFromStart = CopyFromBinaryStart,
     .CopyFromOneRow = CopyFromBinaryOneRow,
@@ -155,13 +158,30 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(const CopyFormatOptions *opts)
 {
-    if (opts->csv_mode)
+    if (OidIsValid(opts->handler))
+    {
+        Datum        datum;
+        Node       *routine;
+
+        datum = OidFunctionCall1(opts->handler, BoolGetDatum(true));
+        routine = (Node *) DatumGetPointer(datum);
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%u did not return "
+                            "CopyFromRoutine struct",
+                            opts->handler)));
+        return castNode(CopyFromRoutine, routine);
+    }
+    else if (opts->csv_mode)
         return &CopyFromRoutineCSV;
     else if (opts->binary)
         return &CopyFromRoutineBinary;
-
-    /* default is text */
-    return &CopyFromRoutineText;
+    else
+        return &CopyFromRoutineText;
 }
 
 /* Implementation of the start callback for text and CSV formats */
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 340e0cd0a8d..63b7d65f982 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -634,7 +634,7 @@
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
 { oid => '8752',
-  descr => 'pseudo-type for the result of a copy to method function',
+  descr => 'pseudo-type for the result of a copy to/from method function',
   typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
   typcategory => 'P', typinput => 'copy_handler_in',
   typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 5c5ea6592e3..895c105d8d8 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -64,6 +64,8 @@ extern void CopyToStateFlush(CopyToState cstate);
  */
 typedef struct CopyFromRoutine
 {
+    NodeTag        type;
+
     /*
      * Set input function information. This callback is called once at the
      * beginning of COPY FROM.
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index adfe7d1572a..016893e7026 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -2,9 +2,13 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
-ERROR:  COPY format "test_copy_format" not recognized
-LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')...
-                                          ^
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=false
 NOTICE:  CopyToOutFunc: atttypid=21
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 810b3d8cedc..0dfdfa00080 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -2,4 +2,5 @@ CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+\.
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index b42d472d851..abafc668463 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -18,6 +18,40 @@
 
 PG_MODULE_MAGIC;
 
+static void
+CopyFromInFunc(CopyFromState cstate, Oid atttypid,
+               FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%u", atttypid)));
+}
+
+static void
+CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: natts=%d", tupDesc->natts)));
+}
+
+static bool
+CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+CopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .type = T_CopyFromRoutine,
+    .CopyFromInFunc = CopyFromInFunc,
+    .CopyFromStart = CopyFromStart,
+    .CopyFromOneRow = CopyFromOneRow,
+    .CopyFromEnd = CopyFromEnd,
+};
+
 static void
 CopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
 {
@@ -59,5 +93,8 @@ test_copy_format(PG_FUNCTION_ARGS)
     ereport(NOTICE,
             (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
 
-    PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
 }
-- 
2.47.2
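The dispatch above can be sketched outside PostgreSQL as follows. This is a simplified mock under stated assumptions: MockNodeTag and the Mock* structs stand in for NodeTag, CopyToRoutine and CopyFromRoutine, and get_from_routine() only imitates the check that CopyFromGetRoutine() does with IsA(); it returns NULL where the patch reports an error with ereport(ERROR, ...). All names here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the node tags carried in the first struct member. */
typedef enum MockNodeTag
{
    T_MockCopyToRoutine = 1,
    T_MockCopyFromRoutine,
} MockNodeTag;

typedef struct MockCopyToRoutine
{
    MockNodeTag type;           /* must be the first member */
    const char *name;
} MockCopyToRoutine;

typedef struct MockCopyFromRoutine
{
    MockNodeTag type;           /* must be the first member */
    const char *name;
} MockCopyFromRoutine;

static const MockCopyToRoutine to_routine = {
    .type = T_MockCopyToRoutine,
    .name = "mock COPY TO",
};

static const MockCopyFromRoutine from_routine = {
    .type = T_MockCopyFromRoutine,
    .name = "mock COPY FROM",
};

typedef const void *(*MockHandler) (bool is_from);

/* One handler serves both directions and picks the routine by is_from. */
static const void *
mock_handler(bool is_from)
{
    if (is_from)
        return &from_routine;
    return &to_routine;
}

/* A buggy handler that ignores is_from, to show why the tag check matters. */
static const void *
broken_handler(bool is_from)
{
    (void) is_from;
    return &to_routine;
}

/*
 * Core side: call the handler with is_from = true and verify the tag before
 * trusting the pointer. Reading the tag through the pointer is valid because
 * the tag is the first member of every routine struct.
 */
static const MockCopyFromRoutine *
get_from_routine(MockHandler handler)
{
    const void *routine = handler(true);

    if (routine == NULL ||
        *(const MockNodeTag *) routine != T_MockCopyFromRoutine)
        return NULL;
    return routine;
}
```

This is presumably why the patch adds a NodeTag to CopyFromRoutine: without it, the caller has no way to tell that a handler returned the wrong kind of routine.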
From 48a363f84799bb4a4323e90bd425d2fd54c2eef2 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:19:34 +0900
Subject: [PATCH v38 5/9] Use COPY_SOURCE_ prefix for CopySource enum values
This is for consistency with CopyDest.
---
 src/backend/commands/copyfrom.c          |  4 ++--
 src/backend/commands/copyfromparse.c     | 10 +++++-----
 src/include/commands/copyfrom_internal.h |  6 +++---
 3 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 0809766f910..76662e04260 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1729,7 +1729,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;    /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;    /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1857,7 +1857,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index e8128f85e6b..17e51f02e04 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -180,7 +180,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows it can send. */
     pq_flush();
@@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
     switch (cstate->copy_src)
     {
-        case COPY_FILE:
+        case COPY_SOURCE_FILE:
             bytesread = fread(databuf, 1, maxread, cstate->copy_file);
             if (ferror(cstate->copy_file))
                 ereport(ERROR,
@@ -257,7 +257,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
             if (bytesread == 0)
                 cstate->raw_reached_eof = true;
             break;
-        case COPY_FRONTEND:
+        case COPY_SOURCE_FRONTEND:
             while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
             {
                 int            avail;
@@ -340,7 +340,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
                 bytesread += avail;
             }
             break;
-        case COPY_CALLBACK:
+        case COPY_SOURCE_CALLBACK:
             bytesread = cstate->data_source_cb(databuf, minread, maxread);
             break;
     }
@@ -1172,7 +1172,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
          * after \. up to the protocol end of copy data.  (XXX maybe better
          * not to treat \. as special?)
          */
-        if (cstate->copy_src == COPY_FRONTEND)
+        if (cstate->copy_src == COPY_SOURCE_FRONTEND)
         {
             int            inbytes;
 
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..3a306e3286e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -24,9 +24,9 @@
  */
 typedef enum CopySource
 {
-    COPY_FILE,                    /* from file (or a piped program) */
-    COPY_FRONTEND,                /* from frontend */
-    COPY_CALLBACK,                /* from callback function */
+    COPY_SOURCE_FILE,            /* from file (or a piped program) */
+    COPY_SOURCE_FRONTEND,        /* from frontend */
+    COPY_SOURCE_CALLBACK,        /* from callback function */
 } CopySource;
 
 /*
-- 
2.47.2
From a52949f4b6410bebbca4ef8ef026944ad0ad6fff Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 14:21:39 +0900
Subject: [PATCH v38 6/9] Add support for implementing custom COPY FROM format
 as extension
* Add CopyFromStateData::opaque that can be used to keep data for
  custom COPY FROM format implementation
* Export CopyGetData() to get the next data as
  CopyFromStateGetData()
---
 src/backend/commands/copyfromparse.c     | 11 +++++++++++
 src/include/commands/copyapi.h           |  2 ++
 src/include/commands/copyfrom_internal.h |  3 +++
 3 files changed, 16 insertions(+)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 17e51f02e04..d8fd238e72b 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -739,6 +739,17 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
     return copied_bytes;
 }
 
+/*
+ * Export CopyGetData() for extensions. We want to keep CopyGetData() as a
+ * static function for optimization. CopyGetData() calls in this file may be
+ * optimized by a compiler.
+ */
+int
+CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread)
+{
+    return CopyGetData(cstate, dest, minread, maxread);
+}
+
 /*
  * This function is exposed for use by extensions that read raw fields in the
  * next line. See NextCopyFromRawFieldsInternal() for details.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 895c105d8d8..2044d8b8c4c 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -108,4 +108,6 @@ typedef struct CopyFromRoutine
     void        (*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
+extern int    CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread);
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 3a306e3286e..af425cf5fd9 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -181,6 +181,9 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyFromStateData;
 
 extern void ReceiveCopyBegin(CopyFromState cstate);
-- 
2.47.2
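A custom COPY FROM format would typically pull raw bytes in chunks until the source is exhausted. The sketch below shows that loop against a mock, not the real API: MockCopyFromState and mock_get_data() are hypothetical stand-ins that only copy the shape of CopyFromStateGetData() (dest, minread, maxread, returning the number of bytes read). The mock reads from an in-memory string and ignores minread, similar to the file case of CopyGetData(), which just reads up to maxread bytes.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for CopyFromStateData: an in-memory data source. */
typedef struct MockCopyFromState
{
    const char *data;           /* whole input */
    size_t      len;            /* input length in bytes */
    size_t      pos;            /* next unread byte */
} MockCopyFromState;

/*
 * Stand-in with the same shape as CopyFromStateGetData(): copy at most
 * maxread bytes into dest and return how many were copied. Returning 0
 * means the input is exhausted.
 */
static int
mock_get_data(MockCopyFromState *cstate, void *dest, int minread, int maxread)
{
    size_t      avail = cstate->len - cstate->pos;
    size_t      n = avail < (size_t) maxread ? avail : (size_t) maxread;

    (void) minread;             /* not needed for an in-memory source */
    memcpy(dest, cstate->data + cstate->pos, n);
    cstate->pos += n;
    return (int) n;
}

/*
 * Hypothetical consumer: count newline-terminated rows, reading through a
 * deliberately small chunk buffer so that rows span several reads.
 */
static int
count_rows(MockCopyFromState *cstate)
{
    char        chunk[4];
    int         nrows = 0;
    int         nread;

    while ((nread = mock_get_data(cstate, chunk, 1, (int) sizeof(chunk))) > 0)
    {
        for (int i = 0; i < nread; i++)
        {
            if (chunk[i] == '\n')
                nrows++;
        }
    }
    return nrows;
}
```

A real format handler would keep its partial-row state in CopyFromStateData::opaque between calls instead of consuming the whole input in one function.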
From d1a14456b1bd27a10769fbf47d5a47800bd16e0d Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 27 Nov 2024 16:23:55 +0900
Subject: [PATCH v38 7/9] Add CopyFromSkipErrorRow() for custom COPY format
 extension
Extensions must call CopyFromSkipErrorRow() when their CopyFromOneRow
callback reports an error via errsave(). CopyFromSkipErrorRow() handles
the "ON_ERROR stop" and "LOG_VERBOSITY verbose" cases.
---
 src/backend/commands/copyfromparse.c          | 82 +++++++++++--------
 src/include/commands/copyapi.h                |  2 +
 .../expected/test_copy_format.out             | 47 +++++++++++
 .../test_copy_format/sql/test_copy_format.sql | 24 ++++++
 .../test_copy_format/test_copy_format.c       | 80 +++++++++++++++++-
 5 files changed, 198 insertions(+), 37 deletions(-)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index d8fd238e72b..2070f51a963 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -938,6 +938,51 @@ CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
     return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
 }
 
+/*
+ * Call this when you report an error by errsave() in your CopyFromOneRow
+ * callback. This handles "ON_ERROR stop" and "LOG_VERBOSITY verbose" cases
+ * for you.
+ */
+void
+CopyFromSkipErrorRow(CopyFromState cstate)
+{
+    Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+    cstate->num_errors++;
+
+    if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+    {
+        /*
+         * Since we emit line number and column info in the below notice
+         * message, we suppress error context information other than the
+         * relation name.
+         */
+        Assert(!cstate->relname_only);
+        cstate->relname_only = true;
+
+        if (cstate->cur_attval)
+        {
+            char       *attval;
+
+            attval = CopyLimitPrintoutLength(cstate->cur_attval);
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname,
+                           attval));
+            pfree(attval);
+        }
+        else
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname));
+
+        /* reset relname_only */
+        cstate->relname_only = false;
+    }
+}
+
 /*
  * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
  *
@@ -1044,42 +1089,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
                                         (Node *) cstate->escontext,
                                         &values[m]))
         {
-            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-            cstate->num_errors++;
-
-            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-            {
-                /*
-                 * Since we emit line number and column info in the below
-                 * notice message, we suppress error context information other
-                 * than the relation name.
-                 */
-                Assert(!cstate->relname_only);
-                cstate->relname_only = true;
-
-                if (cstate->cur_attval)
-                {
-                    char       *attval;
-
-                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-                                   (unsigned long long) cstate->cur_lineno,
-                                   cstate->cur_attname,
-                                   attval));
-                    pfree(attval);
-                }
-                else
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-                                   (unsigned long long) cstate->cur_lineno,
-                                   cstate->cur_attname));
-
-                /* reset relname_only */
-                cstate->relname_only = false;
-            }
-
+            CopyFromSkipErrorRow(cstate);
             return true;
         }
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 2044d8b8c4c..500ece7d5bb 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -110,4 +110,6 @@ typedef struct CopyFromRoutine
 
 extern int    CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread);
 
+extern void CopyFromSkipErrorRow(CopyFromState cstate);
+
 #endif                            /* COPYAPI_H */
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index 016893e7026..b9a6baa85c0 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -1,6 +1,8 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=true
 NOTICE:  CopyFromInFunc: atttypid=21
@@ -8,7 +10,50 @@ NOTICE:  CopyFromInFunc: atttypid=23
 NOTICE:  CopyFromInFunc: atttypid=20
 NOTICE:  CopyFromStart: natts=3
 NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  invalid value: "6"
+CONTEXT:  COPY test, line 2, column a: "6"
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
 NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  skipping row due to data type incompatibility at line 2 for column "a": "6"
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
+NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: atttypid=21
+NOTICE:  CopyFromInFunc: atttypid=23
+NOTICE:  CopyFromInFunc: atttypid=20
+NOTICE:  CopyFromStart: natts=3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  too much lines: 3
+CONTEXT:  COPY test, line 3
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=false
 NOTICE:  CopyToOutFunc: atttypid=21
@@ -18,4 +63,6 @@ NOTICE:  CopyToStart: natts=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
+NOTICE:  CopyToOneRow: tts_nvalid=3
 NOTICE:  CopyToEnd
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index 0dfdfa00080..86db71bce7f 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -1,6 +1,30 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+321
 \.
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index abafc668463..96a54dab7ec 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 
 #include "commands/copyapi.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 
 PG_MODULE_MAGIC;
@@ -34,8 +35,85 @@ CopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
 static bool
 CopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
 {
+    int            n_attributes = list_length(cstate->attnumlist);
+    char       *line;
+    int            line_size = n_attributes + 1;    /* +1 is for new line */
+    int            read_bytes;
+
     ereport(NOTICE, (errmsg("CopyFromOneRow")));
-    return false;
+
+    cstate->cur_lineno++;
+    line = palloc(line_size);
+    read_bytes = CopyFromStateGetData(cstate, line, line_size, line_size);
+    if (read_bytes == 0)
+        return false;
+    if (read_bytes != line_size)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("one line must be %d bytes: %d",
+                        line_size, read_bytes)));
+
+    if (cstate->cur_lineno == 1)
+    {
+        /* Success */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        ListCell   *cur;
+        int            i = 0;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            int            m = attnum - 1;
+            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+            if (att->atttypid == INT2OID)
+            {
+                values[i] = Int16GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT4OID)
+            {
+                values[i] = Int32GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT8OID)
+            {
+                values[i] = Int64GetDatum(line[i] - '0');
+            }
+            nulls[i] = false;
+            i++;
+        }
+    }
+    else if (cstate->cur_lineno == 2)
+    {
+        /* Soft error */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        int            attnum = lfirst_int(list_head(cstate->attnumlist));
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+        char        value[2];
+
+        cstate->cur_attname = NameStr(att->attname);
+        value[0] = line[0];
+        value[1] = '\0';
+        cstate->cur_attval = value;
+        errsave((Node *) cstate->escontext,
+                (
+                 errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                 errmsg("invalid value: \"%c\"", line[0])));
+        CopyFromSkipErrorRow(cstate);
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+        return true;
+    }
+    else
+    {
+        /* Hard error */
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("too much lines: %llu",
+                        (unsigned long long) cstate->cur_lineno)));
+    }
+
+    return true;
 }
 
 static void
-- 
2.47.2
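
The soft-error flow that the test module above exercises can be modeled outside the backend. The following is only a sketch under stated assumptions: every Toy*/toy_* name is hypothetical and merely stands in for CopyFromState, the CopyFromOneRow callback and CopyFromSkipErrorRow(). It shows the three outcomes the regression test checks: an accepted row, a row skipped under "ON_ERROR ignore", and a failure under "ON_ERROR stop".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for COPY_ON_ERROR_STOP / COPY_ON_ERROR_IGNORE. */
typedef enum { ON_ERROR_STOP, ON_ERROR_IGNORE } OnErrorMode;

/* Hypothetical, heavily reduced model of the COPY FROM state. */
typedef struct
{
    OnErrorMode on_error;               /* models cstate->opts.on_error */
    bool        verbose;                /* models LOG_VERBOSITY verbose */
    unsigned long long cur_lineno;      /* models cstate->cur_lineno */
    unsigned long long num_errors;      /* models cstate->num_errors */
} ToyCopyState;

typedef enum { ROW_OK, ROW_SKIPPED, ROW_FAILED } RowResult;

/* Models CopyFromSkipErrorRow(): count the row and optionally report it. */
static void
toy_skip_error_row(ToyCopyState *st, const char *attname, const char *attval)
{
    /* As in the patch, this must never be reached with ON_ERROR stop. */
    assert(st->on_error != ON_ERROR_STOP);

    st->num_errors++;
    if (st->verbose)
        printf("NOTICE: skipping row at line %llu for column \"%s\": \"%s\"\n",
               st->cur_lineno, attname, attval);
}

/* Models a CopyFromOneRow callback that accepts only a single digit. */
static RowResult
toy_one_row(ToyCopyState *st, const char *field, int *value)
{
    st->cur_lineno++;

    if (field[0] >= '0' && field[0] <= '9' && field[1] == '\0')
    {
        *value = field[0] - '0';
        return ROW_OK;
    }

    /* Assumption: with ON_ERROR stop the backend raises a hard error here. */
    if (st->on_error == ON_ERROR_STOP)
        return ROW_FAILED;

    toy_skip_error_row(st, "a", field);
    return ROW_SKIPPED;
}
```

In the real callback the skipped row is still reported as "read" (the callback returns true), so the caller moves on to the next row; the model makes that case explicit with ROW_SKIPPED.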
From e0240eec67a6a88d3daa2caa2bf48b45880a697b Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 18 Mar 2025 19:09:09 +0900
Subject: [PATCH v38 8/9] Use copy handlers for built-in formats
This adds copy handlers for the built-in text, csv and binary formats,
which lets us simplify Copy{To,From}GetRoutine(). We'll be able to remove
CopyFormatOptions::{binary,csv_mode} once we add more callbacks to
Copy{To,From}Routine and move the format-specific routines to
Copy{To,From}Routine::*.
---
 src/backend/commands/copy.c              | 48 ++++++++++++++++++------
 src/backend/commands/copyfrom.c          | 48 +++++++++++-------------
 src/backend/commands/copyto.c            | 48 +++++++++++-------------
 src/include/catalog/pg_proc.dat          | 11 ++++++
 src/include/commands/copyfrom_internal.h |  6 ++-
 src/include/commands/copyto_internal.h   |  6 ++-
 6 files changed, 102 insertions(+), 65 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b4417bb6819..24bd2547e3b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -22,7 +22,9 @@
 #include "access/table.h"
 #include "access/xact.h"
 #include "catalog/pg_authid.h"
-#include "commands/copy.h"
+#include "commands/copyapi.h"
+#include "commands/copyto_internal.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
@@ -500,24 +502,15 @@ ProcessCopyOptionFormat(ParseState *pstate,
 
     opts_out->csv_mode = false;
     opts_out->binary = false;
-    /* built-in formats */
-    if (strcmp(format, "text") == 0)
-    {
-        /* "csv_mode == false && binary == false" means "text" */
-        return;
-    }
-    else if (strcmp(format, "csv") == 0)
+    if (strcmp(format, "csv") == 0)
     {
         opts_out->csv_mode = true;
-        return;
     }
     else if (strcmp(format, "binary") == 0)
     {
         opts_out->binary = true;
-        return;
     }
 
-    /* custom format */
     funcargtypes[0] = INTERNALOID;
     handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
                                 funcargtypes, true);
@@ -1067,3 +1060,36 @@ CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist)
 
     return attnums;
 }
+
+Datum
+copy_text_handler(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineText);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineText);
+}
+
+Datum
+copy_csv_handler(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineCSV);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineCSV);
+}
+
+Datum
+copy_binary_handler(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineBinary);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineBinary);
+}
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 76662e04260..2677f2ac1bc 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -45,6 +45,7 @@
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/portal.h"
@@ -128,7 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate);
  */
 
 /* text format */
-static const CopyFromRoutine CopyFromRoutineText = {
+const CopyFromRoutine CopyFromRoutineText = {
     .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
@@ -137,7 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = {
 };
 
 /* CSV format */
-static const CopyFromRoutine CopyFromRoutineCSV = {
+const CopyFromRoutine CopyFromRoutineCSV = {
     .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
@@ -146,7 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = {
 };
 
 /* binary format */
-static const CopyFromRoutine CopyFromRoutineBinary = {
+const CopyFromRoutine CopyFromRoutineBinary = {
     .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromBinaryInFunc,
     .CopyFromStart = CopyFromBinaryStart,
@@ -158,30 +159,25 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(const CopyFormatOptions *opts)
 {
-    if (OidIsValid(opts->handler))
-    {
-        Datum        datum;
-        Node       *routine;
+    Oid            handler = opts->handler;
+    Datum        datum;
+    Node       *routine;
 
-        datum = OidFunctionCall1(opts->handler, BoolGetDatum(true));
-        routine = (Node *) DatumGetPointer(datum);
-        if (routine == NULL || !IsA(routine, CopyFromRoutine))
-            ereport(
-                    ERROR,
-                    (errcode(
-                             ERRCODE_INVALID_PARAMETER_VALUE),
-                     errmsg("COPY handler function "
-                            "%u did not return "
-                            "CopyFromRoutine struct",
-                            opts->handler)));
-        return castNode(CopyFromRoutine, routine);
-    }
-    else if (opts->csv_mode)
-        return &CopyFromRoutineCSV;
-    else if (opts->binary)
-        return &CopyFromRoutineBinary;
-    else
-        return &CopyFromRoutineText;
+    if (!OidIsValid(handler))
+        handler = F_TEXT_INTERNAL;
+
+    datum = OidFunctionCall1(handler, BoolGetDatum(true));
+    routine = (Node *) DatumGetPointer(datum);
+    if (routine == NULL || !IsA(routine, CopyFromRoutine))
+        ereport(
+                ERROR,
+                (errcode(
+                         ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY handler function "
+                        "%u did not return "
+                        "CopyFromRoutine struct",
+                        handler)));
+    return castNode(CopyFromRoutine, routine);
 }
 
 /* Implementation of the start callback for text and CSV formats */
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f5ed3efbace..757d24736e3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -32,6 +32,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -89,7 +90,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
  */
 
 /* text format */
-static const CopyToRoutine CopyToRoutineText = {
+const CopyToRoutine CopyToRoutineText = {
     .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
@@ -98,7 +99,7 @@ static const CopyToRoutine CopyToRoutineText = {
 };
 
 /* CSV format */
-static const CopyToRoutine CopyToRoutineCSV = {
+const CopyToRoutine CopyToRoutineCSV = {
     .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
@@ -107,7 +108,7 @@ static const CopyToRoutine CopyToRoutineCSV = {
 };
 
 /* binary format */
-static const CopyToRoutine CopyToRoutineBinary = {
+const CopyToRoutine CopyToRoutineBinary = {
     .type = T_CopyToRoutine,
     .CopyToStart = CopyToBinaryStart,
     .CopyToOutFunc = CopyToBinaryOutFunc,
@@ -119,30 +120,25 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(const CopyFormatOptions *opts)
 {
-    if (OidIsValid(opts->handler))
-    {
-        Datum        datum;
-        Node       *routine;
+    Oid            handler = opts->handler;
+    Datum        datum;
+    Node       *routine;
 
-        datum = OidFunctionCall1(opts->handler, BoolGetDatum(false));
-        routine = (Node *) DatumGetPointer(datum);
-        if (routine == NULL || !IsA(routine, CopyToRoutine))
-            ereport(
-                    ERROR,
-                    (errcode(
-                             ERRCODE_INVALID_PARAMETER_VALUE),
-                     errmsg("COPY handler function "
-                            "%u did not return "
-                            "CopyToRoutine struct",
-                            opts->handler)));
-        return castNode(CopyToRoutine, routine);
-    }
-    else if (opts->csv_mode)
-        return &CopyToRoutineCSV;
-    else if (opts->binary)
-        return &CopyToRoutineBinary;
-    else
-        return &CopyToRoutineText;
+    if (!OidIsValid(handler))
+        handler = F_TEXT_INTERNAL;
+
+    datum = OidFunctionCall1(handler, BoolGetDatum(false));
+    routine = (Node *) DatumGetPointer(datum);
+    if (routine == NULL || !IsA(routine, CopyToRoutine))
+        ereport(
+                ERROR,
+                (errcode(
+                         ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY handler function "
+                        "%u did not return "
+                        "CopyToRoutine struct",
+                        handler)));
+    return castNode(CopyToRoutine, routine);
 }
 
 /* Implementation of the start callback for text and CSV formats */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7c2a510fa3f..0737eb73c9a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12485,4 +12485,15 @@
   proargtypes => 'int4',
   prosrc => 'gist_stratnum_common' },
 
+# COPY handlers
+{ oid => '8100', descr => 'text COPY FORMAT handler',
+  proname => 'text', provolatile => 'i', prorettype => 'copy_handler',
+  proargtypes => 'internal', prosrc => 'copy_text_handler' },
+{ oid => '8101', descr => 'csv COPY FORMAT handler',
+  proname => 'csv', provolatile => 'i', prorettype => 'copy_handler',
+  proargtypes => 'internal', prosrc => 'copy_csv_handler' },
+{ oid => '8102', descr => 'binary COPY FORMAT handler',
+  proname => 'binary', provolatile => 'i', prorettype => 'copy_handler',
+  proargtypes => 'internal', prosrc => 'copy_binary_handler' },
+
 ]
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index af425cf5fd9..abeccf85c1c 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -14,7 +14,7 @@
 #ifndef COPYFROM_INTERNAL_H
 #define COPYFROM_INTERNAL_H
 
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
@@ -197,4 +197,8 @@ extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
 extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
                                  Datum *values, bool *nulls);
 
+extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineText;
+extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineCSV;
+extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineBinary;
+
 #endif                            /* COPYFROM_INTERNAL_H */
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index ce1c33a4004..85412660f7f 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -14,7 +14,7 @@
 #ifndef COPYTO_INTERNAL_H
 #define COPYTO_INTERNAL_H
 
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "executor/execdesc.h"
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
@@ -83,4 +83,8 @@ typedef struct CopyToStateData
     void       *opaque;            /* private space */
 } CopyToStateData;
 
+extern PGDLLIMPORT const CopyToRoutine CopyToRoutineText;
+extern PGDLLIMPORT const CopyToRoutine CopyToRoutineCSV;
+extern PGDLLIMPORT const CopyToRoutine CopyToRoutineBinary;
+
 #endif                            /* COPYTO_INTERNAL_H */
-- 
2.47.2
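
The dispatch that the patch above moves to can be sketched in isolation. This is only an illustrative model, not the backend code: all Toy*/toy_* names are hypothetical, the name lookup stands in for the pg_proc lookup by format name, and the routine struct carries just a label instead of callbacks. It shows the idea that every format, built-in or not, goes through a handler that returns the routine for the requested direction, with text as the fallback when no format is given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy routine; the real CopyFromRoutine/CopyToRoutine hold callbacks. */
typedef struct
{
    const char *name;
} ToyRoutine;

static const ToyRoutine toy_from_text = {"from/text"};
static const ToyRoutine toy_to_text = {"to/text"};
static const ToyRoutine toy_from_csv = {"from/csv"};
static const ToyRoutine toy_to_csv = {"to/csv"};

/* A handler returns the routine for the requested direction. */
typedef const ToyRoutine *(*ToyHandler) (bool is_from);

static const ToyRoutine *
toy_text_handler(bool is_from)
{
    return is_from ? &toy_from_text : &toy_to_text;
}

static const ToyRoutine *
toy_csv_handler(bool is_from)
{
    return is_from ? &toy_from_csv : &toy_to_csv;
}

/* Models the lookup by format name; NULL means "no such handler". */
static ToyHandler
toy_lookup_handler(const char *format)
{
    assert(format != NULL);

    if (strcmp(format, "text") == 0)
        return toy_text_handler;
    if (strcmp(format, "csv") == 0)
        return toy_csv_handler;
    return NULL;
}

/* Models Copy{To,From}GetRoutine(): default to text, then call the handler. */
static const ToyRoutine *
toy_get_routine(const char *format, bool is_from)
{
    ToyHandler  handler;

    if (format == NULL)
        handler = toy_text_handler;     /* no FORMAT option given */
    else
        handler = toy_lookup_handler(format);

    if (handler == NULL)
        return NULL;                    /* the backend would raise an error */

    return handler(is_from);
}
```

With this shape the caller no longer needs per-format branches; adding a format means adding one handler.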
From f7d63bff2f0870987e231c8a4dbfc54b61505792 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 19 Mar 2025 11:46:34 +0900
Subject: [PATCH v38 9/9] Add document how to write a COPY handler
This is WIP because we haven't decided our API yet.
Co-authored-by: David G. Johnston <david.g.johnston@gmail.com>
---
 doc/src/sgml/copy-handler.sgml | 394 +++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml     |   1 +
 doc/src/sgml/postgres.sgml     |   1 +
 src/include/commands/copyapi.h |   9 +-
 4 files changed, 401 insertions(+), 4 deletions(-)
 create mode 100644 doc/src/sgml/copy-handler.sgml
diff --git a/doc/src/sgml/copy-handler.sgml b/doc/src/sgml/copy-handler.sgml
new file mode 100644
index 00000000000..5bc87d16662
--- /dev/null
+++ b/doc/src/sgml/copy-handler.sgml
@@ -0,0 +1,394 @@
+<!-- doc/src/sgml/copy-handler.sgml -->
+
+<chapter id="copy-handler">
+ <title>Writing a Copy Handler</title>
+
+ <indexterm zone="copy-handler">
+  <primary><literal>COPY</literal> handler</primary>
+ </indexterm>
+
+ <para>
+  <productname>PostgreSQL</productname> supports
+  custom <link linkend="sql-copy"><literal>COPY</literal></link> handlers,
+  which add <replaceable>format_name</replaceable> options to
+  the <literal>FORMAT</literal> clause.
+ </para>
+
+ <para>
+  At the SQL level, a copy handler method is represented by a single SQL
+  function (see <xref linkend="sql-createfunction"/>), typically implemented in
+  C, having the signature
+<synopsis>
+<replaceable>format_name</replaceable>(internal) RETURNS <literal>copy_handler</literal>
+</synopsis>
+  The function's name is then accepted as a
+  valid <replaceable>format_name</replaceable>. The return
+  pseudo-type <literal>copy_handler</literal> informs the system that this
+  function needs to be registered as a copy handler.
+  The <type>internal</type> argument is a dummy that prevents this function
+  from being called directly from an SQL command. Because the handler
+  implementation must not change during the lifetime of the server, this SQL
+  function's volatility should be marked immutable. The <literal>link_symbol</literal>
+  for this function is the name of the implementation function, described
+  next.
+ </para>
+
+ <para>
+  The implementation function signature expected for the function named
+  in the <literal>link_symbol</literal> is:
+<synopsis>
+Datum
+<replaceable>copy_format_handler</replaceable>(PG_FUNCTION_ARGS)
+</synopsis>
+  The convention for the name is to replace the word
+  <replaceable>format</replaceable> in the placeholder above with the value given
+  to <replaceable>format_name</replaceable> in the SQL function.
+  The first argument is a <type>boolean</type>. If it is <literal>true</literal>,
+  the handler must return a pointer to its implementation of <literal>COPY FROM</literal>
+  (a <type>CopyFromRoutine *</type>). If it is <literal>false</literal>, the handler
+  must return a pointer to its implementation of <literal>COPY TO</literal>
+  (a <type>CopyToRoutine *</type>). These structs are declared in
+  <filename>src/include/commands/copyapi.h</filename>.
+ </para>
+
+ <para>
+  The structs hold pointers to implementation functions for initializing,
+  starting, processing rows, and ending a copy operation. The specific
+  structures vary a bit between <literal>COPY FROM</literal> and
+  <literal>COPY TO</literal>, so the next two sections describe each
+  in detail.
+ </para>
+
+ <sect1 id="copy-handler-from">
+  <title>Copy From Handler</title>
+
+  <para>
+   The opening to this chapter describes how the executor will call the main
+   handler function with, in this case,
+   a <type>boolean</type> <literal>true</literal>, and expect to receive a
+   <type>CopyFromRoutine *</type> <type>Datum</type>. This section describes
+   the components of the <type>CopyFromRoutine</type> struct.
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyFromInFunc(CopyFromState cstate,
+               Oid atttypid,
+               FmgrInfo *finfo,
+               Oid *typioparam);
+</programlisting>
+
+   This sets input function information for the
+   given <literal>atttypid</literal> attribute. This function is called once
+   per attribute at the beginning of <literal>COPY FROM</literal>. If
+   this <literal>COPY</literal> handler doesn't use any input functions, this
+   function doesn't need to do anything.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Oid atttypid</literal></term>
+     <listitem>
+      <para>
+       This is the OID of the data type used by the relation's attribute.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>FmgrInfo *finfo</literal></term>
+     <listitem>
+      <para>
+       This can be optionally filled to provide the catalog information of
+       the input function.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Oid *typioparam</literal></term>
+     <listitem>
+      <para>
+       This can be optionally filled to define the OID of the type to
+       pass to the input function.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyFromStart(CopyFromState cstate,
+              TupleDesc tupDesc);
+</programlisting>
+
+   This starts a <literal>COPY FROM</literal>. This function is called once at
+   the beginning of <literal>COPY FROM</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>TupleDesc tupDesc</literal></term>
+     <listitem>
+      <para>
+       This is the tuple descriptor of the relation into which the data is
+       copied. This can be used for any initialization steps required by a
+       format.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+bool
+CopyFromOneRow(CopyFromState cstate,
+               ExprContext *econtext,
+               Datum *values,
+               bool *nulls);
+</programlisting>
+
+   This reads one row from the source and fills <literal>values</literal>
+   and <literal>nulls</literal>. If a row was read,
+   this must return <literal>true</literal>. If there are no more rows to
+   read, this must return <literal>false</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>ExprContext *econtext</literal></term>
+     <listitem>
+      <para>
+       This is used to evaluate the default expression for each column that is
+       either not read from the file or is using
+       the <literal>DEFAULT</literal> option of <literal>COPY
+       FROM</literal>. It is <literal>NULL</literal> if no default values are
+       used.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Datum *values</literal></term>
+     <listitem>
+      <para>
+       This is an output array that stores the column values of the row read.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>bool *nulls</literal></term>
+     <listitem>
+      <para>
+       This is an output variable to store whether the read columns
+       are <literal>NULL</literal> or not.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyFromEnd(CopyFromState cstate);
+</programlisting>
+
+   This ends a <literal>COPY FROM</literal>. This function is called once at
+   the end of <literal>COPY FROM</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+   TODO: Add CopyFromStateGetData() and CopyFromSkipErrorRow()?
+  </para>
+ </sect1>
+
+ <sect1 id="copy-handler-to">
+  <title>Copy To Handler</title>
+
+  <para>
+   The <literal>COPY</literal> handler function for <literal>COPY
+   TO</literal> returns a <type>CopyToRoutine</type> struct containing
+   pointers to the functions described below. All functions are required.
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToOutFunc(CopyToState cstate,
+              Oid atttypid,
+              FmgrInfo *finfo);
+</programlisting>
+
+   This sets output function information for the
+   given <literal>atttypid</literal> attribute. This function is called once
+   per attribute at the beginning of <literal>COPY TO</literal>. If
+   this <literal>COPY</literal> handler doesn't use any output functions, this
+   function doesn't need to do anything.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Oid atttypid</literal></term>
+     <listitem>
+      <para>
+       This is the OID of the data type used by the relation's attribute.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>FmgrInfo *finfo</literal></term>
+     <listitem>
+      <para>
+       This can be optionally filled to provide the catalog information of
+       the output function.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToStart(CopyToState cstate,
+            TupleDesc tupDesc);
+</programlisting>
+
+   This starts a <literal>COPY TO</literal>. This function is called once at
+   the beginning of <literal>COPY TO</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>TupleDesc tupDesc</literal></term>
+     <listitem>
+      <para>
+       This is the tuple descriptor of the relation from which the data is read.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToOneRow(CopyToState cstate,
+             TupleTableSlot *slot);
+</programlisting>
+
+   This writes one row stored in <literal>slot</literal> to the destination.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>TupleTableSlot *slot</literal></term>
+     <listitem>
+      <para>
+       This is the slot that holds the row to be written.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToEnd(CopyToState cstate);
+</programlisting>
+
+   This ends a <literal>COPY TO</literal>. This function is called once at
+   the end of <literal>COPY TO</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+   TODO: Add CopyToStateFlush()?
+  </para>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 25fb99cee69..1fd6d32d5ec 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -107,6 +107,7 @@
 <!ENTITY storage    SYSTEM "storage.sgml">
 <!ENTITY transaction     SYSTEM "xact.sgml">
 <!ENTITY tablesample-method SYSTEM "tablesample-method.sgml">
+<!ENTITY copy-handler SYSTEM "copy-handler.sgml">
 <!ENTITY wal-for-extensions SYSTEM "wal-for-extensions.sgml">
 <!ENTITY generic-wal SYSTEM "generic-wal.sgml">
 <!ENTITY custom-rmgr SYSTEM "custom-rmgr.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index af476c82fcc..8ba319ae2df 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -254,6 +254,7 @@ break is not needed in a wider output rendering.
   &plhandler;
   &fdwhandler;
   &tablesample-method;
+  &copy-handler;
   &custom-scan;
   &geqo;
   &tableam;
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 500ece7d5bb..24710cb667a 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -28,10 +28,10 @@ typedef struct CopyToRoutine
      * Set output function information. This callback is called once at the
      * beginning of COPY TO.
      *
+     * 'atttypid' is the OID of data type used by the relation's attribute.
+     *
      * 'finfo' can be optionally filled to provide the catalog information of
      * the output function.
-     *
-     * 'atttypid' is the OID of data type used by the relation's attribute.
      */
     void        (*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
                                   FmgrInfo *finfo);
@@ -70,12 +70,13 @@ typedef struct CopyFromRoutine
      * Set input function information. This callback is called once at the
      * beginning of COPY FROM.
      *
+     * 'atttypid' is the OID of data type used by the relation's attribute.
+     *
      * 'finfo' can be optionally filled to provide the catalog information of
      * the input function.
      *
      * 'typioparam' can be optionally filled to define the OID of the type to
-     * pass to the input function.'atttypid' is the OID of data type used by
-     * the relation's attribute.
+     * pass to the input function.
      */
     void        (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
                                    FmgrInfo *finfo, Oid *typioparam);
-- 
2.47.2
			
Re: Make COPY format extendable: Extract COPY TO format implementations
From: "David G. Johnston"

On Tue, Mar 18, 2025 at 7:56 PM Sutou Kouhei <kou@clear-code.com> wrote:
> Hi,
>
> In <CAD21AoDU=bYRDDY8MzCXAfg4h9XTeTBdM-wVJaO1t4UcseCpuA@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 17 Mar 2025 13:50:03 -0700,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > I think that built-in formats also need to have their handler
> > functions. This seems to be a conventional way for customizable
> > features such as tablesample and access methods, and we can simplify
> > this function.
>
> OK. 0008 in the attached v37 patch set does it.
tl;dr:
We need to exclude from our SQL function search any function that doesn't declare copy_handler as its return type.
("function text must return type copy_handler" is not an acceptable error message)
We need to accept identifiers in FORMAT and parse the optional catalog, schema, and object name portions.
(Restrict our function search to the named schema if provided.)
Detail:
Fun thing...(not sure how much of this is covered above: I do see, but didn't scour, the security discussion):
-- create some poison
create function public.text(internal) returns boolean language c as '/home/davidj/gotya/gotya', 'gotit';
CREATE FUNCTION
-- inject it
postgres=# set search_path to public,pg_catalog;
SET
-- watch it die
postgres=# copy (select 1) to stdout (format text);
ERROR: function text must return type copy_handler
LINE 1: copy (select 1) to stdout (format text);
I'm especially concerned about extensions here.
We shouldn't be locating any SQL function that doesn't have a copy_handler return type.  Unfortunately, LookupFuncName seems incapable of doing what we want here.  I suggest we create a new lookup routine where we can specify the return argument type as a required element.  That would cleanly mitigate the denial-of-service attack/accident vector demonstrated above (the text returning function should have zero impact on how this feature behaves).  If someone does create a handler SQL function without using copy_handler return type we'd end up showing "COPY format 'david' not recognized" - a developer should be able to figure out they didn't put a correct return type on their handler function and that is why the system did not register it.
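To make the proposed behaviour concrete, here is a minimal, self-contained C sketch. It is not code from the patch set and does not use any PostgreSQL API: `FuncEntry`, `lookup_func_by_rettype()` and `resolve_text_format_schema()` are hypothetical names, the catalog is modelled as a plain array, and the sketch assumes that a `text` handler returning `copy_handler` lives in `pg_catalog`. The point is only that a same-named function with a different return type is skipped, so the search continues along the search path.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a function catalog row. */
typedef struct FuncEntry
{
    const char *schema;
    const char *name;
    const char *rettype;
} FuncEntry;

/*
 * Walk the search path in order and return the first function called
 * 'name' whose return type is 'rettype'. Same-named functions with a
 * different return type are skipped rather than reported as an error.
 * Returns NULL when no schema on the path has a matching function.
 */
const FuncEntry *
lookup_func_by_rettype(const FuncEntry *funcs, size_t nfuncs,
                       const char *const *search_path, size_t npath,
                       const char *name, const char *rettype)
{
    for (size_t p = 0; p < npath; p++)
    {
        for (size_t i = 0; i < nfuncs; i++)
        {
            if (strcmp(funcs[i].schema, search_path[p]) == 0 &&
                strcmp(funcs[i].name, name) == 0 &&
                strcmp(funcs[i].rettype, rettype) == 0)
                return &funcs[i];
        }
    }
    return NULL;
}

/* Reproduces the scenario above: public.text shadows pg_catalog.text. */
const char *
resolve_text_format_schema(void)
{
    static const FuncEntry funcs[] = {
        {"public", "text", "boolean"},          /* the "poison" function */
        {"pg_catalog", "text", "copy_handler"}, /* assumed built-in handler */
    };
    static const char *const path[] = {"public", "pg_catalog"};
    const FuncEntry *f;

    f = lookup_func_by_rettype(funcs, 2, path, 2, "text", "copy_handler");
    return f ? f->schema : NULL;
}
```

With this filtering, `resolve_text_format_schema()` resolves to the `pg_catalog` entry even though `public` comes first on the path, and a catalog that only contains the poison function yields "not found" instead of a return-type error.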
A second concern is simply people wanting to name things the same; or, why namespaces were invented.
Can we just accept a proper identifier after FORMAT so we can use schema-qualified names?
(FORMAT "davescopyformat"."david")
We can special case the internal schema-less names and internally force pg_catalog to avoid them being shadowed.
David J.
On Fri, Mar 21, 2025 at 5:32 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Tue, Mar 18, 2025 at 7:56 PM Sutou Kouhei <kou@clear-code.com> wrote:
>>
>> Hi,
>>
>> In <CAD21AoDU=bYRDDY8MzCXAfg4h9XTeTBdM-wVJaO1t4UcseCpuA@mail.gmail.com>
>>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 17 Mar 2025 13:50:03 -0700,
>>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> > I think that built-in formats also need to have their handler
>> > functions. This seems to be a conventional way for customizable
>> > features such as tablesample and access methods, and we can simplify
>> > this function.
>>
>> OK. 0008 in the attached v37 patch set does it.
>>
>
> tl/dr;
>
> We need to exclude from our SQL function search any function that doesn't declare copy_handler as its return type.
> ("function text must return type copy_handler" is not an acceptable error message)
>
> We need to accept identifiers in FORMAT and parse the optional catalog, schema, and object name portions.
> (Restrict our function search to the named schema if provided.)
>
> Detail:
>
> Fun thing...(not sure how much of this is covered above: I do see, but didn't scour, the security discussion):
>
> -- create some poison
> create function public.text(internal) returns boolean language c as '/home/davidj/gotya/gotya', 'gotit';
> CREATE FUNCTION
>
> -- inject it
> postgres=# set search_path to public,pg_catalog;
> SET
>
> -- watch it die
> postgres=# copy (select 1) to stdout (format text);
> ERROR:  function text must return type copy_handler
> LINE 1: copy (select 1) to stdout (format text);
>
> I'm especially concerned about extensions here.
>
> We shouldn't be locating any SQL function that doesn't have a copy_handler return type.  Unfortunately, LookupFuncName seems incapable of doing what we want here.  I suggest we create a new lookup routine where we can specify the return argument type as a required element.  That would cleanly mitigate the denial-of-service attack/accident vector demonstrated above (the text returning function should have zero impact on how this feature behaves).  If someone does create a handler SQL function without using copy_handler return type we'd end up showing "COPY format 'david' not recognized" - a developer should be able to figure out they didn't put a correct return type on their handler function and that is why the system did not register it.
Just to be clear, the patch checks the function's return type before calling it:
       funcargtypes[0] = INTERNALOID;
       handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
                                   funcargtypes, true);
       if (!OidIsValid(handlerOid))
               ereport(ERROR,
                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                                errmsg("COPY format \"%s\" not recognized", format),
                                parser_errposition(pstate, defel->location)));
       /* check that handler has correct return type */
       if (get_func_rettype(handlerOid) != COPY_HANDLEROID)
               ereport(ERROR,
                               (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                                errmsg("function %s must return type %s",
                                               format, "copy_handler"),
                                parser_errposition(pstate, defel->location)));
So would changing the error message to something like "COPY format 'text' not
recognized" address your concern?
FYI the same is true for TABLESAMPLE; it invokes a function with the
specified method name and checks the returned Node type:
=# select * from pg_class tablesample text (0);
ERROR:  function text must return type tsm_handler
A difference between TABLESAMPLE and COPY format is that the former
accepts a qualified name but the latter doesn't:
=# create extension tsm_system_rows ;
=# create schema s1;
=# create function s1.system_rows(internal) returns void language c as
'tsm_system_rows.so', 'tsm_system_rows_handler';
=# \df *.system_rows
                          List of functions
 Schema |    Name     | Result data type | Argument data types | Type
--------+-------------+------------------+---------------------+------
 public | system_rows | tsm_handler      | internal            | func
 s1     | system_rows | void             | internal            | func
(2 rows)
postgres(1:1194923)=# select count(*) from pg_class tablesample system_rows(0);
 count
-------
     0
(1 row)
postgres(1:1194923)=# select count(*) from pg_class tablesample
s1.system_rows(0);
ERROR:  function s1.system_rows must return type tsm_handler
> A second concern is simply people wanting to name things the same; or, why namespaces were invented.
Yeah, I think that the custom COPY format should support qualified
names at least.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
Re: Make COPY format extendable: Extract COPY TO format implementations
From: "David G. Johnston"

On Friday, March 21, 2025, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
	On Fri, Mar 21, 2025 at 5:32 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Tue, Mar 18, 2025 at 7:56 PM Sutou Kouhei <kou@clear-code.com> wrote:
>>
>> Hi,
>>
>> In <CAD21AoDU=bYRDDY8MzCXAfg4h9XTeTBdM-wVJaO1t4UcseCpuA@mail.gmail.com>
>> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 17 Mar 2025 13:50:03 -0700,
>> Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> > I think that built-in formats also need to have their handler
>> > functions. This seems to be a conventional way for customizable
>> > features such as tablesample and access methods, and we can simplify
>> > this function.
>>
>> OK. 0008 in the attached v37 patch set does it.
>>
>
> tl/dr;
>
> We need to exclude from our SQL function search any function that doesn't declare copy_handler as its return type.
> ("function text must return type copy_handler" is not an acceptable error message)
>
> We need to accept identifiers in FORMAT and parse the optional catalog, schema, and object name portions.
> (Restrict our function search to the named schema if provided.)
>
> Detail:
>
> Fun thing...(not sure how much of this is covered above: I do see, but didn't scour, the security discussion):
>
> -- create some poison
> create function public.text(internal) returns boolean language c as '/home/davidj/gotya/gotya', 'gotit';
> CREATE FUNCTION
>
> -- inject it
> postgres=# set search_path to public,pg_catalog;
> SET
>
> -- watch it die
> postgres=# copy (select 1) to stdout (format text);
> ERROR: function text must return type copy_handler
> LINE 1: copy (select 1) to stdout (format text);
>
> I'm especially concerned about extensions here.
>
> We shouldn't be locating any SQL function that doesn't have a copy_handler return type. Unfortunately, LookupFuncName seems incapable of doing what we want here. I suggest we create a new lookup routine where we can specify the return argument type as a required element. That would cleanly mitigate the denial-of-service attack/accident vector demonstrated above (the text returning function should have zero impact on how this feature behaves). If someone does create a handler SQL function without using copy_handler return type we'd end up showing "COPY format 'david' not recognized" - a developer should be able to figure out they didn't put a correct return type on their handler function and that is why the system did not register it.
Just to be clear, the patch checks the function's return type before calling it:
       funcargtypes[0] = INTERNALOID;
       handlerOid = LookupFuncName(list_make1(makeString(format)), 1,
                                   funcargtypes, true);
       if (!OidIsValid(handlerOid))
               ereport(ERROR,
                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                                errmsg("COPY format \"%s\" not recognized", format),
                                parser_errposition(pstate, defel->location)));
       /* check that handler has correct return type */
       if (get_func_rettype(handlerOid) != COPY_HANDLEROID)
               ereport(ERROR,
                               (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                                errmsg("function %s must return type %s",
                                               format, "copy_handler"),
                                parser_errposition(pstate, defel->location)));
So would changing the error message to like "COPY format 'text' not
recognized" untangle your concern?
In my example above copy should not fail at all.  The text function created in public that returns Boolean would never be seen and the real one in pg_catalog would then be found and behave as expected.
FYI the same is true for TABLESAMPLE; it invokes a function with the
specified method name and checks the returned Node type:
=# select * from pg_class tablesample text (0);
ERROR: function text must return type tsm_handler
Then this would benefit from the new function I suggest creating since it apparently has the same, IMO, bug.
David J.
Re: Make COPY format extendable: Extract COPY TO format implementations
From: "David G. Johnston"

On Fri, Mar 21, 2025 at 10:23 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
> Then this would benefit from the new function I suggest creating since it apparently has the same, IMO, bug.
Concretely like I posted here:
David J.
On Wed, Mar 19, 2025 at 6:25 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAKFQuwaMAFMHqxDXR=SxA0mDjdmntrwxZd2w=nSruLNFH-OzLw@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 19 Mar 2025 17:49:49 -0700,
>   "David G. Johnston" <david.g.johnston@gmail.com> wrote:
>
> >> And could someone help (take over if possible) writing a
> >> document for this feature? I'm not good at writing a
> >> document in English... 0009 in the attached v37 patch set
> >> has a draft of it. It's based on existing documents in
> >> doc/src/sgml/ and *.h.
> >>
> >>
> > I haven't touched the innards of the structs aside from changing
> > programlisting to synopsis.  And redoing the two section opening paragraphs
> > to better integrate with the content in the chapter opening.
> >
> > The rest I kinda went to town on...
>
> Thanks!!! It's very helpful!!!
>
> I've applied your patch. 0009 is only changed.
Thank you for updating the patches. I've reviewed the main part of
supporting the custom COPY format. Here are some random comments:
---
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false) or CopyFromRountine (for is_from == true). If no COPY
+ * format handler is found, this function reports an error.
+ */
I think this comment needs to be updated as the part "If the option
value isn't ..." is no longer true.
I think we don't necessarily need to create a separate function
ProcessCopyOptionFormat for processing the format option.
We need more regression tests for handling the given format name. For example,
- more various input patterns.
- a function with the specified format name exists but it returns an
unexpected Node.
- looking for a handler function in a different namespace.
etc.
---
I think that we should accept qualified names too as the format name
like tablesample does. That way, different extensions implementing the
same format can be used.
---
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+                ereport(
+                                ERROR,
+                                (errcode(
+                                         ERRCODE_INVALID_PARAMETER_VALUE),
+                                 errmsg("COPY handler function "
+                                                "%u did not return "
+                                                "CopyFromRoutine struct",
+                                                opts->handler)));
It's not conventional to put a new line between 'ereport(' and 'ERROR'
(similarly between 'errcode(' and 'ERRCODE_...'). Also, we don't need
to split the error message into multiple lines as it's not long.
---
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+                ereport(
+                                ERROR,
+                                (errcode(
+                                         ERRCODE_INVALID_PARAMETER_VALUE),
+                                 errmsg("COPY handler function "
+                                                "%u did not return "
+                                                "CopyToRoutine struct",
+                                                opts->handler)));
Same as the above comment.
---
+  descr => 'pseudo-type for the result of a copy to/from method function',
s/method function/format function/
---
+        Oid                    handler;                /* handler function for custom format routine */
'handler' is used also for built-in formats.
---
+static void
+CopyFromInFunc(CopyFromState cstate, Oid atttypid,
+                           FmgrInfo *finfo, Oid *typioparam)
+{
+        ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%d", atttypid)));
+}
OIDs could be changed across major versions even for built-in types. I
think it's better to avoid using it for tests.
---
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+        ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", slot->tts_nvalid)));
+}
Similar to the above comment, the field name 'tts_nvalid' might also
be changed in the future, let's use another name.
---
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+        .type = T_CopyFromRoutine,
+        .CopyFromInFunc = CopyFromInFunc,
+        .CopyFromStart = CopyFromStart,
+        .CopyFromOneRow = CopyFromOneRow,
+        .CopyFromEnd = CopyFromEnd,
+};
I'd suggest not using the same function names as the fields.
---
+/*
+ * Export CopySendEndOfRow() for extensions. We want to keep
+ * CopySendEndOfRow() as a static function for
+ * optimization. CopySendEndOfRow() calls in this file may be optimized by a
+ * compiler.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+        CopySendEndOfRow(cstate);
+}
Is there any reason to use a different name for public functions?
---
+/*
+ * Export CopyGetData() for extensions. We want to keep CopyGetData() as a
+ * static function for optimization. CopyGetData() calls in this file may be
+ * optimized by a compiler.
+ */
+int
+CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread)
+{
+        return CopyGetData(cstate, dest, minread, maxread);
+}
+
The same as the above comment.
---
+        /* For custom format implementation */
+        void      *opaque;                     /* private space */
How about renaming it to 'private'?
---
I've not reviewed the documentation patch yet but I think the patch
seems to miss the updates to the description of the FORMAT option in
the COPY command section.
---
I think we can reorganize the patch set as follows:
1. Create copyto_internal.h and change COPY_XXX to COPY_SOURCE_XXX and
COPY_DEST_XXX accordingly.
2. Support custom format for both COPY TO and COPY FROM.
3. Expose necessary helper functions such as CopySendEndOfRow().
4. Add CopyFromSkipErrorRow().
5. Documentation.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
On Wed, Mar 19, 2025 at 6:25 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAKFQuwaMAFMHqxDXR=SxA0mDjdmntrwxZd2w=nSruLNFH-OzLw@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 19 Mar 2025 17:49:49 -0700,
>   "David G. Johnston" <david.g.johnston@gmail.com> wrote:
>
> >> And could someone help (take over if possible) writing a
> >> document for this feature? I'm not good at writing a
> >> document in English... 0009 in the attached v37 patch set
> >> has a draft of it. It's based on existing documents in
> >> doc/src/sgml/ and *.h.
> >>
> >>
> > I haven't touched the innards of the structs aside from changing
> > programlisting to synopsis.  And redoing the two section opening paragraphs
> > to better integrate with the content in the chapter opening.
> >
> > The rest I kinda went to town on...
>
> Thanks!!! It's very helpful!!!
>
> I've applied your patch. 0009 is only changed.

FYI I've implemented an extension that adds JSON Lines as a custom COPY
format[1] to check the usability of the COPY format APIs. I think that the
exposed APIs are fairly simple and minimal. I didn't find anything missing
or superfluous in the exposed APIs, but I think it would be better to
describe what the one-row callback should do to utilize the abstracted
destination. For example, in order to use CopyToStateFlush() to write out
to the destination, extensions should write the data to
cstate->fe_msgbuf. We expose CopyToStateFlush() but not any of the
functions that write data there, such as CopySendString(). That was a bit
inconvenient for me, but I managed to write the data there directly by
#include'ing copyto_internal.h.

Regards,

[1] https://github.com/MasahikoSawada/pg_copy_jsonlines

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
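The "fill the message buffer, then flush" contract described above can be sketched in standalone C. This is only a model under stated assumptions, not PostgreSQL code: `MiniDestState`, `mini_send_string()`, `mini_flush()` and `mini_json_one_row()` are hypothetical names, the buffers are fixed-size arrays rather than the real state struct, and the row format is an invented one-column JSON line.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MINI_BUF_SIZE 256

/* Hypothetical stand-in for the COPY TO state: a row buffer plus a sink. */
typedef struct MiniDestState
{
    char        fe_msgbuf[MINI_BUF_SIZE];   /* data of the row being built */
    size_t      msglen;
    char        dest[MINI_BUF_SIZE];        /* stands in for the destination */
    size_t      destlen;
} MiniDestState;

/* Append a string to the row buffer; returns -1 if it does not fit. */
int
mini_send_string(MiniDestState *cstate, const char *str)
{
    size_t      len = strlen(str);

    if (len > MINI_BUF_SIZE - cstate->msglen)
        return -1;
    memcpy(cstate->fe_msgbuf + cstate->msglen, str, len);
    cstate->msglen += len;
    return 0;
}

/* Move the buffered row to the destination and reset the row buffer. */
int
mini_flush(MiniDestState *cstate)
{
    /* keep one byte free for the terminating NUL */
    if (cstate->msglen >= MINI_BUF_SIZE - cstate->destlen)
        return -1;
    memcpy(cstate->dest + cstate->destlen, cstate->fe_msgbuf, cstate->msglen);
    cstate->destlen += cstate->msglen;
    cstate->dest[cstate->destlen] = '\0';
    cstate->msglen = 0;
    return 0;
}

/* One-row callback: serialize into the row buffer first, then flush. */
int
mini_json_one_row(MiniDestState *cstate, int value)
{
    char        line[64];

    snprintf(line, sizeof(line), "{\"value\":%d}\n", value);
    if (mini_send_string(cstate, line) != 0)
        return -1;
    return mini_flush(cstate);
}
```

In this model the one-row callback never touches the destination directly, which is the behaviour the message suggests documenting for extension authors.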
Hi,
In <CAD21AoAfWrjpTDJ0garVUoXY0WC3Ud4Cu51q+ccWiotm1uo_2A@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Sun, 23 Mar 2025 02:01:59 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> ---
> +/*
> + * Process the "format" option.
> + *
> + * This function checks whether the option value is a built-in format such as
> + * "text" and "csv" or not. If the option value isn't a built-in format, this
> + * function finds a COPY format handler that returns a CopyToRoutine (for
> + * is_from == false) or CopyFromRountine (for is_from == true). If no COPY
> + * format handler is found, this function reports an error.
> + */
> 
> I think this comment needs to be updated as the part "If the option
> value isn't ..." is no longer true.
> 
> I think we don't necessarily need to create a separate function
> ProcessCopyOptionFormat for processing the format option.
Hmm. I think that this separated function will increase
readability by reducing indentation. But I've removed the
separation as you suggested. So the comment is also removed
entirely.
0002 includes this.
> We need more regression tests for handling the given format name. For example,
> 
> - more various input patterns.
> - a function with the specified format name exists but it returns an
> unexpected Node.
> - looking for a handler function in a different namespace.
> etc.
I've added the following tests:
* Wrong input type handler without namespace
* Wrong input type handler with namespace
* Wrong return type handler without namespace
* Wrong return type handler with namespace
* Wrong return value (Copy*Routine isn't returned) handler without namespace
* Wrong return value (Copy*Routine isn't returned) handler with namespace
* Nonexistent handler
* Invalid qualified name
* Valid handler without namespace and without search_path
* Valid handler without namespace and with search_path
* Valid handler with namespace
0002 also includes this.
> I think that we should accept qualified names too as the format name
> like tablesample does. That way, different extensions implementing the
> same format can be used.
Implemented. The qualified name is resolved after SQL parsing. Is that OK?
(It seems that tablesample does it during SQL parsing.)
I did it this way because "WITH (FORMAT XXX)" is processed as a generic
option in gram.y, and all generic options are processed as strings. So
I kept that behavior.
Because of this choice, the syntax is
"COPY ... WITH (FORMAT 'NAMESPACE.HANDLER_NAME')" and
not "COPY ... WITH (FORMAT 'NAMESPACE'.'HANDLER_NAME')".
0002 also includes this.
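As an illustration of what accepting the format as a single string implies, here is a minimal standalone C sketch that splits such a value into schema and handler name. It is not the code in 0002: `split_format_name()` is a hypothetical helper, and unlike a real implementation it deliberately ignores quoted identifiers and rejects anything with more than one dot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Split "schema.name" or "name" into its parts. Hypothetical helper:
 * quoted identifiers are not handled. Returns false when a part is
 * empty, a part does not fit its buffer, or there is more than one dot.
 * For an unqualified name, 'schema' is set to the empty string.
 */
bool
split_format_name(const char *value,
                  char *schema, size_t schema_size,
                  char *name, size_t name_size)
{
    const char *dot = strchr(value, '.');
    const char *name_part = dot ? dot + 1 : value;
    size_t      schema_len = dot ? (size_t) (dot - value) : 0;
    size_t      name_len = strlen(name_part);

    if (schema_size == 0 || name_size == 0)
        return false;
    if (dot != NULL && schema_len == 0)
        return false;           /* ".name" */
    if (name_len == 0)
        return false;           /* "schema." or "" */
    if (strchr(name_part, '.') != NULL)
        return false;           /* "a.b.c" is out of scope here */
    if (schema_len >= schema_size || name_len >= name_size)
        return false;

    memcpy(schema, value, schema_len);
    schema[schema_len] = '\0';
    memcpy(name, name_part, name_len + 1);  /* includes the NUL */
    return true;
}
```

An empty schema part would then mean "resolve through search_path", while a non-empty one restricts the lookup to that schema.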
> ---
> +        if (routine == NULL || !IsA(routine, CopyFromRoutine))
> +                ereport(
> +                                ERROR,
> +                                (errcode(
> +                                         ERRCODE_INVALID_PARAMETER_VALUE),
> +                                 errmsg("COPY handler function "
> +                                                "%u did not return "
> +                                                "CopyFromRoutine struct",
> +                                                opts->handler)));
> 
> It's not conventional to put a new line between 'ereport(' and 'ERROR'
> (similarly between 'errcode(' and 'ERRCODE_...'). Also, we don't need
> to split the error message into multiple lines as it's not long.
Oh, sorry. I can't remember why I used this... I think I
trusted pgindent...
> ---
> +  descr => 'pseudo-type for the result of a copy to/from method function',
> 
> s/method function/format function/
Good catch. I used "handler function" not "format function"
because we use "handler" in other places.
> ---
> +        Oid                    handler;                /* handler function for custom format routine */
> 
> 'handler' is used also for built-in formats.
Updated in 0004.
> ---
> +static void
> +CopyFromInFunc(CopyFromState cstate, Oid atttypid,
> +                           FmgrInfo *finfo, Oid *typioparam)
> +{
> +        ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%d", atttypid)));
> +}
> 
> OIDs could be changed across major versions even for built-in types. I
> think it's better to avoid using it for tests.
Oh, I didn't know it. I've changed to use type name instead
of OID. It'll be more stable than OID.
> ---
> +static void
> +CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
> +{
> +        ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", slot->tts_nvalid)));
> +}
> 
> Similar to the above comment, the field name 'tts_nvalid' might also
> be changed in the future, let's use another name.
Hmm. If the field name is changed, we need to change this
code. So changing tests too isn't strange. Anyway, I used
more generic text.
> ---
> +static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
> +        .type = T_CopyFromRoutine,
> +        .CopyFromInFunc = CopyFromInFunc,
> +        .CopyFromStart = CopyFromStart,
> +        .CopyFromOneRow = CopyFromOneRow,
> +        .CopyFromEnd = CopyFromEnd,
> +};
> 
> I'd suggest not using the same function names as the fields.
OK. I've added "Test" prefix.
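For readers following along, a minimal standalone C sketch of the resulting naming pattern: the struct fields keep the callback names, while the implementing functions carry a distinguishing prefix. Everything here is hypothetical (`MiniCopyFromRoutine`, the `Test*` functions, `run_mini_copy_from()`), and the callbacks take a plain counter instead of the real state struct.

```c
#include <stdbool.h>

/* Hypothetical miniature of a COPY FROM callback table. */
typedef struct MiniCopyFromRoutine
{
    void        (*CopyFromStart) (int *rows_left);
    bool        (*CopyFromOneRow) (int *rows_left);
    void        (*CopyFromEnd) (int *rows_left);
} MiniCopyFromRoutine;

void
TestCopyFromStart(int *rows_left)
{
    *rows_left = 3;             /* pretend the input has three rows */
}

bool
TestCopyFromOneRow(int *rows_left)
{
    if (*rows_left == 0)
        return false;           /* no more rows */
    (*rows_left)--;
    return true;
}

void
TestCopyFromEnd(int *rows_left)
{
    *rows_left = -1;            /* mark the state as finished */
}

/* Field names and function names differ, so neither shadows the other. */
const MiniCopyFromRoutine TestMiniCopyFromRoutine = {
    .CopyFromStart = TestCopyFromStart,
    .CopyFromOneRow = TestCopyFromOneRow,
    .CopyFromEnd = TestCopyFromEnd,
};

/* Drive the callbacks in start / one-row loop / end order. */
int
run_mini_copy_from(const MiniCopyFromRoutine *routine)
{
    int         rows_left = 0;
    int         nrows = 0;

    routine->CopyFromStart(&rows_left);
    while (routine->CopyFromOneRow(&rows_left))
        nrows++;
    routine->CopyFromEnd(&rows_left);
    return nrows;
}
```

Keeping the names distinct also makes grep results and backtraces unambiguous about whether a hit is the struct field or the test implementation.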
> ---
> +/*
> + * Export CopySendEndOfRow() for extensions. We want to keep
> + * CopySendEndOfRow() as a static function for
> + * optimization. CopySendEndOfRow() calls in this file may be optimized by a
> + * compiler.
> + */
> +void
> +CopyToStateFlush(CopyToState cstate)
> +{
> +        CopySendEndOfRow(cstate);
> +}
> 
> Is there any reason to use a different name for public functions?
In this patch set, I use "CopyFrom"/"CopyTo" prefixes for
public APIs for custom COPY FORMAT handler extensions. This
should make related APIs easier to recognize. Is that unusual
in PostgreSQL?
> ---
> +        /* For custom format implementation */
> +        void      *opaque;                     /* private space */
> 
> How about renaming it to 'private'?
We should not use "private" because it's a keyword in
C++. If we use "private" here, we can't include this file
from C++ code.
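A small standalone C sketch of how a handler might use such a private-space pointer, for illustration only. `MiniCopyToState`, `RowCounter` and the `mini_*` functions are hypothetical; only the field name `opaque` mirrors the patch, and plain `calloc`/`free` are used here in place of whatever allocator a real extension would use.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the COPY TO state. */
typedef struct MiniCopyToState
{
    /* Named "opaque" because "private" is a reserved word in C++. */
    void       *opaque;         /* private space for the format handler */
} MiniCopyToState;

/* Handler-specific state that the core code never looks inside. */
typedef struct RowCounter
{
    unsigned long nrows;
} RowCounter;

/* Start callback: allocate the handler's own state. */
void
mini_start(MiniCopyToState *cstate)
{
    cstate->opaque = calloc(1, sizeof(RowCounter));
}

/* One-row callback: update the handler's state through the pointer. */
void
mini_one_row(MiniCopyToState *cstate)
{
    RowCounter *counter = cstate->opaque;

    if (counter != NULL)
        counter->nrows++;
}

/* End callback: report the row count and release the state. */
unsigned long
mini_end(MiniCopyToState *cstate)
{
    RowCounter *counter = cstate->opaque;
    unsigned long nrows = counter ? counter->nrows : 0;

    free(counter);
    cstate->opaque = NULL;
    return nrows;
}
```

Because the field is a `void *` with a neutral name, the same header can be included from either C or C++ handler code.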
> ---
> I've not reviewed the documentation patch yet but I think the patch
> seems to miss the updates to the description of the FORMAT option in
> the COPY command section.
I defer this for now. We can revisit the last documentation
patch after we finalize our API. (Or could someone help us?)
> I think we can reorganize the patch set as follows:
> 
> 1. Create copyto_internal.h and change COPY_XXX to COPY_SOURCE_XXX and
> COPY_DEST_XXX accordingly.
> 2. Support custom format for both COPY TO and COPY FROM.
> 3. Expose necessary helper functions such as CopySendEndOfRow().
> 4. Add CopyFromSkipErrorRow().
> 5. Documentation.
The attached v39 patch set uses the following layout:
0001: Create copyto_internal.h and change COPY_XXX to
      COPY_SOURCE_XXX and COPY_DEST_XXX accordingly.
      (Same as 1. in your suggestion)
0002: Support custom format for both COPY TO and COPY FROM.
      (Same as 2. in your suggestion)
0003: Expose necessary helper functions such as CopySendEndOfRow()
      and add CopyFromSkipErrorRow().
      (3. + 4. in your suggestion)
0004: Define handler functions for built-in formats.
      (Not included in your suggestion)
0005: Documentation. (WIP)
      (Same as 5. in your suggestion)
We can merge 0001 quickly, right?
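For anyone reviewing 0002, the handler dispatch can be sketched outside PostgreSQL roughly as follows. This is only an illustration under assumptions: DemoTag, DemoToRoutine, DemoFromRoutine and the demo_* functions are hypothetical stand-ins for NodeTag, CopyToRoutine, CopyFromRoutine, the handler function and the IsA() check. As in the 0002 code, the COPY FROM side calls the handler with true and the COPY TO side calls it with false, and each side rejects a result whose tag does not match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for NodeTag. */
typedef enum DemoTag
{
    DEMO_TAG_TO_ROUTINE,
    DEMO_TAG_FROM_ROUTINE
} DemoTag;

/* Both routine structs start with the tag, like a Node. */
typedef struct DemoToRoutine
{
    DemoTag     type;
    const char *name;
} DemoToRoutine;

typedef struct DemoFromRoutine
{
    DemoTag     type;
    const char *name;
} DemoFromRoutine;

typedef const void *(*DemoHandler) (bool is_from);

static const DemoToRoutine demo_to = {DEMO_TAG_TO_ROUTINE, "demo"};
static const DemoFromRoutine demo_from = {DEMO_TAG_FROM_ROUTINE, "demo"};

/* A well-behaved handler: picks the routine according to is_from. */
const void *
demo_copy_handler(bool is_from)
{
    if (is_from)
        return &demo_from;
    return &demo_to;
}

/* A broken handler: ignores is_from and always returns the TO routine. */
const void *
demo_broken_handler(bool is_from)
{
    (void) is_from;
    return &demo_to;
}

/* COPY FROM side: call with true and verify the tag (NULL on mismatch). */
const DemoFromRoutine *
demo_get_from_routine(DemoHandler handler)
{
    const void *routine = handler(true);

    if (routine == NULL || *(const DemoTag *) routine != DEMO_TAG_FROM_ROUTINE)
        return NULL;
    return (const DemoFromRoutine *) routine;
}

/* COPY TO side: call with false and verify the tag (NULL on mismatch). */
const DemoToRoutine *
demo_get_to_routine(DemoHandler handler)
{
    const void *routine = handler(false);

    if (routine == NULL || *(const DemoTag *) routine != DEMO_TAG_TO_ROUTINE)
        return NULL;
    return (const DemoToRoutine *) routine;
}
```

In the real patch a tag mismatch raises an error with ereport() instead of returning NULL; the sketch returns NULL only to stay free of PostgreSQL dependencies.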
Thanks,
-- 
kou
From 76f8134652f14210817e872daab3c0a8b3c0318a Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 13:58:33 +0900
Subject: [PATCH v39 1/5] Export CopyDest as private data
This is preparation for exporting CopyToStateData as private data.
CopyToStateData depends on CopyDest, so we need to export CopyDest
too.
But CopyDest and CopySource use the same enum value names, so we
can't export CopyDest as-is.
This uses the COPY_DEST_ prefix for CopyDest enum values. CopySource
uses the COPY_SOURCE_ prefix for consistency.
---
 src/backend/commands/copyfrom.c          |  4 ++--
 src/backend/commands/copyfromparse.c     | 10 ++++-----
 src/backend/commands/copyto.c            | 28 ++++++++----------------
 src/include/commands/copyfrom_internal.h |  6 ++---
 src/include/commands/copyto_internal.h   | 28 ++++++++++++++++++++++++
 5 files changed, 47 insertions(+), 29 deletions(-)
 create mode 100644 src/include/commands/copyto_internal.h
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index bcf66f0adf8..f58497d4187 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1709,7 +1709,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;    /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;    /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1837,7 +1837,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index e8128f85e6b..17e51f02e04 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -180,7 +180,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows it can send. */
     pq_flush();
@@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
     switch (cstate->copy_src)
     {
-        case COPY_FILE:
+        case COPY_SOURCE_FILE:
             bytesread = fread(databuf, 1, maxread, cstate->copy_file);
             if (ferror(cstate->copy_file))
                 ereport(ERROR,
@@ -257,7 +257,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
             if (bytesread == 0)
                 cstate->raw_reached_eof = true;
             break;
-        case COPY_FRONTEND:
+        case COPY_SOURCE_FRONTEND:
             while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
             {
                 int            avail;
@@ -340,7 +340,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
                 bytesread += avail;
             }
             break;
-        case COPY_CALLBACK:
+        case COPY_SOURCE_CALLBACK:
             bytesread = cstate->data_source_cb(databuf, minread, maxread);
             break;
     }
@@ -1172,7 +1172,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
          * after \. up to the protocol end of copy data.  (XXX maybe better
          * not to treat \. as special?)
          */
-        if (cstate->copy_src == COPY_FRONTEND)
+        if (cstate->copy_src == COPY_SOURCE_FRONTEND)
         {
             int            inbytes;
 
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 84a3f3879a8..05ad87d8220 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -20,6 +20,7 @@
 
 #include "access/tableam.h"
 #include "commands/copyapi.h"
+#include "commands/copyto_internal.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -36,17 +37,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                    /* to file (or a piped program) */
-    COPY_FRONTEND,                /* to frontend */
-    COPY_CALLBACK,                /* to callback function */
-} CopyDest;
-
 /*
  * This struct contains all the state variables used throughout a COPY TO
  * operation.
@@ -401,7 +391,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -448,7 +438,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -482,11 +472,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -507,7 +497,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -515,7 +505,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -900,12 +890,12 @@ BeginCopyTo(ParseState *pstate,
     /* See Multibyte encoding comment above */
     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..3a306e3286e 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -24,9 +24,9 @@
  */
 typedef enum CopySource
 {
-    COPY_FILE,                    /* from file (or a piped program) */
-    COPY_FRONTEND,                /* from frontend */
-    COPY_CALLBACK,                /* from callback function */
+    COPY_SOURCE_FILE,            /* from file (or a piped program) */
+    COPY_SOURCE_FRONTEND,        /* from frontend */
+    COPY_SOURCE_CALLBACK,        /* from callback function */
 } CopySource;
 
 /*
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
new file mode 100644
index 00000000000..42ddb37a8a2
--- /dev/null
+++ b/src/include/commands/copyto_internal.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyto_internal.h
+ *      Internal definitions for COPY TO command.
+ *
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyto_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYTO_INTERNAL_H
+#define COPYTO_INTERNAL_H
+
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+    COPY_DEST_FILE,                /* to file (or a piped program) */
+    COPY_DEST_FRONTEND,            /* to frontend */
+    COPY_DEST_CALLBACK,            /* to callback function */
+} CopyDest;
+
+#endif                            /* COPYTO_INTERNAL_H */
-- 
2.47.2
From bcc2c19e59b4dbc15fcfd3b5c7d4837314e00518 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Thu, 27 Mar 2025 11:14:43 +0900
Subject: [PATCH v39 2/5] Add support for adding custom COPY format
This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a handler returns a CopyToRoutine for COPY TO and a CopyFromRoutine
for COPY FROM.
Whether this is COPY TO or COPY FROM is passed as the "is_from" argument:
    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine
This also adds a test module for custom COPY handlers.
---
 src/backend/commands/copy.c                   |  31 ++++-
 src/backend/commands/copyfrom.c               |  20 +++-
 src/backend/commands/copyto.c                 |  70 +++--------
 src/backend/nodes/Makefile                    |   1 +
 src/backend/nodes/gen_node_support.pl         |   2 +
 src/backend/utils/adt/pseudotypes.c           |   1 +
 src/include/catalog/pg_proc.dat               |   6 +
 src/include/catalog/pg_type.dat               |   6 +
 src/include/commands/copy.h                   |   3 +-
 src/include/commands/copyapi.h                |   4 +
 src/include/commands/copyto_internal.h        |  55 +++++++++
 src/include/nodes/meson.build                 |   1 +
 src/test/modules/Makefile                     |   1 +
 src/test/modules/meson.build                  |   1 +
 src/test/modules/test_copy_format/.gitignore  |   4 +
 src/test/modules/test_copy_format/Makefile    |  23 ++++
 .../test_copy_format/expected/invalid.out     |  61 ++++++++++
 .../test_copy_format/expected/no_schema.out   |  23 ++++
 .../test_copy_format/expected/schema.out      |  56 +++++++++
 src/test/modules/test_copy_format/meson.build |  35 ++++++
 .../modules/test_copy_format/sql/invalid.sql  |  29 +++++
 .../test_copy_format/sql/no_schema.sql        |   8 ++
 .../modules/test_copy_format/sql/schema.sql   |  24 ++++
 .../test_copy_format--1.0.sql                 |  24 ++++
 .../test_copy_format/test_copy_format.c       | 113 ++++++++++++++++++
 .../test_copy_format/test_copy_format.control |   4 +
 26 files changed, 549 insertions(+), 57 deletions(-)
 mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/invalid.out
 create mode 100644 src/test/modules/test_copy_format/expected/no_schema.out
 create mode 100644 src/test/modules/test_copy_format/expected/schema.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/invalid.sql
 create mode 100644 src/test/modules/test_copy_format/sql/no_schema.sql
 create mode 100644 src/test/modules/test_copy_format/sql/schema.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc29..2539aee3a53 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,10 +32,12 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/regproc.h"
 #include "utils/rel.h"
 #include "utils/rls.h"
 
@@ -531,10 +533,31 @@ ProcessCopyOptions(ParseState *pstate,
             else if (strcmp(fmt, "binary") == 0)
                 opts_out->binary = true;
             else
-                ereport(ERROR,
-                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                         errmsg("COPY format \"%s\" not recognized", fmt),
-                         parser_errposition(pstate, defel->location)));
+            {
+                List       *qualified_format;
+                Oid            arg_types[1];
+                Oid            handler = InvalidOid;
+
+                qualified_format = stringToQualifiedNameList(fmt, NULL);
+                arg_types[0] = INTERNALOID;
+                handler = LookupFuncName(qualified_format, 1,
+                                         arg_types, true);
+                if (!OidIsValid(handler))
+                    ereport(ERROR,
+                            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                             errmsg("COPY format \"%s\" not recognized", fmt),
+                             parser_errposition(pstate, defel->location)));
+
+                /* check that handler has correct return type */
+                if (get_func_rettype(handler) != COPY_HANDLEROID)
+                    ereport(ERROR,
+                            (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                             errmsg("function %s must return type %s",
+                                    fmt, "copy_handler"),
+                             parser_errposition(pstate, defel->location)));
+
+                opts_out->handler = handler;
+            }
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index f58497d4187..91f44193abf 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -129,6 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate);
 
 /* text format */
 static const CopyFromRoutine CopyFromRoutineText = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromTextOneRow,
@@ -137,6 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = {
 
 /* CSV format */
 static const CopyFromRoutine CopyFromRoutineCSV = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromCSVOneRow,
@@ -145,6 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = {
 
 /* binary format */
 static const CopyFromRoutine CopyFromRoutineBinary = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromBinaryInFunc,
     .CopyFromStart = CopyFromBinaryStart,
     .CopyFromOneRow = CopyFromBinaryOneRow,
@@ -155,7 +158,22 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(const CopyFormatOptions *opts)
 {
-    if (opts->csv_mode)
+    if (OidIsValid(opts->handler))
+    {
+        Datum        datum;
+        Node       *routine;
+
+        datum = OidFunctionCall1(opts->handler, BoolGetDatum(true));
+        routine = (Node *) DatumGetPointer(datum);
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function %s.%s did not return CopyFromRoutine struct",
+                            get_namespace_name(get_func_namespace(opts->handler)),
+                            get_func_name(opts->handler))));
+        return castNode(CopyFromRoutine, routine);
+    }
+    else if (opts->csv_mode)
         return &CopyFromRoutineCSV;
     else if (opts->binary)
         return &CopyFromRoutineBinary;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 05ad87d8220..b7ff6466ce3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -37,56 +37,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-    /* format-specific routines */
-    const CopyToRoutine *routine;
-
-    /* low-level state data */
-    CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
-
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
-    MemoryContext rowcontext;    /* per-row evaluation context */
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -140,6 +90,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
 
 /* text format */
 static const CopyToRoutine CopyToRoutineText = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToTextOneRow,
@@ -148,6 +99,7 @@ static const CopyToRoutine CopyToRoutineText = {
 
 /* CSV format */
 static const CopyToRoutine CopyToRoutineCSV = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToCSVOneRow,
@@ -156,6 +108,7 @@ static const CopyToRoutine CopyToRoutineCSV = {
 
 /* binary format */
 static const CopyToRoutine CopyToRoutineBinary = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToBinaryStart,
     .CopyToOutFunc = CopyToBinaryOutFunc,
     .CopyToOneRow = CopyToBinaryOneRow,
@@ -166,7 +119,22 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(const CopyFormatOptions *opts)
 {
-    if (opts->csv_mode)
+    if (OidIsValid(opts->handler))
+    {
+        Datum        datum;
+        Node       *routine;
+
+        datum = OidFunctionCall1(opts->handler, BoolGetDatum(false));
+        routine = (Node *) DatumGetPointer(datum);
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function %s.%s did not return CopyToRoutine struct",
+                            get_namespace_name(get_func_namespace(opts->handler)),
+                            get_func_name(opts->handler))));
+        return castNode(CopyToRoutine, routine);
+    }
+    else if (opts->csv_mode)
         return &CopyToRoutineCSV;
     else if (opts->binary)
         return &CopyToRoutineBinary;
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 77ddb9ca53f..dc6c1087361 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -50,6 +50,7 @@ node_headers = \
     access/sdir.h \
     access/tableam.h \
     access/tsmapi.h \
+    commands/copyapi.h \
     commands/event_trigger.h \
     commands/trigger.h \
     executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
old mode 100644
new mode 100755
index 40994b53fb2..d7e8d16e789
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -62,6 +62,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -86,6 +87,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index 317a1f2b282..f2ebc21ca56 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 0d29ef50ff2..79b5a36088c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7838,6 +7838,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 6dca77e0a22..bddf9fb4fbe 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -633,6 +633,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a COPY TO/FROM handler function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   descr => 'pseudo-type for the result of a table AM handler function',
   typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..6df1f8a3b9b 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,9 +87,10 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
+    Oid            handler;        /* handler function for custom format routine */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
+/* These are private in commands/copy[from|to]_internal.h */
 typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 2a2d2f9876b..53ad3337f86 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -22,6 +22,8 @@
  */
 typedef struct CopyToRoutine
 {
+    NodeTag        type;
+
     /*
      * Set output function information. This callback is called once at the
      * beginning of COPY TO.
@@ -60,6 +62,8 @@ typedef struct CopyToRoutine
  */
 typedef struct CopyFromRoutine
 {
+    NodeTag        type;
+
     /*
      * Set input function information. This callback is called once at the
      * beginning of COPY FROM.
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index 42ddb37a8a2..12c4a0f5979 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -14,6 +14,11 @@
 #ifndef COPYTO_INTERNAL_H
 #define COPYTO_INTERNAL_H
 
+#include "commands/copy.h"
+#include "executor/execdesc.h"
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
 /*
  * Represents the different dest cases we need to worry about at
  * the bottom level
@@ -25,4 +30,54 @@ typedef enum CopyDest
     COPY_DEST_CALLBACK,            /* to callback function */
 } CopyDest;
 
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+    /* format-specific routines */
+    const CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;        /* type of copy source/destination */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
+    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
+
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+    MemoryContext rowcontext;    /* per-row evaluation context */
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
 #endif                            /* COPYTO_INTERNAL_H */
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index d1ca24dd32f..96e70e7f38b 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -12,6 +12,7 @@ node_support_input_i = [
   'access/sdir.h',
   'access/tableam.h',
   'access/tsmapi.h',
+  'commands/copyapi.h',
   'commands/event_trigger.h',
   'commands/trigger.h',
   'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 4e4be3fa511..c9da440eed0 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -16,6 +16,7 @@ SUBDIRS = \
           spgist_name_ops \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 2b057451473..d33bbbd4092 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -15,6 +15,7 @@ subdir('spgist_name_ops')
 subdir('ssl_passphrase_callback')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 00000000000..8497f91624d
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = invalid no_schema schema
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/invalid.out b/src/test/modules/test_copy_format/expected/invalid.out
new file mode 100644
index 00000000000..306c9928431
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/invalid.out
@@ -0,0 +1,61 @@
+CREATE SCHEMA test_schema;
+CREATE EXTENSION test_copy_format WITH SCHEMA test_schema;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+SET search_path = public,test_schema;
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_wrong_input_type');
+ERROR:  COPY format "test_copy_format_wrong_input_type" not recognized
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_w...
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wrong_input_type');
+ERROR:  COPY format "test_copy_format_wrong_input_type" not recognized
+LINE 1: COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wr...
+                                         ^
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_wrong_return_type');
+ERROR:  function test_copy_format_wrong_return_type must return type copy_handler
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_w...
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wrong_return_type');
+ERROR:  function test_copy_format_wrong_return_type must return type copy_handler
+LINE 1: COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wr...
+                                         ^
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_wrong_return_value');
+ERROR:  COPY handler function test_schema.test_copy_format_wrong_return_value did not return CopyFromRoutine struct
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wrong_return_value');
+ERROR:  COPY handler function test_schema.test_copy_format_wrong_return_value did not return CopyToRoutine struct
+RESET search_path;
+COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type');
+ERROR:  COPY format "test_schema.test_copy_format_wrong_input_type" not recognized
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_c...
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type');
+ERROR:  COPY format "test_schema.test_copy_format_wrong_input_type" not recognized
+LINE 1: COPY public.test TO stdout WITH (FORMAT 'test_schema.test_co...
+                                         ^
+COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type');
+ERROR:  function test_schema.test_copy_format_wrong_return_type must return type copy_handler
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_c...
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type');
+ERROR:  function test_schema.test_copy_format_wrong_return_type must return type copy_handler
+LINE 1: COPY public.test TO stdout WITH (FORMAT 'test_schema.test_co...
+                                         ^
+COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value');
+ERROR:  COPY handler function test_schema.test_copy_format_wrong_return_value did not return CopyFromRoutine struct
+COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value');
+ERROR:  COPY handler function test_schema.test_copy_format_wrong_return_value did not return CopyToRoutine struct
+COPY public.test FROM stdin WITH (FORMAT 'nonexistent');
+ERROR:  COPY format "nonexistent" not recognized
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'nonexistent');
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'nonexistent');
+ERROR:  COPY format "nonexistent" not recognized
+LINE 1: COPY public.test TO stdout WITH (FORMAT 'nonexistent');
+                                         ^
+COPY public.test FROM stdin WITH (FORMAT 'invalid.qualified.name');
+ERROR:  cross-database references are not implemented: invalid.qualified.name
+COPY public.test TO stdout WITH (FORMAT 'invalid.qualified.name');
+ERROR:  cross-database references are not implemented: invalid.qualified.name
+DROP TABLE public.test;
+DROP EXTENSION test_copy_format;
+DROP SCHEMA test_schema;
diff --git a/src/test/modules/test_copy_format/expected/no_schema.out b/src/test/modules/test_copy_format/expected/no_schema.out
new file mode 100644
index 00000000000..d5903632b2e
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/no_schema.out
@@ -0,0 +1,23 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: attribute: smallint
+NOTICE:  CopyToOutFunc: attribute: integer
+NOTICE:  CopyToOutFunc: attribute: bigint
+NOTICE:  CopyToStart: the number of attributes: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToEnd
+DROP TABLE public.test;
+DROP EXTENSION test_copy_format;
diff --git a/src/test/modules/test_copy_format/expected/schema.out b/src/test/modules/test_copy_format/expected/schema.out
new file mode 100644
index 00000000000..698189fbeae
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/schema.out
@@ -0,0 +1,56 @@
+CREATE SCHEMA test_schema;
+CREATE EXTENSION test_copy_format WITH SCHEMA test_schema;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- Qualified name
+COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format');
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
+COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: attribute: smallint
+NOTICE:  CopyToOutFunc: attribute: integer
+NOTICE:  CopyToOutFunc: attribute: bigint
+NOTICE:  CopyToStart: the number of attributes: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToEnd
+-- No schema, no search path
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+ERROR:  COPY format "test_copy_format" not recognized
+LINE 1: COPY public.test FROM stdin WITH (FORMAT 'test_copy_format')...
+                                          ^
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+ERROR:  COPY format "test_copy_format" not recognized
+LINE 1: COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+                                         ^
+-- No schema, with search path
+SET search_path = test_schema,public;
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: attribute: smallint
+NOTICE:  CopyToOutFunc: attribute: integer
+NOTICE:  CopyToOutFunc: attribute: bigint
+NOTICE:  CopyToStart: the number of attributes: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToEnd
+RESET search_path;
+DROP TABLE public.test;
+DROP EXTENSION test_copy_format;
+DROP SCHEMA test_schema;
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 00000000000..8010659585b
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,35 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+  'test_copy_format.control',
+  'test_copy_format--1.0.sql',
+)
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'invalid',
+      'no_schema',
+      'schema',
+    ],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/invalid.sql b/src/test/modules/test_copy_format/sql/invalid.sql
new file mode 100644
index 00000000000..e475f6a38c6
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/invalid.sql
@@ -0,0 +1,29 @@
+CREATE SCHEMA test_schema;
+CREATE EXTENSION test_copy_format WITH SCHEMA test_schema;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+
+SET search_path = public,test_schema;
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_wrong_input_type');
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wrong_input_type');
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_wrong_return_type');
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wrong_return_type');
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_wrong_return_value');
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wrong_return_value');
+RESET search_path;
+
+COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type');
+COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type');
+COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type');
+COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type');
+COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value');
+COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value');
+
+COPY public.test FROM stdin WITH (FORMAT 'nonexistent');
+COPY public.test TO stdout WITH (FORMAT 'nonexistent');
+COPY public.test FROM stdin WITH (FORMAT 'invalid.qualified.name');
+COPY public.test TO stdout WITH (FORMAT 'invalid.qualified.name');
+
+DROP TABLE public.test;
+DROP EXTENSION test_copy_format;
+DROP SCHEMA test_schema;
diff --git a/src/test/modules/test_copy_format/sql/no_schema.sql b/src/test/modules/test_copy_format/sql/no_schema.sql
new file mode 100644
index 00000000000..1e049f799f0
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/no_schema.sql
@@ -0,0 +1,8 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+\.
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+DROP TABLE public.test;
+DROP EXTENSION test_copy_format;
diff --git a/src/test/modules/test_copy_format/sql/schema.sql b/src/test/modules/test_copy_format/sql/schema.sql
new file mode 100644
index 00000000000..ab9492158e1
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/schema.sql
@@ -0,0 +1,24 @@
+CREATE SCHEMA test_schema;
+CREATE EXTENSION test_copy_format WITH SCHEMA test_schema;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+
+-- Qualified name
+COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format');
+\.
+COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format');
+
+-- No schema, no search path
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+
+-- No schema, with search path
+SET search_path = test_schema,public;
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+\.
+COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
+RESET search_path;
+
+DROP TABLE public.test;
+DROP EXTENSION test_copy_format;
+DROP SCHEMA test_schema;
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 00000000000..c1a137181f8
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,24 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+CREATE FUNCTION test_copy_format(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
+
+CREATE FUNCTION test_copy_format_wrong_input_type(bool)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
+
+CREATE FUNCTION test_copy_format_wrong_return_type(internal)
+    RETURNS bool
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
+
+CREATE FUNCTION test_copy_format_wrong_return_value(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format_wrong_return_value'
+    LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..1d754201336
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,113 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+#include "utils/builtins.h"
+
+PG_MODULE_MAGIC;
+
+static void
+TestCopyFromInFunc(CopyFromState cstate, Oid atttypid,
+                   FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: attribute: %s", format_type_be(atttypid))));
+}
+
+static void
+TestCopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: the number of attributes: %d", tupDesc->natts)));
+}
+
+static bool
+TestCopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+TestCopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .type = T_CopyFromRoutine,
+    .CopyFromInFunc = TestCopyFromInFunc,
+    .CopyFromStart = TestCopyFromStart,
+    .CopyFromOneRow = TestCopyFromOneRow,
+    .CopyFromEnd = TestCopyFromEnd,
+};
+
+static void
+TestCopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    ereport(NOTICE, (errmsg("CopyToOutFunc: attribute: %s", format_type_be(atttypid))));
+}
+
+static void
+TestCopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: the number of attributes: %d", tupDesc->natts)));
+}
+
+static void
+TestCopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: the number of valid values: %u", slot->tts_nvalid)));
+}
+
+static void
+TestCopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .type = T_CopyToRoutine,
+    .CopyToOutFunc = TestCopyToOutFunc,
+    .CopyToStart = TestCopyToStart,
+    .CopyToOneRow = TestCopyToOneRow,
+    .CopyToEnd = TestCopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    ereport(NOTICE,
+            (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
+
+PG_FUNCTION_INFO_V1(test_copy_format_wrong_return_value);
+Datum
+test_copy_format_wrong_return_value(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_CSTRING(pstrdup("is_from=true"));
+    else
+        PG_RETURN_CSTRING(pstrdup("is_from=false"));
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 00000000000..f05a6362358
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.47.2
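Note on the handler contract exercised by the tests above: the handler function receives is_from and must return a pointer whose type tag matches the direction, otherwise COPY fails with "did not return CopyFromRoutine struct" (or CopyToRoutine). The following is a minimal standalone sketch of that check, not the patch's implementation. It uses no PostgreSQL headers, and every Sketch*/sketch_* name is a hypothetical stand-in (SketchTag for the node tag, a NULL return for the reported error).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's node tags. */
typedef enum SketchTag
{
    SKETCH_TAG_OTHER = 0,
    SKETCH_TAG_COPY_FROM_ROUTINE,
    SKETCH_TAG_COPY_TO_ROUTINE
} SketchTag;

/* Every routine struct starts with a tagged header. */
typedef struct SketchNode
{
    SketchTag   type;
} SketchNode;

typedef struct SketchCopyFromRoutine
{
    SketchNode  node;           /* must be first */
    const char *name;
} SketchCopyFromRoutine;

typedef struct SketchCopyToRoutine
{
    SketchNode  node;           /* must be first */
    const char *name;
} SketchCopyToRoutine;

/* Assumed handler signature: direction in, tagged routine out. */
typedef const SketchNode *(*SketchHandler) (bool is_from);

const SketchCopyFromRoutine sketch_from_routine = {
    .node = {SKETCH_TAG_COPY_FROM_ROUTINE},
    .name = "from",
};

const SketchCopyToRoutine sketch_to_routine = {
    .node = {SKETCH_TAG_COPY_TO_ROUTINE},
    .name = "to",
};

/* Something that is neither routine, like the cstring in the test module. */
const SketchNode sketch_not_a_routine = {SKETCH_TAG_OTHER};

/* Well-behaved handler: picks the routine that matches the direction. */
const SketchNode *
sketch_handler(bool is_from)
{
    if (is_from)
        return &sketch_from_routine.node;
    return &sketch_to_routine.node;
}

/* Misbehaving handler: returns something with the wrong tag. */
const SketchNode *
sketch_wrong_handler(bool is_from)
{
    (void) is_from;
    return &sketch_not_a_routine;
}

/*
 * Call the handler and validate the tag for the requested direction.
 * Returns NULL where the real code would raise an error.
 */
const SketchNode *
sketch_lookup_routine(SketchHandler handler, bool is_from)
{
    const SketchNode *routine = handler(is_from);
    SketchTag   expected = is_from ? SKETCH_TAG_COPY_FROM_ROUTINE
                                   : SKETCH_TAG_COPY_TO_ROUTINE;

    if (routine == NULL || routine->type != expected)
        return NULL;
    return routine;
}
```

With this model, sketch_lookup_routine(sketch_handler, true) yields the FROM routine, while sketch_wrong_handler is rejected for both directions, matching the two error messages in expected/invalid.out.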
From f0fcd9ee75b82d7365306317f99dbe1efff1f34b Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Thu, 27 Mar 2025 11:24:15 +0900
Subject: [PATCH v39 3/5] Add support for implementing custom COPY handler as
 extension
* TO: Add CopyToStateData::opaque that can be used to keep
  data for custom COPY TO handler implementation
* TO: Export CopySendEndOfRow() to send end of row data as
  CopyToStateFlush()
* FROM: Add CopyFromStateData::opaque that can be used to
  keep data for custom COPY FROM handler implementation
* FROM: Export CopyGetData() to get the next data as
  CopyFromStateGetData()
* FROM: Add CopyFromSkipErrorRow() for "ON_ERROR ignore" and
  "LOG_VERBOSITY verbose"
COPY FROM extensions must call CopyFromSkipErrorRow() when
the CopyFromOneRow callback reports a soft error by
errsave(). CopyFromSkipErrorRow() handles the "ON_ERROR
ignore" and "LOG_VERBOSITY verbose" cases.
---
 src/backend/commands/copyfromparse.c          | 93 ++++++++++++-------
 src/backend/commands/copyto.c                 | 12 +++
 src/include/commands/copyapi.h                |  6 ++
 src/include/commands/copyfrom_internal.h      |  3 +
 src/include/commands/copyto_internal.h        |  3 +
 .../test_copy_format/expected/no_schema.out   | 47 ++++++++++
 .../test_copy_format/sql/no_schema.sql        | 24 +++++
 .../test_copy_format/test_copy_format.c       | 80 +++++++++++++++-
 8 files changed, 231 insertions(+), 37 deletions(-)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 17e51f02e04..2070f51a963 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -739,6 +739,17 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
     return copied_bytes;
 }
 
+/*
+ * Export CopyGetData() for extensions. We want to keep CopyGetData() as a
+ * static function for optimization. CopyGetData() calls in this file may be
+ * optimized by a compiler.
+ */
+int
+CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread)
+{
+    return CopyGetData(cstate, dest, minread, maxread);
+}
+
 /*
  * This function is exposed for use by extensions that read raw fields in the
  * next line. See NextCopyFromRawFieldsInternal() for details.
@@ -927,6 +938,51 @@ CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
     return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
 }
 
+/*
+ * Call this when you report a soft error by errsave() in your CopyFromOneRow
+ * callback. This handles "ON_ERROR ignore" and "LOG_VERBOSITY verbose" cases
+ * for you.
+ */
+void
+CopyFromSkipErrorRow(CopyFromState cstate)
+{
+    Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+    cstate->num_errors++;
+
+    if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+    {
+        /*
+         * Since we emit line number and column info in the below notice
+         * message, we suppress error context information other than the
+         * relation name.
+         */
+        Assert(!cstate->relname_only);
+        cstate->relname_only = true;
+
+        if (cstate->cur_attval)
+        {
+            char       *attval;
+
+            attval = CopyLimitPrintoutLength(cstate->cur_attval);
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname,
+                           attval));
+            pfree(attval);
+        }
+        else
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
+                           (unsigned long long) cstate->cur_lineno,
+                           cstate->cur_attname));
+
+        /* reset relname_only */
+        cstate->relname_only = false;
+    }
+}
+
 /*
  * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
  *
@@ -1033,42 +1089,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
                                         (Node *) cstate->escontext,
                                         &values[m]))
         {
-            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-            cstate->num_errors++;
-
-            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-            {
-                /*
-                 * Since we emit line number and column info in the below
-                 * notice message, we suppress error context information other
-                 * than the relation name.
-                 */
-                Assert(!cstate->relname_only);
-                cstate->relname_only = true;
-
-                if (cstate->cur_attval)
-                {
-                    char       *attval;
-
-                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": \"%s\"",
-                                   (unsigned long long) cstate->cur_lineno,
-                                   cstate->cur_attname,
-                                   attval));
-                    pfree(attval);
-                }
-                else
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %llu for column \"%s\": null input",
-                                   (unsigned long long) cstate->cur_lineno,
-                                   cstate->cur_attname));
-
-                /* reset relname_only */
-                cstate->relname_only = false;
-            }
-
+            CopyFromSkipErrorRow(cstate);
             return true;
         }
 
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index b7ff6466ce3..23cbdad184c 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -456,6 +456,18 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * Export CopySendEndOfRow() for extensions. We want to keep
+ * CopySendEndOfRow() as a static function for
+ * optimization. CopySendEndOfRow() calls in this file may be optimized by a
+ * compiler.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
  * line termination and do common appropriate things for the end of row.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 53ad3337f86..500ece7d5bb 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -56,6 +56,8 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+extern void CopyToStateFlush(CopyToState cstate);
+
 /*
  * API structure for a COPY FROM format implementation. Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
@@ -106,4 +108,8 @@ typedef struct CopyFromRoutine
     void        (*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
+extern int    CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread);
+
+extern void CopyFromSkipErrorRow(CopyFromState cstate);
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 3a306e3286e..af425cf5fd9 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -181,6 +181,9 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyFromStateData;
 
 extern void ReceiveCopyBegin(CopyFromState cstate);
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index 12c4a0f5979..14ee0f50588 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -78,6 +78,9 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;    /* lookup info for output functions */
     MemoryContext rowcontext;    /* per-row evaluation context */
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyToStateData;
 
 #endif                            /* COPYTO_INTERNAL_H */
diff --git a/src/test/modules/test_copy_format/expected/no_schema.out b/src/test/modules/test_copy_format/expected/no_schema.out
index d5903632b2e..05d160c1eae 100644
--- a/src/test/modules/test_copy_format/expected/no_schema.out
+++ b/src/test/modules/test_copy_format/expected/no_schema.out
@@ -1,6 +1,8 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=true
 NOTICE:  CopyFromInFunc: attribute: smallint
@@ -8,7 +10,50 @@ NOTICE:  CopyFromInFunc: attribute: integer
 NOTICE:  CopyFromInFunc: attribute: bigint
 NOTICE:  CopyFromStart: the number of attributes: 3
 NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  invalid value: "6"
+CONTEXT:  COPY test, line 2, column a: "6"
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
 NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  skipping row due to data type incompatibility at line 2 for column "a": "6"
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
+NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  too much lines: 3
+CONTEXT:  COPY test, line 3
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=false
 NOTICE:  CopyToOutFunc: attribute: smallint
@@ -18,6 +63,8 @@ NOTICE:  CopyToStart: the number of attributes: 3
 NOTICE:  CopyToOneRow: the number of valid values: 3
 NOTICE:  CopyToOneRow: the number of valid values: 3
 NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
 NOTICE:  CopyToEnd
 DROP TABLE public.test;
 DROP EXTENSION test_copy_format;
diff --git a/src/test/modules/test_copy_format/sql/no_schema.sql b/src/test/modules/test_copy_format/sql/no_schema.sql
index 1e049f799f0..1901c4a9f43 100644
--- a/src/test/modules/test_copy_format/sql/no_schema.sql
+++ b/src/test/modules/test_copy_format/sql/no_schema.sql
@@ -1,7 +1,31 @@
 CREATE EXTENSION test_copy_format;
 CREATE TABLE public.test (a smallint, b integer, c bigint);
 INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+987
+654
+\.
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY public.test FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+321
 \.
 COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
 DROP TABLE public.test;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index 1d754201336..34ec693a7ec 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 
 #include "commands/copyapi.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 #include "utils/builtins.h"
 
@@ -35,8 +36,85 @@ TestCopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
 static bool
 TestCopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
 {
+    int            n_attributes = list_length(cstate->attnumlist);
+    char       *line;
+    int            line_size = n_attributes + 1;    /* +1 is for new line */
+    int            read_bytes;
+
     ereport(NOTICE, (errmsg("CopyFromOneRow")));
-    return false;
+
+    cstate->cur_lineno++;
+    line = palloc(line_size);
+    read_bytes = CopyFromStateGetData(cstate, line, line_size, line_size);
+    if (read_bytes == 0)
+        return false;
+    if (read_bytes != line_size)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("one line must be %d bytes: %d",
+                        line_size, read_bytes)));
+
+    if (cstate->cur_lineno == 1)
+    {
+        /* Success */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        ListCell   *cur;
+        int            i = 0;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            int            m = attnum - 1;
+            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+            if (att->atttypid == INT2OID)
+            {
+                values[i] = Int16GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT4OID)
+            {
+                values[i] = Int32GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT8OID)
+            {
+                values[i] = Int64GetDatum(line[i] - '0');
+            }
+            nulls[i] = false;
+            i++;
+        }
+    }
+    else if (cstate->cur_lineno == 2)
+    {
+        /* Soft error */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        int            attnum = lfirst_int(list_head(cstate->attnumlist));
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+        char        value[2];
+
+        cstate->cur_attname = NameStr(att->attname);
+        value[0] = line[0];
+        value[1] = '\0';
+        cstate->cur_attval = value;
+        errsave((Node *) cstate->escontext,
+                (
+                 errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                 errmsg("invalid value: \"%c\"", line[0])));
+        CopyFromSkipErrorRow(cstate);
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+        return true;
+    }
+    else
+    {
+        /* Hard error */
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("too much lines: %llu",
+                        (unsigned long long) cstate->cur_lineno)));
+    }
+
+    return true;
 }
 
 static void
-- 
2.47.2
From 78c7ba67fb58ff44b492a207cf1833e296547175 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Thu, 27 Mar 2025 11:56:45 +0900
Subject: [PATCH v39 4/5] Use copy handlers for built-in formats
This adds copy handlers for text, csv and binary, which lets us
simplify Copy{To,From}GetRoutine(). We'll be able to remove
CopyFormatOptions::{binary,csv_mode} once we add more callbacks to
Copy{To,From}Routine and move the format-specific routines to
Copy{To,From}Routine::*.
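The dispatch that Copy{To,From}GetRoutine() now performs for every
format can be sketched stand-alone as below. This is only a simplified
model under stated assumptions, not the patch's code: the NodeTag enum,
the one-field routine structs, the CopyHandler typedef, broken_handler()
and get_from_routine() are hypothetical stand-ins. In the patch the
handler is invoked with OidFunctionCall1(), the result is checked with
IsA(), and a mismatch raises ereport(ERROR) instead of returning NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for PostgreSQL's NodeTag; only the two tags used here. */
typedef enum
{
    T_CopyFromRoutine = 1,
    T_CopyToRoutine = 2
} NodeTag;

/* Stand-ins for the routine structs; the tag must be the first member. */
typedef struct
{
    NodeTag     type;
    const char *name;
} CopyFromRoutine;

typedef struct
{
    NodeTag     type;
    const char *name;
} CopyToRoutine;

static const CopyFromRoutine FromText = {T_CopyFromRoutine, "text"};
static const CopyToRoutine ToText = {T_CopyToRoutine, "text"};

/* Hypothetical plain-C shape of a handler: is_from selects the direction. */
typedef const void *(*CopyHandler) (bool is_from);

/* Models copy_text_handler(): one function serves both directions. */
static const void *
text_handler(bool is_from)
{
    if (is_from)
        return &FromText;
    return &ToText;
}

/* A deliberately wrong handler that ignores the requested direction. */
static const void *
broken_handler(bool is_from)
{
    (void) is_from;
    return &ToText;
}

/*
 * Models CopyFromGetRoutine(): call the handler with is_from = true and
 * verify the tag of what came back. Returns NULL where the patch would
 * report an error.
 */
static const CopyFromRoutine *
get_from_routine(CopyHandler handler)
{
    const void *routine;

    assert(handler != NULL);
    routine = handler(true);
    if (routine == NULL || *(const NodeTag *) routine != T_CopyFromRoutine)
        return NULL;
    return routine;
}
```

With this model, get_from_routine(text_handler) yields the text routine,
while get_from_routine(broken_handler) is rejected because the returned
struct carries the COPY TO tag. The same tag check is what lets built-in
and extension handlers go through one code path.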
---
 src/backend/commands/copy.c                   | 101 ++++++++++++------
 src/backend/commands/copyfrom.c               |  42 ++++----
 src/backend/commands/copyto.c                 |  42 ++++----
 src/include/catalog/pg_proc.dat               |  11 ++
 src/include/commands/copy.h                   |   2 +-
 src/include/commands/copyfrom_internal.h      |   6 +-
 src/include/commands/copyto_internal.h        |   6 +-
 .../test_copy_format/expected/builtin.out     |  34 ++++++
 src/test/modules/test_copy_format/meson.build |   1 +
 .../modules/test_copy_format/sql/builtin.sql  |  30 ++++++
 .../test_copy_format--1.0.sql                 |  15 +++
 11 files changed, 209 insertions(+), 81 deletions(-)
 create mode 100644 src/test/modules/test_copy_format/expected/builtin.out
 create mode 100644 src/test/modules/test_copy_format/sql/builtin.sql
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2539aee3a53..2bd8989b1ae 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -22,7 +22,9 @@
 #include "access/table.h"
 #include "access/xact.h"
 #include "catalog/pg_authid.h"
-#include "commands/copy.h"
+#include "commands/copyapi.h"
+#include "commands/copyto_internal.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
@@ -521,43 +523,45 @@ ProcessCopyOptions(ParseState *pstate,
 
         if (strcmp(defel->defname, "format") == 0)
         {
-            char       *fmt = defGetString(defel);
+            char       *format = defGetString(defel);
+            List       *qualified_format;
+            char       *schema;
+            char       *fmt;
+            Oid            arg_types[1];
+            Oid            handler = InvalidOid;
 
             if (format_specified)
                 errorConflictingDefElem(defel, pstate);
             format_specified = true;
-            if (strcmp(fmt, "text") == 0)
-                 /* default format */ ;
-            else if (strcmp(fmt, "csv") == 0)
-                opts_out->csv_mode = true;
-            else if (strcmp(fmt, "binary") == 0)
-                opts_out->binary = true;
-            else
+
+            qualified_format = stringToQualifiedNameList(format, NULL);
+            DeconstructQualifiedName(qualified_format, &schema, &fmt);
+            if (!schema || strcmp(schema, "pg_catalog") == 0)
             {
-                List       *qualified_format;
-                Oid            arg_types[1];
-                Oid            handler = InvalidOid;
-
-                qualified_format = stringToQualifiedNameList(fmt, NULL);
-                arg_types[0] = INTERNALOID;
-                handler = LookupFuncName(qualified_format, 1,
-                                         arg_types, true);
-                if (!OidIsValid(handler))
-                    ereport(ERROR,
-                            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                             errmsg("COPY format \"%s\" not recognized", fmt),
-                             parser_errposition(pstate, defel->location)));
-
-                /* check that handler has correct return type */
-                if (get_func_rettype(handler) != COPY_HANDLEROID)
-                    ereport(ERROR,
-                            (errcode(ERRCODE_WRONG_OBJECT_TYPE),
-                             errmsg("function %s must return type %s",
-                                    fmt, "copy_handler"),
-                             parser_errposition(pstate, defel->location)));
-
-                opts_out->handler = handler;
+                if (strcmp(fmt, "csv") == 0)
+                    opts_out->csv_mode = true;
+                else if (strcmp(fmt, "binary") == 0)
+                    opts_out->binary = true;
             }
+
+            arg_types[0] = INTERNALOID;
+            handler = LookupFuncName(qualified_format, 1,
+                                     arg_types, true);
+            if (!OidIsValid(handler))
+                ereport(ERROR,
+                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                         errmsg("COPY format \"%s\" not recognized", format),
+                         parser_errposition(pstate, defel->location)));
+
+            /* check that handler has correct return type */
+            if (get_func_rettype(handler) != COPY_HANDLEROID)
+                ereport(ERROR,
+                        (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                         errmsg("function %s must return type %s",
+                                format, "copy_handler"),
+                         parser_errposition(pstate, defel->location)));
+
+            opts_out->handler = handler;
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
@@ -1040,3 +1044,36 @@ CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist)
 
     return attnums;
 }
+
+Datum
+copy_text_handler(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineText);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineText);
+}
+
+Datum
+copy_csv_handler(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineCSV);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineCSV);
+}
+
+Datum
+copy_binary_handler(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineBinary);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineBinary);
+}
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 91f44193abf..7244eb6368a 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -45,6 +45,7 @@
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/portal.h"
@@ -128,7 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate);
  */
 
 /* text format */
-static const CopyFromRoutine CopyFromRoutineText = {
+const CopyFromRoutine CopyFromRoutineText = {
     .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
@@ -137,7 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = {
 };
 
 /* CSV format */
-static const CopyFromRoutine CopyFromRoutineCSV = {
+const CopyFromRoutine CopyFromRoutineCSV = {
     .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
@@ -146,7 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = {
 };
 
 /* binary format */
-static const CopyFromRoutine CopyFromRoutineBinary = {
+const CopyFromRoutine CopyFromRoutineBinary = {
     .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromBinaryInFunc,
     .CopyFromStart = CopyFromBinaryStart,
@@ -158,28 +159,23 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(const CopyFormatOptions *opts)
 {
-    if (OidIsValid(opts->handler))
-    {
-        Datum        datum;
-        Node       *routine;
-
-        datum = OidFunctionCall1(opts->handler, BoolGetDatum(true));
-        routine = (Node *) DatumGetPointer(datum);
-        if (routine == NULL || !IsA(routine, CopyFromRoutine))
-            ereport(ERROR,
-                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                     errmsg("COPY handler function %s.%s did not return CopyFromRoutine struct",
-                            get_namespace_name(get_func_namespace(opts->handler)),
-                            get_func_name(opts->handler))));
-        return castNode(CopyFromRoutine, routine);
-    }
-    else if (opts->csv_mode)
-        return &CopyFromRoutineCSV;
-    else if (opts->binary)
-        return &CopyFromRoutineBinary;
+    Oid            handler = opts->handler;
+    Datum        datum;
+    Node       *routine;
 
     /* default is text */
-    return &CopyFromRoutineText;
+    if (!OidIsValid(handler))
+        handler = F_TEXT_INTERNAL;
+
+    datum = OidFunctionCall1(handler, BoolGetDatum(true));
+    routine = (Node *) DatumGetPointer(datum);
+    if (routine == NULL || !IsA(routine, CopyFromRoutine))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY handler function %s.%s did not return CopyFromRoutine struct",
+                        get_namespace_name(get_func_namespace(handler)),
+                        get_func_name(handler))));
+    return castNode(CopyFromRoutine, routine);
 }
 
 /* Implementation of the start callback for text and CSV formats */
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 23cbdad184c..b244167a568 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -32,6 +32,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -89,7 +90,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
  */
 
 /* text format */
-static const CopyToRoutine CopyToRoutineText = {
+const CopyToRoutine CopyToRoutineText = {
     .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
@@ -98,7 +99,7 @@ static const CopyToRoutine CopyToRoutineText = {
 };
 
 /* CSV format */
-static const CopyToRoutine CopyToRoutineCSV = {
+const CopyToRoutine CopyToRoutineCSV = {
     .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
@@ -107,7 +108,7 @@ static const CopyToRoutine CopyToRoutineCSV = {
 };
 
 /* binary format */
-static const CopyToRoutine CopyToRoutineBinary = {
+const CopyToRoutine CopyToRoutineBinary = {
     .type = T_CopyToRoutine,
     .CopyToStart = CopyToBinaryStart,
     .CopyToOutFunc = CopyToBinaryOutFunc,
@@ -119,28 +120,23 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(const CopyFormatOptions *opts)
 {
-    if (OidIsValid(opts->handler))
-    {
-        Datum        datum;
-        Node       *routine;
-
-        datum = OidFunctionCall1(opts->handler, BoolGetDatum(false));
-        routine = (Node *) DatumGetPointer(datum);
-        if (routine == NULL || !IsA(routine, CopyToRoutine))
-            ereport(ERROR,
-                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                     errmsg("COPY handler function %s.%s did not return CopyToRoutine struct",
-                            get_namespace_name(get_func_namespace(opts->handler)),
-                            get_func_name(opts->handler))));
-        return castNode(CopyToRoutine, routine);
-    }
-    else if (opts->csv_mode)
-        return &CopyToRoutineCSV;
-    else if (opts->binary)
-        return &CopyToRoutineBinary;
+    Oid            handler = opts->handler;
+    Datum        datum;
+    Node       *routine;
 
     /* default is text */
-    return &CopyToRoutineText;
+    if (!OidIsValid(handler))
+        handler = F_TEXT_INTERNAL;
+
+    datum = OidFunctionCall1(handler, BoolGetDatum(false));
+    routine = (Node *) DatumGetPointer(datum);
+    if (routine == NULL || !IsA(routine, CopyToRoutine))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY handler function %s.%s did not return CopyToRoutine struct",
+                        get_namespace_name(get_func_namespace(handler)),
+                        get_func_name(handler))));
+    return castNode(CopyToRoutine, routine);
 }
 
 /* Implementation of the start callback for text and CSV formats */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 79b5a36088c..09bacb80084 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12485,4 +12485,15 @@
   proargtypes => 'int4',
   prosrc => 'gist_stratnum_common' },
 
+# COPY handlers
+{ oid => '8100', descr => 'text COPY FORMAT handler',
+  proname => 'text', provolatile => 'i', prorettype => 'copy_handler',
+  proargtypes => 'internal', prosrc => 'copy_text_handler' },
+{ oid => '8101', descr => 'csv COPY FORMAT handler',
+  proname => 'csv', provolatile => 'i', prorettype => 'copy_handler',
+  proargtypes => 'internal', prosrc => 'copy_csv_handler' },
+{ oid => '8102', descr => 'binary COPY FORMAT handler',
+  proname => 'binary', provolatile => 'i', prorettype => 'copy_handler',
+  proargtypes => 'internal', prosrc => 'copy_binary_handler' },
+
 ]
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 6df1f8a3b9b..4525261fcc4 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,7 +87,7 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
-    Oid            handler;        /* handler function for custom format routine */
+    Oid            handler;        /* handler function */
 } CopyFormatOptions;
 
 /* These are private in commands/copy[from|to]_internal.h */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index af425cf5fd9..abeccf85c1c 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -14,7 +14,7 @@
 #ifndef COPYFROM_INTERNAL_H
 #define COPYFROM_INTERNAL_H
 
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
@@ -197,4 +197,8 @@ extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
 extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
                                  Datum *values, bool *nulls);
 
+extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineText;
+extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineCSV;
+extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineBinary;
+
 #endif                            /* COPYFROM_INTERNAL_H */
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index 14ee0f50588..1bd83fa7f14 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -14,7 +14,7 @@
 #ifndef COPYTO_INTERNAL_H
 #define COPYTO_INTERNAL_H
 
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "executor/execdesc.h"
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
@@ -83,4 +83,8 @@ typedef struct CopyToStateData
     void       *opaque;            /* private space */
 } CopyToStateData;
 
+extern PGDLLIMPORT const CopyToRoutine CopyToRoutineText;
+extern PGDLLIMPORT const CopyToRoutine CopyToRoutineCSV;
+extern PGDLLIMPORT const CopyToRoutine CopyToRoutineBinary;
+
 #endif                            /* COPYTO_INTERNAL_H */
diff --git a/src/test/modules/test_copy_format/expected/builtin.out b/src/test/modules/test_copy_format/expected/builtin.out
new file mode 100644
index 00000000000..11b1053c84e
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/builtin.out
@@ -0,0 +1,34 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- public.text must not be used
+COPY public.test FROM stdin WITH (FORMAT text);
+COPY public.test TO stdout WITH (FORMAT text);
+1    2    3
+12    34    56
+123    456    789
+COPY public.test FROM stdin WITH (FORMAT 'pg_catalog.text');
+COPY public.test TO stdout WITH (FORMAT 'pg_catalog.text');
+1    2    3
+12    34    56
+123    456    789
+-- public.csv must not be used
+COPY public.test FROM stdin WITH (FORMAT csv);
+COPY public.test TO stdout WITH (FORMAT csv);
+1,2,3
+12,34,56
+123,456,789
+COPY public.test FROM stdin WITH (FORMAT 'pg_catalog.csv');
+COPY public.test TO stdout WITH (FORMAT 'pg_catalog.csv');
+1,2,3
+12,34,56
+123,456,789
+-- public.binary must not be used
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/binary.data'
+COPY public.test TO :'filename' WITH (FORMAT binary);
+COPY public.test FROM :'filename' WITH (FORMAT binary);
+COPY public.test TO :'filename' WITH (FORMAT 'pg_catalog.binary');
+COPY public.test FROM :'filename' WITH (FORMAT 'pg_catalog.binary');
+DROP TABLE public.test;
+DROP EXTENSION test_copy_format;
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
index 8010659585b..86ba610d4ff 100644
--- a/src/test/modules/test_copy_format/meson.build
+++ b/src/test/modules/test_copy_format/meson.build
@@ -27,6 +27,7 @@ tests += {
   'bd': meson.current_build_dir(),
   'regress': {
     'sql': [
+      'builtin',
       'invalid',
       'no_schema',
       'schema',
diff --git a/src/test/modules/test_copy_format/sql/builtin.sql b/src/test/modules/test_copy_format/sql/builtin.sql
new file mode 100644
index 00000000000..2d24069b538
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/builtin.sql
@@ -0,0 +1,30 @@
+CREATE EXTENSION test_copy_format;
+CREATE TABLE public.test (a smallint, b integer, c bigint);
+INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+
+-- public.text must not be used
+COPY public.test FROM stdin WITH (FORMAT text);
+\.
+COPY public.test TO stdout WITH (FORMAT text);
+COPY public.test FROM stdin WITH (FORMAT 'pg_catalog.text');
+\.
+COPY public.test TO stdout WITH (FORMAT 'pg_catalog.text');
+
+-- public.csv must not be used
+COPY public.test FROM stdin WITH (FORMAT csv);
+\.
+COPY public.test TO stdout WITH (FORMAT csv);
+COPY public.test FROM stdin WITH (FORMAT 'pg_catalog.csv');
+\.
+COPY public.test TO stdout WITH (FORMAT 'pg_catalog.csv');
+
+-- public.binary must not be used
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/binary.data'
+COPY public.test TO :'filename' WITH (FORMAT binary);
+COPY public.test FROM :'filename' WITH (FORMAT binary);
+COPY public.test TO :'filename' WITH (FORMAT 'pg_catalog.binary');
+COPY public.test FROM :'filename' WITH (FORMAT 'pg_catalog.binary');
+
+DROP TABLE public.test;
+DROP EXTENSION test_copy_format;
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
index c1a137181f8..bfa1900e828 100644
--- a/src/test/modules/test_copy_format/test_copy_format--1.0.sql
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -22,3 +22,18 @@ CREATE FUNCTION test_copy_format_wrong_return_value(internal)
     RETURNS copy_handler
     AS 'MODULE_PATHNAME', 'test_copy_format_wrong_return_value'
     LANGUAGE C;
+
+CREATE FUNCTION text(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
+
+CREATE FUNCTION csv(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
+
+CREATE FUNCTION binary(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
-- 
2.47.2
From e35257a1f7b9ae8aec73c4cbf41428f7acc10823 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 19 Mar 2025 11:46:34 +0900
Subject: [PATCH v39 5/5] Add document how to write a COPY handler
This is WIP because we haven't decided our API yet.
Co-authored-by: David G. Johnston <david.g.johnston@gmail.com>
---
 doc/src/sgml/copy-handler.sgml | 394 +++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml     |   1 +
 doc/src/sgml/postgres.sgml     |   1 +
 src/include/commands/copyapi.h |   9 +-
 4 files changed, 401 insertions(+), 4 deletions(-)
 create mode 100644 doc/src/sgml/copy-handler.sgml
diff --git a/doc/src/sgml/copy-handler.sgml b/doc/src/sgml/copy-handler.sgml
new file mode 100644
index 00000000000..5bc87d16662
--- /dev/null
+++ b/doc/src/sgml/copy-handler.sgml
@@ -0,0 +1,394 @@
+<!-- doc/src/sgml/copy-handler.sgml -->
+
+<chapter id="copy-handler">
+ <title>Writing a Copy Handler</title>
+
+ <indexterm zone="copy-handler">
+  <primary><literal>COPY</literal> handler</primary>
+ </indexterm>
+
+ <para>
+  <productname>PostgreSQL</productname> supports
+  custom <link linkend="sql-copy"><literal>COPY</literal></link> handlers,
+  which add further <replaceable>format_name</replaceable> options to
+  the <literal>FORMAT</literal> clause.
+ </para>
+
+ <para>
+  At the SQL level, a copy handler method is represented by a single SQL
+  function (see <xref linkend="sql-createfunction"/>), typically implemented in
+  C, having the signature
+<synopsis>
+<replaceable>format_name</replaceable>(internal) RETURNS <literal>copy_handler</literal>
+</synopsis>
+  The function's name is then accepted as a
+  valid <replaceable>format_name</replaceable>. The return
+  pseudo-type <literal>copy_handler</literal> informs the system that this
+  function needs to be registered as a copy handler.
+  The <type>internal</type> argument is a dummy that prevents this function
+  from being called directly from an SQL command. Because the handler
+  implementation must not change during the lifetime of the server, this
+  SQL function's volatility should be marked immutable. The <literal>link_symbol</literal>
+  for this function is the name of the implementation function, described
+  next.
+ </para>
+
+ <para>
+  The implementation function signature expected for the function named
+  in the <literal>link_symbol</literal> is:
+<synopsis>
+Datum
+<replaceable>copy_format_handler</replaceable>(PG_FUNCTION_ARGS)
+</synopsis>
+  The convention for the name is to replace the word
+  <replaceable>format</replaceable> in the placeholder above with the value given
+  to <replaceable>format_name</replaceable> in the SQL function.
+  The first argument is a <type>boolean</type>. If <literal>true</literal>, the
+  handler must return a pointer to its implementation of <literal>COPY FROM</literal>
+  (a <type>CopyFromRoutine *</type>). If <literal>false</literal>, the handler
+  must return a pointer to its implementation of <literal>COPY TO</literal>
+  (a <type>CopyToRoutine *</type>). These structs are declared in
+  <filename>src/include/commands/copyapi.h</filename>.
+ </para>
+
+ <para>
+  The structs hold pointers to implementation functions for initializing,
+  starting, processing rows, and ending a copy operation. The specific
+  structures vary a bit between <literal>COPY FROM</literal> and
+  <literal>COPY TO</literal>, so the next two sections describe each
+  in detail.
+ </para>
+
+ <sect1 id="copy-handler-from">
+  <title>Copy From Handler</title>
+
+  <para>
+   The opening to this chapter describes how the executor will call the main
+   handler function with, in this case,
+   a <type>boolean</type> <literal>true</literal>, and expect to receive a
+   <type>CopyFromRoutine *</type> <type>Datum</type>. This section describes
+   the components of the <type>CopyFromRoutine</type> struct.
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyFromInFunc(CopyFromState cstate,
+               Oid atttypid,
+               FmgrInfo *finfo,
+               Oid *typioparam);
+</programlisting>
+
+   This sets input function information for the
+   given <literal>atttypid</literal> attribute. This function is called once
+   at the beginning of <literal>COPY FROM</literal>. If
+   this <literal>COPY</literal> handler doesn't use any input functions, this
+   function doesn't need to do anything.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Oid atttypid</literal></term>
+     <listitem>
+      <para>
+       This is the OID of the data type used by the relation's attribute.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>FmgrInfo *finfo</literal></term>
+     <listitem>
+      <para>
+       This can be optionally filled to provide the catalog information of
+       the input function.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Oid *typioparam</literal></term>
+     <listitem>
+      <para>
+       This can be optionally filled to define the OID of the type to
+       pass to the input function.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyFromStart(CopyFromState cstate,
+              TupleDesc tupDesc);
+</programlisting>
+
+   This starts a <literal>COPY FROM</literal>. This function is called once at
+   the beginning of <literal>COPY FROM</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>TupleDesc tupDesc</literal></term>
+     <listitem>
+      <para>
+       This is the tuple descriptor of the relation into which the data is
+       copied. This can be used for any initialization steps required by a
+       format.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+bool
+CopyFromOneRow(CopyFromState cstate,
+               ExprContext *econtext,
+               Datum *values,
+               bool *nulls);
+</programlisting>
+
+   This reads one row from the source and fills <literal>values</literal>
+   and <literal>nulls</literal>. If a tuple was read,
+   this must return <literal>true</literal>. If there are no more tuples to
+   read, this must return <literal>false</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>ExprContext *econtext</literal></term>
+     <listitem>
+      <para>
+       This is used to evaluate the default expression for each column that is
+       either not read from the file or is using
+       the <literal>DEFAULT</literal> option of <literal>COPY
+       FROM</literal>. It is <literal>NULL</literal> if no default values are
+       used.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Datum *values</literal></term>
+     <listitem>
+      <para>
+       This is an output array that stores the column values of the read tuple.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>bool *nulls</literal></term>
+     <listitem>
+      <para>
+       This is an output variable to store whether the read columns
+       are <literal>NULL</literal> or not.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyFromEnd(CopyFromState cstate);
+</programlisting>
+
+   This ends a <literal>COPY FROM</literal>. This function is called once at
+   the end of <literal>COPY FROM</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+   TODO: Add CopyFromStateGetData() and CopyFromSkipErrorRow()?
+  </para>
+ </sect1>
+
+ <sect1 id="copy-handler-to">
+  <title>Copy To Handler</title>
+
+  <para>
+   The <literal>COPY</literal> handler function for <literal>COPY
+   TO</literal> returns a <type>CopyToRoutine</type> struct containing
+   pointers to the functions described below. All functions are required.
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToOutFunc(CopyToState cstate,
+              Oid atttypid,
+              FmgrInfo *finfo);
+</programlisting>
+
+   This sets output function information for the
+   given <literal>atttypid</literal> attribute. This function is called once
+   at the beginning of <literal>COPY TO</literal>. If
+   this <literal>COPY</literal> handler doesn't use any output functions, this
+   function doesn't need to do anything.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Oid atttypid</literal></term>
+     <listitem>
+      <para>
+       This is the OID of the data type used by the relation's attribute.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>FmgrInfo *finfo</literal></term>
+     <listitem>
+      <para>
+       This can be optionally filled to provide the catalog information of
+       the output function.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToStart(CopyToState cstate,
+            TupleDesc tupDesc);
+</programlisting>
+
+   This starts a <literal>COPY TO</literal>. This function is called once at
+   the beginning of <literal>COPY TO</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>TupleDesc tupDesc</literal></term>
+     <listitem>
+      <para>
+       This is the tuple descriptor of the relation from which the data is read.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToOneRow(CopyToState cstate,
+             TupleTableSlot *slot);
+</programlisting>
+
+   This writes one row stored in <literal>slot</literal> to the destination.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>TupleTableSlot *slot</literal></term>
+     <listitem>
+      <para>
+       This is the slot that holds the row to be written.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToEnd(CopyToState cstate);
+</programlisting>
+
+   This ends a <literal>COPY TO</literal>. This function is called once at
+   the end of <literal>COPY TO</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+   TODO: Add CopyToStateFlush()?
+  </para>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 25fb99cee69..1fd6d32d5ec 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -107,6 +107,7 @@
 <!ENTITY storage    SYSTEM "storage.sgml">
 <!ENTITY transaction     SYSTEM "xact.sgml">
 <!ENTITY tablesample-method SYSTEM "tablesample-method.sgml">
+<!ENTITY copy-handler SYSTEM "copy-handler.sgml">
 <!ENTITY wal-for-extensions SYSTEM "wal-for-extensions.sgml">
 <!ENTITY generic-wal SYSTEM "generic-wal.sgml">
 <!ENTITY custom-rmgr SYSTEM "custom-rmgr.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index af476c82fcc..8ba319ae2df 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -254,6 +254,7 @@ break is not needed in a wider output rendering.
   &plhandler;
   &fdwhandler;
   &tablesample-method;
+  &copy-handler;
   &custom-scan;
   &geqo;
   &tableam;
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 500ece7d5bb..24710cb667a 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -28,10 +28,10 @@ typedef struct CopyToRoutine
      * Set output function information. This callback is called once at the
      * beginning of COPY TO.
      *
+     * 'atttypid' is the OID of data type used by the relation's attribute.
+     *
      * 'finfo' can be optionally filled to provide the catalog information of
      * the output function.
-     *
-     * 'atttypid' is the OID of data type used by the relation's attribute.
      */
     void        (*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
                                   FmgrInfo *finfo);
@@ -70,12 +70,13 @@ typedef struct CopyFromRoutine
      * Set input function information. This callback is called once at the
      * beginning of COPY FROM.
      *
+     * 'atttypid' is the OID of data type used by the relation's attribute.
+     *
      * 'finfo' can be optionally filled to provide the catalog information of
      * the input function.
      *
      * 'typioparam' can be optionally filled to define the OID of the type to
-     * pass to the input function.'atttypid' is the OID of data type used by
-     * the relation's attribute.
+     * pass to the input function.
      */
     void        (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
                                    FmgrInfo *finfo, Oid *typioparam);
-- 
2.47.2
			
		On Wed, Mar 26, 2025 at 8:28 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> > We need more regression tests for handling the given format name. For example,
> >
> > - more various input patterns.
> > - a function with the specified format name exists but it returns an
> > unexpected Node.
> > - looking for a handler function in a different namespace.
> > etc.
>
> I've added the following tests:
>
> * Wrong input type handler without namespace
> * Wrong input type handler with namespace
> * Wrong return type handler without namespace
> * Wrong return type handler with namespace
> * Wrong return value (Copy*Routine isn't returned) handler without namespace
> * Wrong return value (Copy*Routine isn't returned) handler with namespace
> * Nonexistent handler
> * Invalid qualified name
> * Valid handler without namespace and without search_path
> * Valid handler without namespace and with search_path
> * Valid handler with namespace
Probably we can merge these newly added four files into one .sql file?
Also we need to add some comments to describe what these queries test.
For example, it's not clear to me at a glance what queries in
no-schema.sql are going to test as there is no comment there.
>
> 0002 also includes this.
>
> > I think that we should accept qualified names too as the format name
> > like tablesample does. That way, different extensions implementing the
> > same format can be used.
>
> Implemented. It's implemented after parsing SQL. Is it OK?
> (It seems that tablesample does it in parsing SQL.)
I think it's okay.
One problem in the following chunk I can see is:
+           qualified_format = stringToQualifiedNameList(format, NULL);
+           DeconstructQualifiedName(qualified_format, &schema, &fmt);
+           if (!schema || strcmp(schema, "pg_catalog") == 0)
+           {
+               if (strcmp(fmt, "csv") == 0)
+                   opts_out->csv_mode = true;
+               else if (strcmp(fmt, "binary") == 0)
+                   opts_out->binary = true;
+           }
Non-qualified names depend on the search_path value so it's not
necessarily a built-in format. If the user specifies 'csv' with
search_path = 'myschema, pg_catalog', the COPY command unnecessarily
sets csv_mode true. I think we can instead check if the retrieved
handler function's OID matches the built-in formats' ones. Also, it's
weird to me that cstate has csv_mode and binary fields even though
the format should have already been known by the callback functions.
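A minimal sketch of the OID-based check suggested above, written as
standalone C so it compiles outside the backend. The Oid typedef, the
three OID constants and the FormatFlags struct are hypothetical
stand-ins; the real patch would compare against whatever OIDs the
built-in handler functions are assigned.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;

/* Hypothetical OIDs standing in for the built-in handler functions. */
#define HYPOTHETICAL_TEXT_HANDLER_OID   9001u
#define HYPOTHETICAL_CSV_HANDLER_OID    9002u
#define HYPOTHETICAL_BINARY_HANDLER_OID 9003u

/* Stand-in for the csv_mode and binary fields of the COPY options. */
typedef struct FormatFlags
{
    bool        csv_mode;
    bool        binary;
} FormatFlags;

/*
 * Derive the mode flags from the handler that name resolution actually
 * returned, not from the spelling of the format name.
 */
static FormatFlags
flags_from_handler(Oid handler_oid)
{
    FormatFlags flags = {false, false};

    /* 0 plays the role of InvalidOid: resolution must already be done. */
    assert(handler_oid != 0);

    if (handler_oid == HYPOTHETICAL_CSV_HANDLER_OID)
        flags.csv_mode = true;
    else if (handler_oid == HYPOTHETICAL_BINARY_HANDLER_OID)
        flags.binary = true;

    return flags;
}
```

With this shape, a custom handler that merely happens to be named "csv"
gets neither flag, because its OID differs from the built-in one.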
Regarding the documentation for the existing options, it says "...
only when not using XXX format." in some places, where XXX can be
replaced with binary or CSV. Once we support custom formats, 'non-CSV
mode' would actually include custom formats as well, so we need to
update the description too.
>
> > ---
> > +static void
> > +CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
> > +{
> > +        ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u",
> > slot->tts_nvalid)));
> > +}
> >
> > Similar to the above comment, the field name 'tts_nvalid' might also
> > be changed in the future, let's use another name.
>
> Hmm. If the field name is changed, we need to change this
> code.
Yes, but if we use an independent name in the NOTICE message we would not
need to update the expected files.
>
> > ---
> > +/*
> > + * Export CopySendEndOfRow() for extensions. We want to keep
> > + * CopySendEndOfRow() as a static function for
> > + * optimization. CopySendEndOfRow() calls in this file may be optimized by a
> > + * compiler.
> > + */
> > +void
> > +CopyToStateFlush(CopyToState cstate)
> > +{
> > +        CopySendEndOfRow(cstate);
> > +}
> >
> > Is there any reason to use a different name for public functions?
>
> In this patch set, I use "CopyFrom"/"CopyTo" prefixes for
> public APIs for custom COPY FORMAT handler extensions. It
> will help understanding related APIs. Is it strange in
> PostgreSQL?
I see your point. Probably we need to find a better name, as the name
CopyToStateFlush doesn't suggest that this API should be called
only once at the end of a row (IOW a user might try to call it multiple
times to 'flush' the state while processing a row). How about
CopyToEndOfRow()?
>
> > ---
> > +        /* For custom format implementation */
> > +        void      *opaque;                     /* private space */
> >
> > How about renaming 'private'?
>
> We should not use "private" because it's a keyword in
> C++. If we use "private" here, we can't include this file
> from C++ code.
Understood.
>
> > ---
> > I've not reviewed the documentation patch yet but I think the patch
> > seems to miss the updates to the description of the FORMAT option in
> > the COPY command section.
>
> I defer this for now. We can revisit the last documentation
> patch after we finalize our API. (Or could someone help us?)
>
> > I think we can reorganize the patch set as follows:
> >
> > 1. Create copyto_internal.h and change COPY_XXX to COPY_SOURCE_XXX and
> > COPY_DEST_XXX accordingly.
> > 2. Support custom format for both COPY TO and COPY FROM.
> > 3. Expose necessary helper functions such as CopySendEndOfRow().
> > 4. Add CopyFromSkipErrorRow().
> > 5. Documentation.
>
> The attached v39 patch set uses the followings:
>
> 0001: Create copyto_internal.h and change COPY_XXX to
>       COPY_SOURCE_XXX and COPY_DEST_XXX accordingly.
>       (Same as 1. in your suggestion)
> 0002: Support custom format for both COPY TO and COPY FROM.
>       (Same as 2. in your suggestion)
> 0003: Expose necessary helper functions such as CopySendEndOfRow()
>       and add CopyFromSkipErrorRow().
>       (3. + 4. in your suggestion)
> 0004: Define handler functions for built-in formats.
>       (Not included in your suggestion)
> 0005: Documentation. (WIP)
>       (Same as 5. in your suggestion)
Can we merge 0002 and 0004?
> We can merge 0001 quickly, right?
I don't think it makes sense to push only 0001 as it's a completely
preliminary patch for subsequent patches. It would be prudent to push
it once other patches are ready too.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
Re: Make COPY format extendable: Extract COPY TO format implementations
From: "David G. Johnston"
Date:
On Friday, March 28, 2025, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> One problem in the following chunk I can see is:
>
> +           qualified_format = stringToQualifiedNameList(format, NULL);
> +           DeconstructQualifiedName(qualified_format, &schema, &fmt);
> +           if (!schema || strcmp(schema, "pg_catalog") == 0)
> +           {
> +               if (strcmp(fmt, "csv") == 0)
> +                   opts_out->csv_mode = true;
> +               else if (strcmp(fmt, "binary") == 0)
> +                   opts_out->binary = true;
> +           }
>
> Non-qualified names depend on the search_path value so it's not
> necessarily a built-in format. If the user specifies 'csv' with
> search_path = 'myschema, pg_catalog', the COPY command unnecessarily
> sets csv_mode true. I think we can instead check if the retrieved
> handler function's OID matches the built-in formats' ones. Also, it's
> weird to me that cstate has csv_mode and binary fields even though
> the format should have already been known by the callback functions.
I considered it a feature that a schema-less reference to text, csv, or binary always resolves to the core built-in handlers.  As does an unspecified format default of text.
To use an extension that chooses to override that format name would require schema qualification.
David J.
Hi,
In <CAD21AoBKMNsO+b6wahb6847xwFci1JCfV+JykoMziVgiFxB6cw@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 28 Mar 2025 22:37:03 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> I've added the following tests:
>>
>> * Wrong input type handler without namespace
>> * Wrong input type handler with namespace
>> * Wrong return type handler without namespace
>> * Wrong return type handler with namespace
>> * Wrong return value (Copy*Routine isn't returned) handler without namespace
>> * Wrong return value (Copy*Routine isn't returned) handler with namespace
>> * Nonexistent handler
>> * Invalid qualified name
>> * Valid handler without namespace and without search_path
>> * Valid handler without namespace and with search_path
>> * Valid handler with namespace
> 
> Probably we can merge these newly added four files into one .sql file?
I know that src/test/regress/sql/ uses this style (one .sql
file includes many test patterns in one large category). I
understand that the style is preferable in
src/test/regress/sql/ because src/test/regress/sql/ has
tests for many categories.
But do we need to follow the style in
src/test/modules/*/sql/ too? If we use the style in
src/test/modules/*/sql/, we need to have only one .sql in
src/test/modules/*/sql/ because src/test/modules/*/ are for
each category.
And the current .sql per sub-category style is easy to debug
(at least for me). For example, if we try qualified name
cases in a debugger, we can use "\i sql/schema.sql" instead of
extracting target statements from a .sql file that includes many
unrelated statements. (Or we can use "\i sql/all.sql" and
many GDB "continue"s.)
BTW, it seems that src/test/modules/test_ddl_deparse/sql/
uses .sql per sub-category style. Should we use one .sql
file for src/test/modules/test_copy_format/sql/? If it's
required for merging this patch set, I'll do it.
> Also we need to add some comments to describe what these queries test.
> For example, it's not clear to me at a glance what queries in
> no-schema.sql are going to test as there is no comment there.
Hmm. You mean no_schema.sql in 0002, right?
----
CREATE EXTENSION test_copy_format;
CREATE TABLE public.test (a smallint, b integer, c bigint);
INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
\.
COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
DROP TABLE public.test;
DROP EXTENSION test_copy_format;
----
In general, COPY FORMAT tests focus on "COPY FROM WITH
(FORMAT ...)" and "COPY TO WITH (FORMAT ...)". And the file
name "no_schema" shows that it doesn't use a qualified
name. Based on this, I feel that the above content is very
straightforward even without any comments.
What should we add as comments? For example, do we need the
following comments?
----
-- This extension includes custom COPY handler: test_copy_format
CREATE EXTENSION test_copy_format;
-- Test data
CREATE TABLE public.test (a smallint, b integer, c bigint);
INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
-- Use custom COPY handler, test_copy_format, without
-- schema for FROM.
COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
\.
-- Use custom COPY handler, test_copy_format, without
-- schema for TO.
COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
-- Cleanup
DROP TABLE public.test;
DROP EXTENSION test_copy_format;
----
> One problem in the following chunk I can see is:
> 
> +           qualified_format = stringToQualifiedNameList(format, NULL);
> +           DeconstructQualifiedName(qualified_format, &schema, &fmt);
> +           if (!schema || strcmp(schema, "pg_catalog") == 0)
> +           {
> +               if (strcmp(fmt, "csv") == 0)
> +                   opts_out->csv_mode = true;
> +               else if (strcmp(fmt, "binary") == 0)
> +                   opts_out->binary = true;
> +           }
> 
> Non-qualified names depend on the search_path value so it's not
> necessarily a built-in format. If the user specifies 'csv' with
> search_path = 'myschema, pg_catalog', the COPY command unnecessarily
> sets csv_mode true.
I think that we should always use built-in COPY handlers for
(not-qualified) "text", "csv" and "binary" for
compatibility. If we allow custom COPY handlers for
(not-qualified) "text", "csv" and "binary", pg_dump or
existing dumps may be broken, because we must use the same
COPY handler when we dump (COPY TO) and when we restore (COPY
FROM).
BTW, the current implementation always uses
pg_catalog.{text,csv,binary} for (not-qualified) "text",
"csv" and "binary" even when there are
myschema.{text,csv,binary}. See
src/test/modules/test_copy_format/sql/builtin.sql. But I
haven't looked into why...
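One possible explanation, not confirmed in this thread, is ordinary
search_path behaviour: PostgreSQL searches pg_catalog implicitly
before the listed schemas unless search_path names pg_catalog
explicitly. The standalone sketch below models only that rule with a
toy in-memory catalog. Handler, find_in_schema() and
resolve_unqualified() are hypothetical names and are not part of the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A toy catalog entry: a handler function living in a schema. */
typedef struct Handler
{
    const char *schema;
    const char *name;
} Handler;

static bool
path_contains(const char *const *path, size_t npath, const char *schema)
{
    for (size_t i = 0; i < npath; i++)
        if (strcmp(path[i], schema) == 0)
            return true;
    return false;
}

static const Handler *
find_in_schema(const Handler *catalog, size_t ncatalog,
               const char *schema, const char *name)
{
    for (size_t i = 0; i < ncatalog; i++)
        if (strcmp(catalog[i].schema, schema) == 0 &&
            strcmp(catalog[i].name, name) == 0)
            return &catalog[i];
    return NULL;
}

/*
 * Resolve an unqualified name. pg_catalog is searched first unless the
 * search_path lists it explicitly, in which case the listed order wins.
 */
static const Handler *
resolve_unqualified(const Handler *catalog, size_t ncatalog,
                    const char *const *search_path, size_t npath,
                    const char *name)
{
    const Handler *h;

    assert(name != NULL);

    if (!path_contains(search_path, npath, "pg_catalog"))
    {
        h = find_in_schema(catalog, ncatalog, "pg_catalog", name);
        if (h != NULL)
            return h;
    }
    for (size_t i = 0; i < npath; i++)
    {
        h = find_in_schema(catalog, ncatalog, search_path[i], name);
        if (h != NULL)
            return h;
    }
    return NULL;
}
```

Under this rule, search_path = 'myschema' still finds pg_catalog.csv
first, while search_path = 'myschema, pg_catalog' finds myschema.csv,
which is the case raised in the review.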
>                     I think we can instead check if the retrieved
> handler function's OID matches the built-in formats' ones.
I agree that the approach is clearer than the current
implementation. I'll use it when I create the next patch
set.
>                                                            Also, it's
> weird to me that cstate has csv_mode and binary fields even though
> the format should have already been known by the callback functions.
You are referring to CopyFormatOptions::{csv_mode,binary}, not
Copy{To,From}StateData, right? And you suggest that we
should replace all opts.csv_mode and opts.binary with
opts.handler == F_CSV and opts.handler == F_BINARY, right?
We can do it, but I suggest that we do it as a refactoring
(or cleanup) in a separate patch so that it is easy to review.
> Regarding the documentation for the existing options, it says "...
> only when not using XXX format." some places, where XXX can be
> replaced with binary or CSV. Once we support custom formats, 'non-CSV
> mode' would actually include custom formats as well, so we need to
> update the description too.
I agree with you.
>> > ---
>> > +/*
>> > + * Export CopySendEndOfRow() for extensions. We want to keep
>> > + * CopySendEndOfRow() as a static function for
>> > + * optimization. CopySendEndOfRow() calls in this file may be optimized by a
>> > + * compiler.
>> > + */
>> > +void
>> > +CopyToStateFlush(CopyToState cstate)
>> > +{
>> > +        CopySendEndOfRow(cstate);
>> > +}
>> >
>> > Is there any reason to use a different name for public functions?
>>
>> In this patch set, I use "CopyFrom"/"CopyTo" prefixes for
>> public APIs for custom COPY FORMAT handler extensions. It
>> will help understanding related APIs. Is it strange in
>> PostgreSQL?
> 
> I see your point. Probably we need to find a better name as the name
> CopyToStateFlush doesn't sound well like this API should be called
> only once at the end of a row (IOW user might try to call it multiple
> times to 'flush' the state while processing a row). How about
> CopyToEndOfRow()?
CopyToStateFlush() can be called multiple times within a single
row. It can also be called only once for multiple rows, because
it just flushes the current buffer.
The existing CopySendEndOfRow() is called at the end of a
row. (The buffer is flushed at the end of each row.) I think
that is why the name "EndOfRow" was chosen.
Some custom COPY handlers may not be row based. For example,
an Apache Arrow COPY handler doesn't flush the buffer for each row.
So, we should provide a "flush" API, not an "end of row" API, for
extensibility.
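To illustrate the distinction, here is a minimal standalone sketch of
a buffer whose flush points are independent of row boundaries.
OutState, out_append() and out_flush() are hypothetical names, and
the flush only counts bytes instead of writing to a real destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OUT_BUF_SIZE 64

typedef struct OutState
{
    char        buf[OUT_BUF_SIZE];
    size_t      used;           /* bytes currently buffered */
    size_t      flush_count;    /* number of non-empty flushes */
    size_t      bytes_flushed;  /* total bytes handed to the destination */
} OutState;

/* Hand the buffered bytes to the destination (only counted here). */
static void
out_flush(OutState *st)
{
    assert(st != NULL);
    if (st->used == 0)
        return;
    st->bytes_flushed += st->used;
    st->flush_count++;
    st->used = 0;
}

/* Append data, flushing whenever the buffer fills, wherever rows end. */
static void
out_append(OutState *st, const char *data, size_t len)
{
    assert(st != NULL);
    while (len > 0)
    {
        size_t      room = OUT_BUF_SIZE - st->used;
        size_t      n = len < room ? len : room;

        memcpy(st->buf + st->used, data, n);
        st->used += n;
        data += n;
        len -= n;
        if (st->used == OUT_BUF_SIZE)
            out_flush(st);
    }
}
```

Ten 4-byte rows fit in one buffer and need a single flush, while one
100-byte row is flushed in the middle of the row. A columnar format
would typically want the first behaviour.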
>> The attached v39 patch set uses the followings:
>>
>> 0001: Create copyto_internal.h and change COPY_XXX to
>>       COPY_SOURCE_XXX and COPY_DEST_XXX accordingly.
>>       (Same as 1. in your suggestion)
>> 0002: Support custom format for both COPY TO and COPY FROM.
>>       (Same as 2. in your suggestion)
>> 0003: Expose necessary helper functions such as CopySendEndOfRow()
>>       and add CopyFromSkipErrorRow().
>>       (3. + 4. in your suggestion)
>> 0004: Define handler functions for built-in formats.
>>       (Not included in your suggestion)
>> 0005: Documentation. (WIP)
>>       (Same as 5. in your suggestion)
> 
> Can we merge 0002 and 0004?
Can we do it when we merge this patch set if it's still
desirable at the time? Because:
* I think that separate 0002 and 0004 patches are easier to
  review than a squashed 0002 and 0004 patch.
* I still think that someone may not like defining COPY
  handlers for built-in formats. If we don't define COPY
  handlers for built-in formats in the end, we can just drop
  0004.
>> We can merge 0001 quickly, right?
> 
> I don't think it makes sense to push only 0001 as it's a completely
> preliminary patch for subsequent patches. It would be prudent to push
> it once other patches are ready too.
Hmm. I feel that 0001 is a refactoring patch, like the
patches that were already merged. In general, distinct enum
value names are easier to understand.
BTW, does the "other patches" include the documentation
patch...?
Thanks,
-- 
kou
			
Re: Make COPY format extendable: Extract COPY TO format implementations
From: "David G. Johnston"
Date:
On Sat, Mar 29, 2025 at 1:57 AM Sutou Kouhei <kou@clear-code.com> wrote:
> * I still think that someone may not like defining COPY
>   handlers for built-in formats. If we don't define COPY
>   handlers for built-in formats in the end, we can just drop
>   0004.
We should (and usually do) dog-food APIs when reasonable and this situation seems quite reasonable.  I'd push back quite a bit about publishing this without any internal code leveraging it.
> >> We can merge 0001 quickly, right?
> >
> > I don't think it makes sense to push only 0001 as it's a completely
> > preliminary patch for subsequent patches. It would be prudent to push
> > it once other patches are ready too.
>
> Hmm. I feel that 0001 is a refactoring category patch like
> merged patches. In general, distinct enum value names are
> easier to understand.
I'm for pushing 0001.  We've had copyfrom_internal.h for a while now and this seems like a simple refactor to make that area of the code cleaner via symmetry.
David J.
Re: Make COPY format extendable: Extract COPY TO format implementations
From: "David G. Johnston"
Date:
On Wed, Mar 26, 2025 at 8:28 PM Sutou Kouhei <kou@clear-code.com> wrote:
> The attached v39 patch set uses the followings:
>
> 0001: Create copyto_internal.h and change COPY_XXX to
>       COPY_SOURCE_XXX and COPY_DEST_XXX accordingly.
>       (Same as 1. in your suggestion)
> 0002: Support custom format for both COPY TO and COPY FROM.
>       (Same as 2. in your suggestion)
> 0003: Expose necessary helper functions such as CopySendEndOfRow()
>       and add CopyFromSkipErrorRow().
>       (3. + 4. in your suggestion)
> 0004: Define handler functions for built-in formats.
>       (Not included in your suggestion)
> 0005: Documentation. (WIP)
>       (Same as 5. in your suggestion)
I don't think this module should be responsible for testing the validity of "qualified names in a string literal" behavior.  Having some of the tests use a schema qualification, and I'd suggest explicit double-quoting/case-folding, wouldn't hurt just to demonstrate it's possible, and how extensions should be referenced, but definitely don't need tests to prove the broken cases are indeed broken.  This relies on an existing API that has its own tests.  It is definitely pointlessly redundant to have 6 tests that only differ from 6 other tests in their use of a schema qualification.
I prefer keeping 0002 and 0004 separate.  In particular, keeping the design choice of "unqualified internal format names ignore search_path" should stand out as its own commit.
David J.
Hi,
In <CAKFQuwYF7VnYcS9dkfvdzt-dgftMB1DV0bjRcNC8-4iYGS+gjw@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Sat, 29 Mar 2025 09:48:22 -0700,
  "David G. Johnston" <david.g.johnston@gmail.com> wrote:
> I don't think this module should be responsible for testing the validity of
> "qualified names in a string literal" behavior.  Having some of the tests
> use a schema qualification, and I'd suggest explicit
> double-quoting/case-folding, wouldn't hurt just to demonstrate it's
> possible, and how extensions should be referenced, but definitely don't
> need tests to prove the broken cases are indeed broken.  This relies on an
> existing API that has its own tests.  It is definitely pointlessly
> redundant to have 6 tests that only differ from 6 other tests in their use
> of a schema qualification.
You suggest the following, right?
1. Add tests for "Schema.Name" with mixed cases
2. Remove the following 6 tests in
   src/test/modules/test_copy_format/sql/invalid.sql
   ----
   COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type');
   COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type');
   COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type');
   COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type');
   COPY public.test FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value');
   COPY public.test TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value');
   ----
   because we have the following 6 tests:
   ----
   SET search_path = public,test_schema;
   COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_wrong_input_type');
   COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wrong_input_type');
   COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_wrong_return_type');
   COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wrong_return_type');
   COPY public.test FROM stdin WITH (FORMAT 'test_copy_format_wrong_return_value');
   COPY public.test TO stdout WITH (FORMAT 'test_copy_format_wrong_return_value');
   RESET search_path;
   ----
3. Remove the following tests because the behavior must be
   tested in other places:
   ----
   COPY public.test FROM stdin WITH (FORMAT 'nonexistent');
   COPY public.test TO stdout WITH (FORMAT 'nonexistent');
   COPY public.test FROM stdin WITH (FORMAT 'invalid.qualified.name');
   COPY public.test TO stdout WITH (FORMAT 'invalid.qualified.name');
   ----
Am I missing something?
1.: There is no difference between single-quoting and
    double-quoting here, because the information about which
    quote was used for the given FORMAT value is not retained
    at this point. Should we update gram.y?
2.: I don't have a strong opinion on it. If nobody objects
    to it, I'll remove them.
3.: I don't have a strong opinion on it. If nobody objects
    to it, I'll remove them.
Thanks,
-- 
kou
			
Re: Make COPY format extendable: Extract COPY TO format implementations
From: "David G. Johnston"
Date:
On Wed, Mar 26, 2025 at 8:28 PM Sutou Kouhei <kou@clear-code.com> wrote:
> ---
> +static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
> + .type = T_CopyFromRoutine,
> + .CopyFromInFunc = CopyFromInFunc,
> + .CopyFromStart = CopyFromStart,
> + .CopyFromOneRow = CopyFromOneRow,
> + .CopyFromEnd = CopyFromEnd,
> +};
>
In trying to document the current API I'm strongly disliking it.  Namely, the amount of internal code an extension needs to care about/copy-paste to create a working handler.
Presently, pg_type defines text and binary I/O routines and the text/csv formats use the text I/O while binary uses binary I/O - for all attributes.  The CopyFromInFunc API allows for each attribute to somehow have its I/O format individualized.  But I don't see how that is practical or useful, and it adds burden on API users.
I suggest we remove both .CopyFromInFunc and .CopyFromStart/End and add a property to CopyFromRoutine (.ioMode?) with values of either Copy_IO_Text or Copy_IO_Binary and then just branch to either:
CopyFromTextLikeInFunc & CopyFromTextLikeStart/End
or
CopyFromBinaryInFunc & CopyFromStart/End
So, in effect, the only method an extension needs to write is converting to/from the 'serialized' form to the text/binary form (text being near unanimous).
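A rough standalone sketch of the branch described above. The
Copy_IO_Text and Copy_IO_Binary names come from the proposal; the
SketchCopyFromRoutine struct, the SketchTypeIo enum and the function
are hypothetical. The only fact relied on is that pg_type stores the
text input function in typinput and the binary one in typreceive.

```c
#include <assert.h>
#include <stddef.h>

/* Enum value names follow the proposal; everything else is made up. */
typedef enum CopyIoMode
{
    Copy_IO_Text,
    Copy_IO_Binary
} CopyIoMode;

/* A handler would only declare which I/O family it wants. */
typedef struct SketchCopyFromRoutine
{
    CopyIoMode  ioMode;
} SketchCopyFromRoutine;

/* Which pg_type column the core code would consult per attribute. */
typedef enum SketchTypeIo
{
    SKETCH_TYPINPUT,            /* text input function */
    SKETCH_TYPRECEIVE           /* binary receive function */
} SketchTypeIo;

/* Core-side dispatch: the extension never looks up I/O functions. */
static SketchTypeIo
sketch_select_input(const SketchCopyFromRoutine *routine)
{
    assert(routine != NULL);
    return routine->ioMode == Copy_IO_Binary ? SKETCH_TYPRECEIVE
                                             : SKETCH_TYPINPUT;
}
```

The point of the shape is that the per-attribute function lookup
stays in core code and the handler contributes a single enum value.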
In a similar manner, the amount of boilerplate within CopyFromOneRow seems undesirable from an API perspective.
cstate->cur_attname = NameStr(att->attname);
cstate->cur_attval = string;
if (string != NULL)
    nulls[m] = false;
if (cstate->defaults[m])
{
    /* We must have switched into the per-tuple memory context */
    Assert(econtext != NULL);
    Assert(CurrentMemoryContext == econtext->ecxt_per_tuple_memory);
    values[m] = ExecEvalExpr(defexprs[m], econtext, &nulls[m]);
}
/*
    * If ON_ERROR is specified with IGNORE, skip rows with soft errors
    */
else if (!InputFunctionCallSafe(&in_functions[m],
                                string,
                                typioparams[m],
                                att->atttypmod,
                                (Node *) cstate->escontext,
                                &values[m]))
{
    CopyFromSkipErrorRow(cstate);
    return true;
}
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
It seems to me that CopyFromOneRow could simply produce a *string collection, 
one cell per attribute, and NextCopyFrom could do all of the above on a for-loop over *string
The pg_type and pg_proc catalogs are not extensible so the API can and should be limited to
producing the, usually text, values that are ready to be passed into the text I/O routines and all
the work to find and use types and functions left in the template code.
I haven't looked at COPY TO but I am expecting much the same. The API should simply receive
the content of the type I/O output routine (binary or text as it dictates) for each output attribute, by
row, and be expected to take those values and produce a final output from them.
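The split proposed above can be sketched in standalone C as follows.
This is a toy model, not the patch's API: the format-specific step
only cuts a line into one string per attribute, and a
format-independent loop does the conversion. toy_format_one_row(),
toy_int_input() and core_next_row() are hypothetical names, the '|'
separator is invented, and integer parsing stands in for a type input
function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NATTS 3

/*
 * Format-specific step: split one '|'-separated line in place into
 * NATTS fields. An empty field is reported as NULL.
 */
static bool
toy_format_one_row(char *line, char *fields[NATTS])
{
    char       *p = line;

    assert(line != NULL);
    for (int i = 0; i < NATTS; i++)
    {
        char       *sep = strchr(p, '|');

        if (i < NATTS - 1)
        {
            if (sep == NULL)
                return false;   /* too few fields */
            *sep = '\0';
        }
        else if (sep != NULL)
            return false;       /* too many fields */

        fields[i] = (*p == '\0') ? NULL : p;
        if (sep != NULL)
            p = sep + 1;
    }
    return true;
}

/* Stand-in for a type input function: parse a decimal integer. */
static bool
toy_int_input(const char *s, long *out)
{
    char       *end;
    long        v = strtol(s, &end, 10);

    if (end == s || *end != '\0')
        return false;
    *out = v;
    return true;
}

/* Format-independent step: the core loop converts every field. */
static bool
core_next_row(char *line, long values[NATTS], bool nulls[NATTS])
{
    char       *fields[NATTS];

    if (!toy_format_one_row(line, fields))
        return false;
    for (int i = 0; i < NATTS; i++)
    {
        nulls[i] = (fields[i] == NULL);
        values[i] = 0;
        if (!nulls[i] && !toy_int_input(fields[i], &values[i]))
            return false;
    }
    return true;
}
```

In this model a new format only has to supply its own version of
toy_format_one_row(); NULL handling and value conversion live in the
shared loop.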
David J.
On Sat, Mar 29, 2025 at 9:49 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
>
> On Wed, Mar 26, 2025 at 8:28 PM Sutou Kouhei <kou@clear-code.com> wrote:
>>
>>
>> The attached v39 patch set uses the followings:
>>
>> 0001: Create copyto_internal.h and change COPY_XXX to
>> COPY_SOURCE_XXX and COPY_DEST_XXX accordingly.
>> (Same as 1. in your suggestion)
>> 0002: Support custom format for both COPY TO and COPY FROM.
>> (Same as 2. in your suggestion)
>> 0003: Expose necessary helper functions such as CopySendEndOfRow()
>> and add CopyFromSkipErrorRow().
>> (3. + 4. in your suggestion)
>> 0004: Define handler functions for built-in formats.
>> (Not included in your suggestion)
>> 0005: Documentation. (WIP)
>> (Same as 5. in your suggestion)
>>
>
> I prefer keeping 0002 and 0004 separate. In particular, keeping the
> design choice of "unqualified internal format names ignore
> search_path" should stand out as its own commit.
What is the point of having separate commits for already-agreed design
choices? I guess that it would make it easier to revert that decision.
But I think it makes more sense that if we agree with "unqualified
internal format names ignore search_path" the original commit includes
that decision and describes it in the commit message. If we want to
change that design based on the discussion later on, we can have a
separate commit that makes that change and has the link to the
discussion.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Sat, Mar 29, 2025 at 1:57 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoBKMNsO+b6wahb6847xwFci1JCfV+JykoMziVgiFxB6cw@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 28 Mar 2025 22:37:03 -0700,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >> I've added the following tests:
> >>
> >> * Wrong input type handler without namespace
> >> * Wrong input type handler with namespace
> >> * Wrong return type handler without namespace
> >> * Wrong return type handler with namespace
> >> * Wrong return value (Copy*Routine isn't returned) handler without namespace
> >> * Wrong return value (Copy*Routine isn't returned) handler with namespace
> >> * Nonexistent handler
> >> * Invalid qualified name
> >> * Valid handler without namespace and without search_path
> >> * Valid handler without namespace and with search_path
> >> * Valid handler with namespace
> >
> > Probably we can merge these newly added four files into one .sql file?
>
> I know that src/test/regress/sql/ uses this style (one .sql
> file includes many test patterns in one large category). I
> understand that the style is preferable in
> src/test/regress/sql/ because src/test/regress/sql/ has
> tests for many categories.
>
> But do we need to follow the style in
> src/test/modules/*/sql/ too? If we use the style in
> src/test/modules/*/sql/, we need to have only one .sql in
> src/test/modules/*/sql/ because src/test/modules/*/ are for
> each category.
>
> And the current .sql per sub-category style is easy to debug
> (at least for me). For example, if we try qualified name
> cases on debugger, we can use "\i sql/schema.sql" instead of
> extracting target statements from .sql that includes many
> unrelated statements. (Or we can use "\i sql/all.sql" and
> many GDB "continue"s.)
>
> BTW, it seems that src/test/modules/test_ddl_deparse/sql/
> uses .sql per sub-category style. Should we use one .sql
> file for src/test/modules/test_copy_format/sql/? If it's
> required for merging this patch set, I'll do it.
I'm not sure that the regression test queries are categorized in the
same way as in test_ddl_deparse. test_copy_format has separate .sql
files for different types of inputs (valid inputs, invalid inputs,
etc.), which seems finer-grained, while test_ddl_deparse has .sql
files for each DDL command.
Most of the queries under test_copy_format/sql verify the input
patterns of the FORMAT option. I think the regression tests
included in that directory should probably focus on testing the new
callback APIs, and some regression tests for FORMAT option handling can
be moved into the normal regression test suite (e.g., in copy.sql or a
new file like copy_format.sql). IIUC testing for invalid input
patterns can be done even without creating artificial wrong handler
functions.
>
> > Also we need to add some comments to describe what these queries test.
> > For example, it's not clear to me at a glance what queries in
> > no-schema.sql are going to test as there is no comment there.
>
> Hmm. You refer no_schema.sql in 0002, right?
>
> ----
> CREATE EXTENSION test_copy_format;
> CREATE TABLE public.test (a smallint, b integer, c bigint);
> INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
> COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
> \.
> COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
> DROP TABLE public.test;
> DROP EXTENSION test_copy_format;
> ----
>
> In general, COPY FORMAT tests focus on "COPY FROM WITH
> (FORMAT ...)" and "COPY TO WITH (FORMAT ...)". And the file
> name "no_schema" shows that it doesn't use qualified
> name. Based on this, I feel that the above content is very
> straightforward without any comment.
>
> What should we add as comments? For example, do we need the
> following comments?
>
> ----
> -- This extension includes custom COPY handler: test_copy_format
> CREATE EXTENSION test_copy_format;
> -- Test data
> CREATE TABLE public.test (a smallint, b integer, c bigint);
> INSERT INTO public.test VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
> -- Use custom COPY handler, test_copy_format, without
> -- schema for FROM.
> COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
> \.
> -- Use custom COPY handler, test_copy_format, without
> -- schema for TO.
> COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
> -- Cleanup
> DROP TABLE public.test;
> DROP EXTENSION test_copy_format;
> ----
I'd like to see in the comment what the tests expect. Taking the
following queries as an example,
COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
\.
COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
it would help readers understand the test case better if we have a
comment like for example:
-- Specify the custom format name without schema. We test if both
-- COPY TO and COPY FROM can find the correct handler function
-- in public schema.
>
> > One problem in the following chunk I can see is:
> >
> > +           qualified_format = stringToQualifiedNameList(format, NULL);
> > +           DeconstructQualifiedName(qualified_format, &schema, &fmt);
> > +           if (!schema || strcmp(schema, "pg_catalog") == 0)
> > +           {
> > +               if (strcmp(fmt, "csv") == 0)
> > +                   opts_out->csv_mode = true;
> > +               else if (strcmp(fmt, "binary") == 0)
> > +                   opts_out->binary = true;
> > +           }
> >
> > Non-qualified names depend on the search_path value so it's not
> > necessarily a built-in format. If the user specifies 'csv' with
> > search_path = 'myschema, pg_catalog', the COPY command unnecessarily
> > sets csv_mode true.
>
> I think that we should always use built-in COPY handlers for
> (not-qualified) "text", "csv" and "binary" for
> compatibility. If we allow custom COPY handlers for
> (not-qualified) "text", "csv" and "binary", pg_dump or
> existing dump may be broken. Because we must use the same
> COPY handler when we dump (COPY TO) and we restore (COPY
> FROM).
I agree.
>
> BTW, the current implementation always uses
> pg_catalog.{text,csv,binary} for (not-qualified) "text",
> "csv" and "binary" even when there are
> myschema.{text,csv,binary}. See
> src/test/modules/test_copy_format/sql/builtin.sql. But I
> haven't looked into it why...
Sorry, I don't follow that. IIUC test_copy_format extension doesn't
create a handler function in myschema schema, and SQLs in builtin.sql
seem to work as expected (specifying a non-qualified built-in format
unconditionally uses the built-in format).
>
> >                                                            Also, it's
> > weird to me that cstate has csv_mode and binary fields even though
> > the format should have already been known by the callback functions.
>
> You refer to CopyFormatOptions::{csv_mode,binary} not
> Copy{To,From}StateData, right?
Yes. I referred to the wrong one.
> And you suggest that we
> should replace all opts.csv_mode and opts.binary with
> opts.handler == F_CSV and opts.handler == F_BINARY, right?
>
> We can do it but I suggest that we do it as a refactoring
> (or cleanup) in a separated patch for easy to review.
I think that csv_mode and binary are used mostly in
ProcessCopyOptions() so probably we can use local variables for that.
I find there are two other places that use csv_mode:
NextCopyFromRawFields() and CopyToTextLikeStart(). I think we can
simply check the handler function's OID there, or we can define macros
like COPY_FORMAT_IS_TEXT/CSV/BINARY that check the OID and use them
there.
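To illustrate the macro idea above, here is a minimal standalone sketch. It is not code from the patch set: the Oid typedef, the SKETCH_*_HANDLER_OID constants, the CopyFormatOptionsSketch struct and copy_format_is_text_like() are all hypothetical stand-ins. In the real tree the handler OIDs would presumably come from generated catalog headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for PostgreSQL's Oid type (assumption for this sketch). */
typedef unsigned int Oid;

/* Hypothetical OIDs of the built-in handler functions. */
#define SKETCH_TEXT_HANDLER_OID   ((Oid) 9001)
#define SKETCH_CSV_HANDLER_OID    ((Oid) 9002)
#define SKETCH_BINARY_HANDLER_OID ((Oid) 9003)

/* Hypothetical options struct that keeps only the resolved handler OID. */
typedef struct CopyFormatOptionsSketch
{
	Oid			handler;
} CopyFormatOptionsSketch;

/* Replace the csv_mode/binary flags with checks on the handler OID. */
#define COPY_FORMAT_IS_TEXT(opts)   ((opts)->handler == SKETCH_TEXT_HANDLER_OID)
#define COPY_FORMAT_IS_CSV(opts)    ((opts)->handler == SKETCH_CSV_HANDLER_OID)
#define COPY_FORMAT_IS_BINARY(opts) ((opts)->handler == SKETCH_BINARY_HANDLER_OID)

/* Example call site: text and CSV share the "text-like" code path. */
bool
copy_format_is_text_like(const CopyFormatOptionsSketch *opts)
{
	assert(opts != NULL);
	return COPY_FORMAT_IS_TEXT(opts) || COPY_FORMAT_IS_CSV(opts);
}
```

With this shape, a call site that currently tests opts.csv_mode would test COPY_FORMAT_IS_CSV(&opts) instead, and the two boolean fields would no longer need to be stored.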
>
> >> > ---
> >> > +/*
> >> > + * Export CopySendEndOfRow() for extensions. We want to keep
> >> > + * CopySendEndOfRow() as a static function for
> >> > + * optimization. CopySendEndOfRow() calls in this file may be optimized by a
> >> > + * compiler.
> >> > + */
> >> > +void
> >> > +CopyToStateFlush(CopyToState cstate)
> >> > +{
> >> > +        CopySendEndOfRow(cstate);
> >> > +}
> >> >
> >> > Is there any reason to use a different name for public functions?
> >>
> >> In this patch set, I use "CopyFrom"/"CopyTo" prefixes for
> >> public APIs for custom COPY FORMAT handler extensions. It
> >> will help understanding related APIs. Is it strange in
> >> PostgreSQL?
> >
> > I see your point. Probably we need to find a better name as the name
> > CopyToStateFlush doesn't sound well like this API should be called
> > only once at the end of a row (IOW user might try to call it multiple
> > times to 'flush' the state while processing a row). How about
> > CopyToEndOfRow()?
>
> CopyToStateFlush() can be called multiple times in a row. It
> can also be called only once with multiple rows. Because it
> just flushes the current buffer.
>
> Existing CopySendEndOfRow() is called at the end of a
> row. (Buffer is flushed at the end of row.) So I think that
> the "EndOfRow" was chosen.
>
> Some custom COPY handlers may not be row based. For example,
> Apache Arrow COPY handler doesn't flush buffer for each row.
> So, we should provide "flush" API not "end of row" API for
> extensibility.
Okay, understood.
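The difference between a "flush" API and an "end of row" API can be sketched in standalone C as follows. This is only an illustration under assumptions: OutputSink, sink_append(), sink_flush() and the two writer functions are hypothetical names, not the CopyToStateFlush() implementation. A row-based handler flushes after each row, while a batch-oriented handler (as described for Apache Arrow) buffers several rows and flushes once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SINK_CAPACITY 256

/* Hypothetical buffered output sink. */
typedef struct OutputSink
{
	char		buf[SINK_CAPACITY];
	size_t		len;			/* bytes buffered and not yet flushed */
	size_t		bytes_out;		/* total bytes flushed so far */
	int			flush_count;	/* number of flush calls */
} OutputSink;

void
sink_init(OutputSink *sink)
{
	memset(sink, 0, sizeof(*sink));
}

/* Append data to the buffer without sending it anywhere. */
void
sink_append(OutputSink *sink, const char *data)
{
	size_t		n = strlen(data);

	assert(sink->len + n <= SINK_CAPACITY);
	memcpy(sink->buf + sink->len, data, n);
	sink->len += n;
}

/* Send whatever is buffered; knows nothing about row boundaries. */
void
sink_flush(OutputSink *sink)
{
	sink->bytes_out += sink->len;
	sink->len = 0;
	sink->flush_count++;
}

/* Row-based handler: one flush per row. */
void
write_rows_row_based(OutputSink *sink, const char **rows, int nrows)
{
	for (int i = 0; i < nrows; i++)
	{
		sink_append(sink, rows[i]);
		sink_append(sink, "\n");
		sink_flush(sink);
	}
}

/* Batch-oriented handler: buffer all rows, then flush once. */
void
write_rows_batched(OutputSink *sink, const char **rows, int nrows)
{
	for (int i = 0; i < nrows; i++)
	{
		sink_append(sink, rows[i]);
		sink_append(sink, "\n");
	}
	sink_flush(sink);
}
```

Both writers emit the same bytes; only the number of flush calls differs, which is why the exported name describes flushing rather than row termination.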
>
> >> We can merge 0001 quickly, right?
> >
> > I don't think it makes sense to push only 0001 as it's a completely
> > preliminary patch for subsequent patches. It would be prudent to push
> > it once other patches are ready too.
>
> Hmm. I feel that 0001 is a refactoring category patch like
> merged patches. In general, distinct enum value names are
> easier to understand.
Right, but the patches that have already been merged helped
speed up COPY commands, whereas the 0001 patch also introduces
copyto_internal.h, which nothing would use if the
custom copy format patch is not merged. Without adding
copyto_internal.h, changing the enum value names makes less sense to me.
> BTW, does the "other patches" include the documentation
> patch...?
I think that when pushing the main custom COPY format patch, we need
to include the documentation changes into it.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
Re: Make COPY format extendable: Extract COPY TO format implementations
From: "David G. Johnston"
On Mon, Mar 31, 2025 at 11:52 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Sat, Mar 29, 2025 at 9:49 AM David G. Johnston
> <david.g.johnston@gmail.com> wrote:
> >
> > On Wed, Mar 26, 2025 at 8:28 PM Sutou Kouhei <kou@clear-code.com> wrote:
> >>
> >>
> >> The attached v39 patch set uses the followings:
> >>
> >> 0001: Create copyto_internal.h and change COPY_XXX to
> >> COPY_SOURCE_XXX and COPY_DEST_XXX accordingly.
> >> (Same as 1. in your suggestion)
> >> 0002: Support custom format for both COPY TO and COPY FROM.
> >> (Same as 2. in your suggestion)
> >> 0003: Expose necessary helper functions such as CopySendEndOfRow()
> >> and add CopyFromSkipErrorRow().
> >> (3. + 4. in your suggestion)
> >> 0004: Define handler functions for built-in formats.
> >> (Not included in your suggestion)
> >> 0005: Documentation. (WIP)
> >> (Same as 5. in your suggestion)
> >>
> >
> > I prefer keeping 0002 and 0004 separate. In particular, keeping the design choice of "unqualified internal format names ignore search_path" should stand out as its own commit.
> What is the point of having separate commits for already-agreed design
> choices? I guess that it would make it easier to revert that decision.
> But I think it makes more sense that if we agree with "unqualified
> internal format names ignore search_path" the original commit includes
> that decision and describes it in the commit message. If we want to
> change that design based on the discussion later on, we can have a
> separate commit that makes that change and has the link to the
> discussion.
Fair.  Comment withdrawn.  Though I was referring to the WIP patches; I figured the final patch would squash this all together in any case.
David J.
Hi,
In <CAKFQuwbhSssKTJyeYo9rn20zffV3L7wdQSbEQ8zwRfC=uXLkVA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 31 Mar 2025 10:05:34 -0700,
  "David G. Johnston" <david.g.johnston@gmail.com> wrote:
> The CopyFromInFunc API allows for each attribute to somehow
> have its I/O format individualized. But I don't see how that is practical
> or useful, and it adds burden on API users.
If an extension wants to use I/O routines, it can use the
CopyFromInFunc API. Otherwise it can provide an empty
function.
For example,
https://github.com/MasahikoSawada/pg_copy_jsonlines/blob/master/copy_jsonlines.c
uses the CopyFromInFunc API but
https://github.com/kou/pg-copy-arrow/blob/main/copy_arrow.cc
uses an empty function for the CopyFromInFunc API.
The "it adds burden" means that "defining an empty function
is inconvenient", right? See also our past discussion for
this design:
https://www.postgresql.org/message-id/ZbijVn9_51mljMAG%40paquier.xyz
> Keeping empty options does not strike as a bad idea, because this
> forces extension developers to think about this code path rather than
> just ignore it.

> I suggest we remove both .CopyFromInFunc and .CopyFromStart/End and add a
> property to CopyFromRoutine (.ioMode?) with values of either Copy_IO_Text
> or Copy_IO_Binary and then just branch to either:
>
> CopyFromTextLikeInFunc & CopyFromTextLikeStart/End
> or
> CopyFromBinaryInFunc & CopyFromStart/End
>
> So, in effect, the only method an extension needs to write is converting
> to/from the 'serialized' form to the text/binary form (text being near
> unanimous).
I object to this API. If we choose this API, we can create
only custom COPY formats that are compatible with PostgreSQL's
text/binary form. For example, the above jsonlines format
and Apache Arrow format couldn't be implemented. It's
meaningless to introduce this custom COPY format mechanism
with the suggested API.
> It seems to me that CopyFromOneRow could simply produce a *string
> collection,
> one cell per attribute, and NextCopyFrom could do all of the above on a
> for-loop over *string
You suggest that we use a string collection instead of a
Datum collection in CopyFromOneRow() and convert a string
collection to a Datum collection in NextCopyFrom(), right?
I object to this API because it has needless string <-> Datum
conversion overhead. For example,
https://github.com/MasahikoSawada/pg_copy_jsonlines/blob/master/copy_jsonlines.c
parses a JSON value to Datum. If we use this API, we need to
convert the parsed Datum to a string in an extension, and
NextCopyFrom() re-converts the converted string to Datum. It
will slow down custom COPY formats.
I want this custom COPY format feature for performance. So
APIs that require needless overhead for non text/csv/binary
formats aren't acceptable to me.
Thanks,
-- 
kou
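The overhead concern above can be sketched in standalone C. This is only an illustration under assumptions: Datum is reduced to a plain long, and from_one_row_datums() and from_one_row_strings() are hypothetical names, not functions from the patch set. The first variant stores already-parsed values directly; the second mimics the string-collection proposal, where the extension formats each value as text and the core parses it again.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for PostgreSQL's Datum (assumption for this sketch). */
typedef long Datum;

/*
 * Datum-collection design: the format handler has already parsed each
 * value (for example from JSON or Arrow) and stores it directly.
 */
void
from_one_row_datums(const long *parsed, int natts, Datum *values)
{
	assert(natts >= 0);
	for (int i = 0; i < natts; i++)
		values[i] = (Datum) parsed[i];
}

/*
 * String-collection design: the handler must turn each parsed value back
 * into text, and the core then parses that text again.
 */
void
from_one_row_strings(const long *parsed, int natts, Datum *values)
{
	char		cell[32];

	assert(natts >= 0);
	for (int i = 0; i < natts; i++)
	{
		/* extension side: parsed value -> string */
		snprintf(cell, sizeof(cell), "%ld", parsed[i]);
		/* core side: string -> Datum */
		values[i] = (Datum) strtol(cell, NULL, 10);
	}
}
```

Both variants produce the same values, but the second pays for one formatting step and one parsing step per attribute that the first avoids.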
Hi,
In <CAD21AoDOcYah-nREv09BB3ZoB-k+Yf1XUfJcDMoq=LLvV1v75w@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 31 Mar 2025 12:35:23 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Most of the queries under test_copy_format/sql verifies the input
> patterns of the FORMAT option. I find that the regression tests
> included in that directory probably should focus on testing new
> callback APIs and some regression tests for FORMAT option handling can
> be moved into the normal regression test suite (e.g., in copy.sql or a
> new file like copy_format.sql). IIUC testing for invalid input
> patterns can be done even without creating artificial wrong handler
> functions.
Can we clarify what we should do for the next patch set
before we create it? Is the following
correct?
1. Move invalid input patterns in
   src/test/modules/test_copy_format/sql/invalid.sql to
   src/test/regress/sql/copy.sql as much as possible.
   * We can do only 4 patterns in 16 patterns.
   * Other tests in
     src/test/modules/test_copy_format/sql/*.sql depend on
     custom COPY handler for test. So we can't move to
     src/test/regress/sql/copy.sql.
2. Create
   src/test/modules/test_copy_format/sql/test_copy_format.sql
   and move all contents in existing *.sql to the file.
> I'd like to see in the comment what the tests expect. Taking the
> following queries as an example,
> 
> COPY public.test FROM stdin WITH (FORMAT 'test_copy_format');
> \.
> COPY public.test TO stdout WITH (FORMAT 'test_copy_format');
> 
> it would help readers understand the test case better if we have a
> comment like for example:
> 
> -- Specify the custom format name without schema. We test if both
> -- COPY TO and COPY FROM can find the correct handler function
> -- in public schema.
I agree with you that the comment is useful when we use
src/test/modules/test_copy_format/sql/test_copy_format.sql
for all tests. (I feel that it's redundant when we use
no_schema.sql.) I'll add it when I create
test_copy_format.sql in the next patch set.
>> BTW, the current implementation always uses
>> pg_catalog.{text,csv,binary} for (not-qualified) "text",
>> "csv" and "binary" even when there are
>> myschema.{text,csv,binary}. See
>> src/test/modules/test_copy_format/sql/builtin.sql. But I
>> haven't looked into it why...
> 
> Sorry, I don't follow that. IIUC test_copy_format extension doesn't
> create a handler function in myschema schema, and SQLs in builtin.sql
> seem to work as expected (specifying a non-qualified built-in format
> unconditionally uses the built-in format).
Ah, sorry. I should not have used "myschema." in the text
with the builtin.sql reference. I just wanted to say "qualified
text,csv,binary formats" by "myschema.{text,csv,binary}".
builtin.sql uses the "public" schema, not the "myschema"
schema. Sorry.
Yes. builtin.sql works as expected but I don't know why. I
didn't add any special code for them. If "test_copy_format"
exists in the public schema, "FORMAT 'test_copy_format'" uses
it. But if "text" exists in the public schema, "FORMAT 'text'"
doesn't use it. ("pg_catalog.text" is used instead.)
>> We can do it but I suggest that we do it as a refactoring
>> (or cleanup) in a separated patch for easy to review.
> 
> I think that csv_mode and binary are used mostly in
> ProcessCopyOptions() so probably we can use local variables for that.
> I find there are two other places where to use csv_mode:
> NextCopyFromRawFields() and CopyToTextLikeStart(). I think we can
> simply check the handler function's OID there, or we can define macros
> like COPY_FORMAT_IS_TEXT/CSV/BINARY checking the OID and use them
> there.
We need this change for "ready for merge", right?
Can we clarify which items should be resolved for "ready for
merge"?
Is the following correct?
1. Move invalid input patterns in
   src/test/modules/test_copy_format/sql/invalid.sql to
   src/test/regress/sql/copy.sql as much as possible.
2. Create
   src/test/modules/test_copy_format/sql/test_copy_format.sql
   and move all contents in existing *.sql to the file.
3. Add comments what the tests expect to
   src/test/modules/test_copy_format/sql/test_copy_format.sql.
4. Remove CopyFormatOptions::{binary,csv_mode}.
5. Squash the "Support custom format" patch and the "Define
   handler functions for built-in formats" patch.
   * Could you do it when you push it? Or is it required for
     "ready for merge"?
6. Use handler OID for detecting the default built-in format
   instead of comparing the given format as string.
7. Update documentation.
There are 3 unconfirmed suggested changes for tests in:
https://www.postgresql.org/message-id/20250330.113126.433742864258096312.kou%40clear-code.com
Here are my opinions for them:
> 1.: There is no difference between single-quoting and
>     double-quoting here. Because the information what quote
>     was used for the given FORMAT value isn't remained
>     here. Should we update gram.y?
> 
> 2.: I don't have a strong opinion for it. If nobody objects
>     it, I'll remove them.
> 
> 3.: I don't have a strong opinion for it. If nobody objects
>     it, I'll remove them.
Is the 1. required for "ready for merge"? If so, is there
any suggestion? I don't have a strong opinion for it.
If there are no more opinions for 2. and 3., I'll remove
them.
Thanks,
-- 
kou
			
		On Thu, Mar 27, 2025 at 11:29 AM Sutou Kouhei <kou@clear-code.com> wrote:
> We can merge 0001 quickly, right?
I did a brief review of v39-0001 and v39-0002.
text:
COPY_FILE
COPY_FRONTEND
still appear in comments in copyfrom_internal.h and copyto.c.
Should they be removed?
+#include "commands/copyto_internal.h"
#include "commands/progress.h"
#include "executor/execdesc.h"
#include "executor/executor.h"
#include "executor/tuptable.h"
"copyto_internal.h" already include:
#include "executor/execdesc.h"
#include "executor/tuptable.h"
so you should remove
"
#include "executor/execdesc.h"
#include "executor/tuptable.h"
"
in copyto.c.
CREATE FUNCTION test_copy_format(internal)
    RETURNS copy_handler
    AS 'MODULE_PATHNAME', 'test_copy_format'
    LANGUAGE C;
src/backend/commands/copy.c: ProcessCopyOptions
            if (strcmp(fmt, "text") == 0)
                 /* default format */ ;
            else if (strcmp(fmt, "csv") == 0)
                opts_out->csv_mode = true;
            else if (strcmp(fmt, "binary") == 0)
                opts_out->binary = true;
            else
            {
                List       *qualified_format;
                ....
            }
what if our customized format name is one of "csv", "binary", "text",
then that ELSE branch will never be reached.
then our customized format is being shadowed?
https://www.postgresql.org/docs/current/error-message-reporting.html
"The extra parentheses were required before PostgreSQL version 12, but
are now optional."
means that
    ereport(NOTICE, (errmsg("CopyFromInFunc: attribute: %s",
format_type_be(atttypid))));
can change to
    ereport(NOTICE, errmsg("CopyFromInFunc: attribute: %s",
format_type_be(atttypid)));
all
ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), ....
can also be simplified to
ereport(ERROR, errcode(ERRCODE_INVALID_PARAMETER_VALUE), ....
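Why both spellings are accepted can be sketched with a toy variadic macro. This is a standalone illustration loosely modeled on how ereport() expands its auxiliary calls through __VA_ARGS__ and the comma operator, not the real elog.h definition; Report, report(), report_start(), report_code(), report_msg() and report_finish() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical record of the most recent report. */
typedef struct Report
{
	int			level;
	int			code;
	char		message[64];
	bool		finished;
} Report;

Report		last_report;

bool
report_start(int level)
{
	memset(&last_report, 0, sizeof(last_report));
	last_report.level = level;
	return true;
}

int
report_code(int code)
{
	last_report.code = code;
	return 0;
}

int
report_msg(const char *msg)
{
	assert(strlen(msg) < sizeof(last_report.message));
	strcpy(last_report.message, msg);
	return 0;
}

void
report_finish(void)
{
	last_report.finished = true;
}

/*
 * The auxiliary calls are pasted in front of report_finish() and joined by
 * the comma operator, so "(a, b)" and "a, b" evaluate the same calls in the
 * same order.
 */
#define report(level, ...) \
	do { \
		if (report_start(level)) \
			__VA_ARGS__, report_finish(); \
	} while (0)
```

Here report(1, (report_code(22023), report_msg("bad value"))) and report(1, report_code(22023), report_msg("bad value")) leave last_report in the same state, which mirrors why the extra parentheses became optional.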
			
		Re: Make COPY format extendable: Extract COPY TO format implementations
From: "David G. Johnston"
On Sun, Apr 6, 2025 at 4:30 AM jian he <jian.universality@gmail.com> wrote:
> CREATE FUNCTION test_copy_format(internal)
>     RETURNS copy_handler
>     AS 'MODULE_PATHNAME', 'test_copy_format'
>     LANGUAGE C;
> src/backend/commands/copy.c: ProcessCopyOptions
>             if (strcmp(fmt, "text") == 0)
>                  /* default format */ ;
>             else if (strcmp(fmt, "csv") == 0)
>                 opts_out->csv_mode = true;
>             else if (strcmp(fmt, "binary") == 0)
>                 opts_out->binary = true;
>             else
>             {
>                 List       *qualified_format;
>                 ....
>             }
> what if our customized format name is one of "csv", "binary", "text",
> then that ELSE branch will never be reached.
> then our customized format is being shadowed?
Yes.  The user of your extension can specify a schema name to get access to your conflicting format name choice but all the existing code out there that relied on text/csv/binary being the built-in options continue to behave the same no matter the search_path.
David J.
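The shadowing rule described above can be sketched in standalone C. This is a simplified illustration, not the ProcessCopyOptions() code: it splits on the first dot instead of using stringToQualifiedNameList(), ignores quoting, and FormatKind, is_builtin_name() and classify_format() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of resolving a FORMAT value. */
typedef enum
{
	FORMAT_BUILTIN,				/* built-in text/csv/binary */
	FORMAT_CUSTOM				/* handler function must be looked up */
} FormatKind;

/* Return true if name is one of the three built-in format names. */
bool
is_builtin_name(const char *name)
{
	return strcmp(name, "text") == 0 ||
		strcmp(name, "csv") == 0 ||
		strcmp(name, "binary") == 0;
}

/*
 * Classify a FORMAT value of the form "name" or "schema.name".
 * Unqualified built-in names always win, regardless of search_path.
 */
FormatKind
classify_format(const char *format)
{
	const char *dot;
	size_t		schema_len;

	assert(format != NULL);
	dot = strchr(format, '.');

	/* Unqualified: text/csv/binary are never shadowed. */
	if (dot == NULL)
		return is_builtin_name(format) ? FORMAT_BUILTIN : FORMAT_CUSTOM;

	/* Qualified: only pg_catalog.<built-in name> maps to a built-in. */
	schema_len = (size_t) (dot - format);
	if (schema_len == strlen("pg_catalog") &&
		strncmp(format, "pg_catalog", schema_len) == 0 &&
		is_builtin_name(dot + 1))
		return FORMAT_BUILTIN;

	return FORMAT_CUSTOM;
}
```

Under this rule, FORMAT 'csv' always selects the built-in format, while an extension that defines its own "csv" handler stays reachable through a schema-qualified name such as 'public.csv'.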
Hi,
In <CACJufxG=njY32g=YAF4T6rvXySN56VFbYt4ffjLTRBYQTKPAFg@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Sun, 6 Apr 2025 19:29:46 +0800,
  jian he <jian.universality@gmail.com> wrote:
> I did a brief review of v39-0001 and v39-0002.
> 
> text:
> COPY_FILE
> COPY_FRONTEND
> still appear on comments in copyfrom_internal.h and copyto.c,
> Should it be removed?
Good catch!
I found them in copy{from,to}_internal.h but couldn't find
them in copyto.c. It's a typo, right?
We should update them instead of removing them. I'll update
them in the next patch set.
> +#include "commands/copyto_internal.h"
> #include "commands/progress.h"
> #include "executor/execdesc.h"
> #include "executor/executor.h"
> #include "executor/tuptable.h"
> 
> "copyto_internal.h" already include:
> 
> #include "executor/execdesc.h"
> #include "executor/tuptable.h"
> so you should removed
> "
> #include "executor/execdesc.h"
> #include "executor/tuptable.h"
> "
> in copyto.c.
You're right. I'll update this too in the next patch set.
> CREATE FUNCTION test_copy_format(internal)
>     RETURNS copy_handler
>     AS 'MODULE_PATHNAME', 'test_copy_format'
>     LANGUAGE C;
> src/backend/commands/copy.c: ProcessCopyOptions
>             if (strcmp(fmt, "text") == 0)
>                  /* default format */ ;
>             else if (strcmp(fmt, "csv") == 0)
>                 opts_out->csv_mode = true;
>             else if (strcmp(fmt, "binary") == 0)
>                 opts_out->binary = true;
>             else
>             {
>                 List       *qualified_format;
>                 ....
>             }
> what if our customized format name is one of "csv", "binary", "text",
> then that ELSE branch will never be reached.
> then our customized format is being shadowed?
Right. We should not use customized format handlers for
these names, to keep backward compatibility.
> https://www.postgresql.org/docs/current/error-message-reporting.html
> "The extra parentheses were required before PostgreSQL version 12, but
> are now optional."
> 
> means that
>     ereport(NOTICE, (errmsg("CopyFromInFunc: attribute: %s",
> format_type_be(atttypid))));
> can change to
>     ereport(NOTICE, errmsg("CopyFromInFunc: attribute: %s",
> format_type_be(atttypid)));
> 
> all
> ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), ....
> 
> can also be simplified to
> 
> ereport(ERROR, errcode(ERRCODE_INVALID_PARAMETER_VALUE), ....
Oh, I didn't notice it. Can we do it as a separate patch?
A lot of code in copy*.c uses this style, so the
separate patch should update all of it at
once.
Thanks,
-- 
kou
			
		On Fri, Apr 4, 2025 at 1:38 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoDOcYah-nREv09BB3ZoB-k+Yf1XUfJcDMoq=LLvV1v75w@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 31 Mar 2025 12:35:23 -0700,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > Most of the queries under test_copy_format/sql verifies the input
> > patterns of the FORMAT option. I find that the regression tests
> > included in that directory probably should focus on testing new
> > callback APIs and some regression tests for FORMAT option handling can
> > be moved into the normal regression test suite (e.g., in copy.sql or a
> > new file like copy_format.sql). IIUC testing for invalid input
> > patterns can be done even without creating artificial wrong handler
> > functions.
>
> Can we clarify what should we do for the next patch set
> before we create the next patch set? Are the followings
> correct?
>
> 1. Move invalid input patterns in
>    src/test/modules/test_copy_format/sql/invalid.sql to
>    src/test/regress/sql/copy.sql as much as possible.
>    * We can do only 4 patterns in 16 patterns.
>    * Other tests in
>      src/test/modules/test_copy_format/sql/*.sql depend on
>      custom COPY handler for test. So we can't move to
>      src/test/regress/sql/copy.sql.
> 2. Create
>    src/test/modules/test_copy_format/sql/test_copy_format.sql
>    and move all contents in existing *.sql to the file
Agreed.
>
> >> We can do it but I suggest that we do it as a refactoring
> >> (or cleanup) in a separated patch for easy to review.
> >
> > I think that csv_mode and binary are used mostly in
> > ProcessCopyOptions() so probably we can use local variables for that.
> > I find there are two other places where to use csv_mode:
> > NextCopyFromRawFields() and CopyToTextLikeStart(). I think we can
> > simply check the handler function's OID there, or we can define macros
> > like COPY_FORMAT_IS_TEXT/CSV/BINARY checking the OID and use them
> > there.
>
> We need this change for "ready for merge", right?
I think so.
> Can we clarify items should be resolved for "ready for
> merge"?
>
> Are the followings correct?
>
> 1. Move invalid input patterns in
>    src/test/modules/test_copy_format/sql/invalid.sql to
>    src/test/regress/sql/copy.sql as much as possible.
> 2. Create
>    src/test/modules/test_copy_format/sql/test_copy_format.sql
>    and move all contents in existing *.sql to the file.
> 3. Add comments what the tests expect to
>    src/test/modules/test_copy_format/sql/test_copy_format.sql.
> 4. Remove CopyFormatOptions::{binary,csv_mode}.
Agreed with the above items.
> 5. Squash the "Support custom format" patch and the "Define
>    handler functions for built-in formats" patch.
>    * Could you do it when you push it? Or is it required for
>      "ready for merge"?
Let's keep them for now.
> 6. Use handler OID for detecting the default built-in format
>    instead of comparing the given format as string.
> 7. Update documentation.
Agreed.
>
> There are 3 unconfirmed suggested changes for tests in:
> https://www.postgresql.org/message-id/20250330.113126.433742864258096312.kou%40clear-code.com
>
> Here are my opinions for them:
>
> > 1.: There is no difference between single-quoting and
> >     double-quoting here. Because the information what quote
> >     was used for the given FORMAT value isn't remained
> >     here. Should we update gram.y?
> >
> > 2.: I don't have a strong opinion for it. If nobody objects
> >     it, I'll remove them.
> >
> > 3.: I don't have a strong opinion for it. If nobody objects
> >     it, I'll remove them.
>
> Is the 1. required for "ready for merge"? If so, is there
> any suggestion? I don't have a strong opinion for it.
>
> If there are no more opinions for 2. and 3., I'll remove
> them.
Agreed.
I think we would still need some rounds of reviews but the patch is
getting in good shape.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		Hi,
I've updated the patch set. See the attached v40 patch set.
In <CAD21AoAXzwPC7jjPMTcT80hnzmPa2SUJkiqdYHweEY8sZscEMA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 23 Apr 2025 23:44:55 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Are the followings correct?
>>
>> 1. Move invalid input patterns in
>>    src/test/modules/test_copy_format/sql/invalid.sql to
>>    src/test/regress/sql/copy.sql as much as possible.
>> 2. Create
>>    src/test/modules/test_copy_format/sql/test_copy_format.sql
>>    and move all contents in existing *.sql to the file.
>> 3. Add comments what the tests expect to
>>    src/test/modules/test_copy_format/sql/test_copy_format.sql.
>> 4. Remove CopyFormatOptions::{binary,csv_mode}.
> 
> Agreed with the above items.
Done except 1. because 1. is removed by 3. in the following
list:
----
>> There are 3 unconfirmed suggested changes for tests in:
>> https://www.postgresql.org/message-id/20250330.113126.433742864258096312.kou%40clear-code.com
>>
>> Here are my opinions for them:
>>
>> > 1.: There is no difference between single-quoting and
>> >     double-quoting here. Because the information what quote
>> >     was used for the given FORMAT value isn't remained
>> >     here. Should we update gram.y?
>> >
>> > 2.: I don't have a strong opinion for it. If nobody objects
>> >     it, I'll remove them.
>> >
>> > 3.: I don't have a strong opinion for it. If nobody objects
>> >     it, I'll remove them.
----
0005 is added for 4. Could you squash 0004 ("Use copy
handler for built-in formats") and 0005 ("Remove
CopyFormatOptions::{binary,csv_mode}") if needed when you
push?
>> 6. Use handler OID for detecting the default built-in format
>>    instead of comparing the given format as string.
Done.
>> 7. Update documentation.
Could someone help with this? 0007 is the draft commit for this.
>> There are 3 unconfirmed suggested changes for tests in:
>> https://www.postgresql.org/message-id/20250330.113126.433742864258096312.kou%40clear-code.com
>>
>> Here are my opinions for them:
>>
>> > 1.: There is no difference between single-quoting and
>> >     double-quoting here. Because the information what quote
>> >     was used for the given FORMAT value isn't remained
>> >     here. Should we update gram.y?
>> >
>> > 2.: I don't have a strong opinion for it. If nobody objects
>> >     it, I'll remove them.
>> >
>> > 3.: I don't have a strong opinion for it. If nobody objects
>> >     it, I'll remove them.
>>
>> Is the 1. required for "ready for merge"? If so, is there
>> any suggestion? I don't have a strong opinion for it.
>>
>> If there are no more opinions for 2. and 3., I'll remove
>> them.
> 
> Agreed.
1.: I didn't do anything. Because there is no suggestion.
2., 3.: Done.
> I think we would still need some rounds of reviews but the patch is
> getting in good shape.
I hope that this is completed this year...
Thanks,
-- 
kou
From a81eb07a4c92b8b34ed6fbe6610c54bb9b3bb2e4 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 13:58:33 +0900
Subject: [PATCH v40 1/6] Export CopyDest as private data
This is a preparation to export CopyToStateData as private data.
CopyToStateData depends on CopyDest. So we need to export CopyDest
too.
But CopyDest and CopySource use the same enum value names
(COPY_FILE, COPY_FRONTEND and COPY_CALLBACK). So we can't export
CopyDest as-is.
This uses the COPY_DEST_ prefix for CopyDest enum values. CopySource
uses the COPY_SOURCE_ prefix for consistency.
---
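Note: the following is a minimal, standalone sketch of why the rename is
needed, not code from the patch. C enumerators share one namespace, so two
enums that both declare COPY_FILE cannot be visible in the same translation
unit. The enum definitions mirror the ones in the diff below; the helper
functions copy_source_name(), copy_dest_name() and check_copy_enum_names()
are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Simplified copy of the renamed enum in copyfrom_internal.h. */
typedef enum CopySource
{
    COPY_SOURCE_FILE,           /* from file (or a piped program) */
    COPY_SOURCE_FRONTEND,       /* from frontend */
    COPY_SOURCE_CALLBACK,       /* from callback function */
} CopySource;

/* Simplified copy of the renamed enum in copyto_internal.h. */
typedef enum CopyDest
{
    COPY_DEST_FILE,             /* to file (or a piped program) */
    COPY_DEST_FRONTEND,         /* to frontend */
    COPY_DEST_CALLBACK,         /* to callback function */
} CopyDest;

/* Hypothetical helper (not in the patch): describe a COPY FROM source. */
static const char *
copy_source_name(CopySource src)
{
    switch (src)
    {
        case COPY_SOURCE_FILE:
            return "file";
        case COPY_SOURCE_FRONTEND:
            return "frontend";
        case COPY_SOURCE_CALLBACK:
            return "callback";
    }
    return "unknown";
}

/* Hypothetical helper (not in the patch): describe a COPY TO destination. */
static const char *
copy_dest_name(CopyDest dest)
{
    switch (dest)
    {
        case COPY_DEST_FILE:
            return "file";
        case COPY_DEST_FRONTEND:
            return "frontend";
        case COPY_DEST_CALLBACK:
            return "callback";
    }
    return "unknown";
}

/* Hypothetical self-check: both enums are usable side by side. */
void
check_copy_enum_names(void)
{
    assert(strcmp(copy_source_name(COPY_SOURCE_FRONTEND), "frontend") == 0);
    assert(strcmp(copy_dest_name(COPY_DEST_CALLBACK), "callback") == 0);
    /* The numeric values still coincide; only the names are distinct. */
    assert((int) COPY_SOURCE_FILE == (int) COPY_DEST_FILE);
}
```

With the old names, the second enum definition would be rejected as a
redeclaration of COPY_FILE as soon as both headers were included together.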
 src/backend/commands/copyfrom.c          |  4 ++--
 src/backend/commands/copyfromparse.c     | 10 ++++----
 src/backend/commands/copyto.c            | 30 ++++++++----------------
 src/include/commands/copyfrom_internal.h |  8 +++----
 src/include/commands/copyto_internal.h   | 28 ++++++++++++++++++++++
 5 files changed, 49 insertions(+), 31 deletions(-)
 create mode 100644 src/include/commands/copyto_internal.h
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..b4dad744547 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1709,7 +1709,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;    /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;    /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1837,7 +1837,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5fc346e201..9f7171d1478 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -180,7 +180,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows it can send. */
     pq_flush();
@@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
     switch (cstate->copy_src)
     {
-        case COPY_FILE:
+        case COPY_SOURCE_FILE:
             bytesread = fread(databuf, 1, maxread, cstate->copy_file);
             if (ferror(cstate->copy_file))
                 ereport(ERROR,
@@ -257,7 +257,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
             if (bytesread == 0)
                 cstate->raw_reached_eof = true;
             break;
-        case COPY_FRONTEND:
+        case COPY_SOURCE_FRONTEND:
             while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
             {
                 int            avail;
@@ -340,7 +340,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
                 bytesread += avail;
             }
             break;
-        case COPY_CALLBACK:
+        case COPY_SOURCE_CALLBACK:
             bytesread = cstate->data_source_cb(databuf, minread, maxread);
             break;
     }
@@ -1172,7 +1172,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
          * after \. up to the protocol end of copy data.  (XXX maybe better
          * not to treat \. as special?)
          */
-        if (cstate->copy_src == COPY_FRONTEND)
+        if (cstate->copy_src == COPY_SOURCE_FRONTEND)
         {
             int            inbytes;
 
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f87e405351d..d739826afbc 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -20,6 +20,7 @@
 
 #include "access/tableam.h"
 #include "commands/copyapi.h"
+#include "commands/copyto_internal.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -36,17 +37,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                    /* to file (or a piped program) */
-    COPY_FRONTEND,                /* to frontend */
-    COPY_CALLBACK,                /* to callback function */
-} CopyDest;
-
 /*
  * This struct contains all the state variables used throughout a COPY TO
  * operation.
@@ -69,7 +59,7 @@ typedef struct CopyToStateData
 
     /* low-level state data */
     CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
     StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
 
     int            file_encoding;    /* file or remote side's character encoding */
@@ -401,7 +391,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -448,7 +438,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -482,11 +472,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -507,7 +497,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -515,7 +505,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -903,12 +893,12 @@ BeginCopyTo(ParseState *pstate,
     /* See Multibyte encoding comment above */
     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..24157e11a73 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -24,9 +24,9 @@
  */
 typedef enum CopySource
 {
-    COPY_FILE,                    /* from file (or a piped program) */
-    COPY_FRONTEND,                /* from frontend */
-    COPY_CALLBACK,                /* from callback function */
+    COPY_SOURCE_FILE,            /* from file (or a piped program) */
+    COPY_SOURCE_FRONTEND,        /* from frontend */
+    COPY_SOURCE_CALLBACK,        /* from callback function */
 } CopySource;
 
 /*
@@ -64,7 +64,7 @@ typedef struct CopyFromStateData
     /* low-level state data */
     CopySource    copy_src;        /* type of copy source */
     FILE       *copy_file;        /* used if copy_src == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_FRONTEND */
+    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_SOURCE_FRONTEND */
 
     EolType        eol_type;        /* EOL type of input */
     int            file_encoding;    /* file or remote side's character encoding */
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
new file mode 100644
index 00000000000..42ddb37a8a2
--- /dev/null
+++ b/src/include/commands/copyto_internal.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyto_internal.h
+ *      Internal definitions for COPY TO command.
+ *
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyto_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYTO_INTERNAL_H
+#define COPYTO_INTERNAL_H
+
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+    COPY_DEST_FILE,                /* to file (or a piped program) */
+    COPY_DEST_FRONTEND,            /* to frontend */
+    COPY_DEST_CALLBACK,            /* to callback function */
+} CopyDest;
+
+#endif                            /* COPYTO_INTERNAL_H */
-- 
2.47.2
From 398994b555e3b508ce26fc33199bf9badbfc82d5 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Thu, 27 Mar 2025 11:14:43 +0900
Subject: [PATCH v40 2/6] Add support for adding custom COPY format
This uses the handler approach like tablesample. The approach creates
an internal function that returns an internal struct. In this case,
a handler returns a CopyToRoutine for COPY TO and a CopyFromRoutine
for COPY FROM.
The handler is told whether this is COPY TO or COPY FROM through the
"is_from" argument:
    copy_handler(true) returns CopyFromRoutine
    copy_handler(false) returns CopyToRoutine
This also adds a test module for custom COPY handlers.
---
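Note: the following is a simplified, standalone model of the handler
dispatch, not code from the patch. In the diff below, CopyFromGetRoutine()
calls the handler with true and CopyToGetRoutine() calls it with false, and
each checks the returned node tag with IsA(). The NodeTag enum, the
one-field routine structs, and every demo_*/broken_*/get_* name here are
stand-ins invented for illustration; the real structs also carry the
callback pointers, and the real code raises ereport(ERROR) where this
sketch returns NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for PostgreSQL's generated NodeTag enum. */
typedef enum NodeTag
{
    T_Invalid = 0,
    T_CopyToRoutine,
    T_CopyFromRoutine,
} NodeTag;

/* Stand-ins: only the leading tag is modeled here. */
typedef struct CopyToRoutine
{
    NodeTag     type;
} CopyToRoutine;

typedef struct CopyFromRoutine
{
    NodeTag     type;
} CopyFromRoutine;

/* Hypothetical handler signature: is_from selects the direction. */
typedef const void *(*CopyHandler) (bool is_from);

static const CopyToRoutine demo_to_routine = {.type = T_CopyToRoutine};
static const CopyFromRoutine demo_from_routine = {.type = T_CopyFromRoutine};

/* A well-behaved handler: true means COPY FROM, false means COPY TO. */
const void *
demo_copy_handler(bool is_from)
{
    if (is_from)
        return &demo_from_routine;
    return &demo_to_routine;
}

/* A broken handler that ignores is_from, to show the tag check failing. */
const void *
broken_copy_handler(bool is_from)
{
    (void) is_from;
    return &demo_from_routine;
}

/* Read the tag through a pointer to the struct's first member. */
static NodeTag
routine_tag(const void *routine)
{
    return *(const NodeTag *) routine;
}

/* Models CopyToGetRoutine(): passes false, then verifies the tag. */
const CopyToRoutine *
get_copy_to_routine(CopyHandler handler)
{
    const void *routine = handler(false);

    if (routine == NULL || routine_tag(routine) != T_CopyToRoutine)
        return NULL;            /* the patch reports an error here */
    return (const CopyToRoutine *) routine;
}

/* Models CopyFromGetRoutine(): passes true, then verifies the tag. */
const CopyFromRoutine *
get_copy_from_routine(CopyHandler handler)
{
    const void *routine = handler(true);

    if (routine == NULL || routine_tag(routine) != T_CopyFromRoutine)
        return NULL;            /* the patch reports an error here */
    return (const CopyFromRoutine *) routine;
}

/* Hypothetical self-check tying the pieces together. */
void
check_demo_handlers(void)
{
    assert(get_copy_to_routine(demo_copy_handler) == &demo_to_routine);
    assert(get_copy_from_routine(demo_copy_handler) == &demo_from_routine);
    assert(get_copy_to_routine(broken_copy_handler) == NULL);
}
```

The tag check is what lets a single handler function serve both directions
while still catching a handler that returns the wrong kind of routine.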
 src/backend/commands/copy.c                   |  31 ++++-
 src/backend/commands/copyfrom.c               |  20 +++-
 src/backend/commands/copyto.c                 |  72 +++--------
 src/backend/nodes/Makefile                    |   1 +
 src/backend/nodes/gen_node_support.pl         |   2 +
 src/backend/utils/adt/pseudotypes.c           |   1 +
 src/include/catalog/pg_proc.dat               |   6 +
 src/include/catalog/pg_type.dat               |   6 +
 src/include/commands/copy.h                   |   3 +-
 src/include/commands/copyapi.h                |   4 +
 src/include/commands/copyto_internal.h        |  55 +++++++++
 src/include/nodes/meson.build                 |   1 +
 src/test/modules/Makefile                     |   1 +
 src/test/modules/meson.build                  |   1 +
 src/test/modules/test_copy_format/.gitignore  |   4 +
 src/test/modules/test_copy_format/Makefile    |  23 ++++
 .../expected/test_copy_format.out             | 107 +++++++++++++++++
 src/test/modules/test_copy_format/meson.build |  33 +++++
 .../test_copy_format/sql/test_copy_format.sql |  52 ++++++++
 .../test_copy_format--1.0.sql                 |  24 ++++
 .../test_copy_format/test_copy_format.c       | 113 ++++++++++++++++++
 .../test_copy_format/test_copy_format.control |   4 +
 22 files changed, 505 insertions(+), 59 deletions(-)
 mode change 100644 => 100755 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format--1.0.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.control
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 74ae42b19a7..9515c4d5786 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -32,10 +32,12 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_collate.h"
 #include "parser/parse_expr.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
+#include "utils/regproc.h"
 #include "utils/rel.h"
 #include "utils/rls.h"
 
@@ -531,10 +533,31 @@ ProcessCopyOptions(ParseState *pstate,
             else if (strcmp(fmt, "binary") == 0)
                 opts_out->binary = true;
             else
-                ereport(ERROR,
-                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                         errmsg("COPY format \"%s\" not recognized", fmt),
-                         parser_errposition(pstate, defel->location)));
+            {
+                List       *qualified_format;
+                Oid            arg_types[1];
+                Oid            handler = InvalidOid;
+
+                qualified_format = stringToQualifiedNameList(fmt, NULL);
+                arg_types[0] = INTERNALOID;
+                handler = LookupFuncName(qualified_format, 1,
+                                         arg_types, true);
+                if (!OidIsValid(handler))
+                    ereport(ERROR,
+                            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                             errmsg("COPY format \"%s\" not recognized", fmt),
+                             parser_errposition(pstate, defel->location)));
+
+                /* check that handler has correct return type */
+                if (get_func_rettype(handler) != COPY_HANDLEROID)
+                    ereport(ERROR,
+                            (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                             errmsg("function %s must return type %s",
+                                    fmt, "copy_handler"),
+                             parser_errposition(pstate, defel->location)));
+
+                opts_out->handler = handler;
+            }
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index b4dad744547..3d86e8a8328 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -129,6 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate);
 
 /* text format */
 static const CopyFromRoutine CopyFromRoutineText = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromTextOneRow,
@@ -137,6 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = {
 
 /* CSV format */
 static const CopyFromRoutine CopyFromRoutineCSV = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
     .CopyFromOneRow = CopyFromCSVOneRow,
@@ -145,6 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = {
 
 /* binary format */
 static const CopyFromRoutine CopyFromRoutineBinary = {
+    .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromBinaryInFunc,
     .CopyFromStart = CopyFromBinaryStart,
     .CopyFromOneRow = CopyFromBinaryOneRow,
@@ -155,7 +158,22 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(const CopyFormatOptions *opts)
 {
-    if (opts->csv_mode)
+    if (OidIsValid(opts->handler))
+    {
+        Datum        datum;
+        Node       *routine;
+
+        datum = OidFunctionCall1(opts->handler, BoolGetDatum(true));
+        routine = (Node *) DatumGetPointer(datum);
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function %s.%s did not return CopyFromRoutine struct",
+                            get_namespace_name(get_func_namespace(opts->handler)),
+                            get_func_name(opts->handler))));
+        return castNode(CopyFromRoutine, routine);
+    }
+    else if (opts->csv_mode)
         return &CopyFromRoutineCSV;
     else if (opts->binary)
         return &CopyFromRoutineBinary;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d739826afbc..265b847e255 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -22,9 +22,7 @@
 #include "commands/copyapi.h"
 #include "commands/copyto_internal.h"
 #include "commands/progress.h"
-#include "executor/execdesc.h"
 #include "executor/executor.h"
-#include "executor/tuptable.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -37,56 +35,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-    /* format-specific routines */
-    const CopyToRoutine *routine;
-
-    /* low-level state data */
-    CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
-    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
-
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
-    MemoryContext rowcontext;    /* per-row evaluation context */
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -140,6 +88,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
 
 /* text format */
 static const CopyToRoutine CopyToRoutineText = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToTextOneRow,
@@ -148,6 +97,7 @@ static const CopyToRoutine CopyToRoutineText = {
 
 /* CSV format */
 static const CopyToRoutine CopyToRoutineCSV = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToCSVOneRow,
@@ -156,6 +106,7 @@ static const CopyToRoutine CopyToRoutineCSV = {
 
 /* binary format */
 static const CopyToRoutine CopyToRoutineBinary = {
+    .type = T_CopyToRoutine,
     .CopyToStart = CopyToBinaryStart,
     .CopyToOutFunc = CopyToBinaryOutFunc,
     .CopyToOneRow = CopyToBinaryOneRow,
@@ -166,7 +117,22 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(const CopyFormatOptions *opts)
 {
-    if (opts->csv_mode)
+    if (OidIsValid(opts->handler))
+    {
+        Datum        datum;
+        Node       *routine;
+
+        datum = OidFunctionCall1(opts->handler, BoolGetDatum(false));
+        routine = (Node *) DatumGetPointer(datum);
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function %s.%s did not return CopyToRoutine struct",
+                            get_namespace_name(get_func_namespace(opts->handler)),
+                            get_func_name(opts->handler))));
+        return castNode(CopyToRoutine, routine);
+    }
+    else if (opts->csv_mode)
         return &CopyToRoutineCSV;
     else if (opts->binary)
         return &CopyToRoutineBinary;
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 77ddb9ca53f..dc6c1087361 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -50,6 +50,7 @@ node_headers = \
     access/sdir.h \
     access/tableam.h \
     access/tsmapi.h \
+    commands/copyapi.h \
     commands/event_trigger.h \
     commands/trigger.h \
     executor/tuptable.h \
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
old mode 100644
new mode 100755
index 77659b0f760..d688bbea3a0
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -62,6 +62,7 @@ my @all_input_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
@@ -86,6 +87,7 @@ my @nodetag_only_files = qw(
   access/sdir.h
   access/tableam.h
   access/tsmapi.h
+  commands/copyapi.h
   commands/event_trigger.h
   commands/trigger.h
   executor/tuptable.h
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index 317a1f2b282..f2ebc21ca56 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -370,6 +370,7 @@ PSEUDOTYPE_DUMMY_IO_FUNCS(fdw_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(table_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(index_am_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(tsm_handler);
+PSEUDOTYPE_DUMMY_IO_FUNCS(copy_handler);
 PSEUDOTYPE_DUMMY_IO_FUNCS(internal);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anyelement);
 PSEUDOTYPE_DUMMY_IO_FUNCS(anynonarray);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 62beb71da28..ba46bfa48a8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -7888,6 +7888,12 @@
 { oid => '3312', descr => 'I/O',
   proname => 'tsm_handler_out', prorettype => 'cstring',
   proargtypes => 'tsm_handler', prosrc => 'tsm_handler_out' },
+{ oid => '8753', descr => 'I/O',
+  proname => 'copy_handler_in', proisstrict => 'f', prorettype => 'copy_handler',
+  proargtypes => 'cstring', prosrc => 'copy_handler_in' },
+{ oid => '8754', descr => 'I/O',
+  proname => 'copy_handler_out', prorettype => 'cstring',
+  proargtypes => 'copy_handler', prosrc => 'copy_handler_out' },
 { oid => '267', descr => 'I/O',
   proname => 'table_am_handler_in', proisstrict => 'f',
   prorettype => 'table_am_handler', proargtypes => 'cstring',
diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat
index 6dca77e0a22..bddf9fb4fbe 100644
--- a/src/include/catalog/pg_type.dat
+++ b/src/include/catalog/pg_type.dat
@@ -633,6 +633,12 @@
   typcategory => 'P', typinput => 'tsm_handler_in',
   typoutput => 'tsm_handler_out', typreceive => '-', typsend => '-',
   typalign => 'i' },
+{ oid => '8752',
+  descr => 'pseudo-type for the result of a COPY TO/FROM handler function',
+  typname => 'copy_handler', typlen => '4', typbyval => 't', typtype => 'p',
+  typcategory => 'P', typinput => 'copy_handler_in',
+  typoutput => 'copy_handler_out', typreceive => '-', typsend => '-',
+  typalign => 'i' },
 { oid => '269',
   descr => 'pseudo-type for the result of a table AM handler function',
   typname => 'table_am_handler', typlen => '4', typbyval => 't', typtype => 'p',
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..6df1f8a3b9b 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,9 +87,10 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
+    Oid            handler;        /* handler function for custom format routine */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
+/* These are private in commands/copy[from|to]_internal.h */
 typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 2a2d2f9876b..53ad3337f86 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -22,6 +22,8 @@
  */
 typedef struct CopyToRoutine
 {
+    NodeTag        type;
+
     /*
      * Set output function information. This callback is called once at the
      * beginning of COPY TO.
@@ -60,6 +62,8 @@ typedef struct CopyToRoutine
  */
 typedef struct CopyFromRoutine
 {
+    NodeTag        type;
+
     /*
      * Set input function information. This callback is called once at the
      * beginning of COPY FROM.
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index 42ddb37a8a2..da796131988 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -14,6 +14,11 @@
 #ifndef COPYTO_INTERNAL_H
 #define COPYTO_INTERNAL_H
 
+#include "commands/copy.h"
+#include "executor/execdesc.h"
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
 /*
  * Represents the different dest cases we need to worry about at
  * the bottom level
@@ -25,4 +30,54 @@ typedef enum CopyDest
     COPY_DEST_CALLBACK,            /* to callback function */
 } CopyDest;
 
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+    /* format-specific routines */
+    const CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;        /* type of copy source/destination */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
+    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
+
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+    MemoryContext rowcontext;    /* per-row evaluation context */
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
 #endif                            /* COPYTO_INTERNAL_H */
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index d1ca24dd32f..96e70e7f38b 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -12,6 +12,7 @@ node_support_input_i = [
   'access/sdir.h',
   'access/tableam.h',
   'access/tsmapi.h',
+  'commands/copyapi.h',
   'commands/event_trigger.h',
   'commands/trigger.h',
   'executor/tuptable.h',
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index aa1d27bbed3..9bf5d58cdae 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -17,6 +17,7 @@ SUBDIRS = \
           test_aio \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 9de0057bd1d..5fd06de2737 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -16,6 +16,7 @@ subdir('ssl_passphrase_callback')
 subdir('test_aio')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 00000000000..8497f91624d
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,23 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+EXTENSION = test_copy_format
+DATA = test_copy_format--1.0.sql
+
+REGRESS = test_copy_format
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 00000000000..3916b766615
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,107 @@
+CREATE TABLE copy_data (a smallint, b integer, c bigint);
+INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- No WITH SCHEMA. It installs custom COPY handlers to the current
+-- schema.
+CREATE EXTENSION test_copy_format;
+-- We can find a custom COPY handler without schema.
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
+COPY copy_data TO stdout WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: attribute: smallint
+NOTICE:  CopyToOutFunc: attribute: integer
+NOTICE:  CopyToOutFunc: attribute: bigint
+NOTICE:  CopyToStart: the number of attributes: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToEnd
+DROP EXTENSION test_copy_format;
+-- Install custom COPY handlers to a schema that isn't included in
+-- search_path.
+CREATE SCHEMA test_schema;
+CREATE EXTENSION test_copy_format WITH SCHEMA test_schema;
+-- We can find a custom COPY handler by qualified name.
+COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_copy_format');
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
+COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: attribute: smallint
+NOTICE:  CopyToOutFunc: attribute: integer
+NOTICE:  CopyToOutFunc: attribute: bigint
+NOTICE:  CopyToStart: the number of attributes: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToEnd
+-- We can't find a custom COPY handler without schema when search_path
+-- doesn't include the schema where we installed custom COPY handlers.
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format');
+ERROR:  COPY format "test_copy_format" not recognized
+LINE 1: COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format');
+                                        ^
+COPY copy_data TO stdout WITH (FORMAT 'test_copy_format');
+ERROR:  COPY format "test_copy_format" not recognized
+LINE 1: COPY copy_data TO stdout WITH (FORMAT 'test_copy_format');
+                                       ^
+-- We can find a custom COPY handler without schema when search_path
+-- includes the schema where we installed custom COPY handlers.
+SET search_path = test_schema,public;
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
+COPY copy_data TO stdout WITH (FORMAT 'test_copy_format');
+NOTICE:  test_copy_format: is_from=false
+NOTICE:  CopyToOutFunc: attribute: smallint
+NOTICE:  CopyToOutFunc: attribute: integer
+NOTICE:  CopyToOutFunc: attribute: bigint
+NOTICE:  CopyToStart: the number of attributes: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToEnd
+RESET search_path;
+-- Invalid cases with qualified name.
+-- Input type is wrong
+COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type');
+ERROR:  COPY format "test_schema.test_copy_format_wrong_input_type" not recognized
+LINE 1: COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_cop...
+                                        ^
+COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type');
+ERROR:  COPY format "test_schema.test_copy_format_wrong_input_type" not recognized
+LINE 1: COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy...
+                                       ^
+-- Return type is wrong
+COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type');
+ERROR:  function test_schema.test_copy_format_wrong_return_type must return type copy_handler
+LINE 1: COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_cop...
+                                        ^
+COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type');
+ERROR:  function test_schema.test_copy_format_wrong_return_type must return type copy_handler
+LINE 1: COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy...
+                                       ^
+-- Returned value is wrong
+COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value');
+ERROR:  COPY handler function test_schema.test_copy_format_wrong_return_value did not return CopyFromRoutine struct
+COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value');
+ERROR:  COPY handler function test_schema.test_copy_format_wrong_return_value did not return CopyToRoutine struct
+DROP TABLE copy_data;
+DROP EXTENSION test_copy_format;
+DROP SCHEMA test_schema;
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 00000000000..a45a2e0a039
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,33 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+test_install_data += files(
+  'test_copy_format.control',
+  'test_copy_format--1.0.sql',
+)
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'test_copy_format',
+    ],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 00000000000..b262794f878
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,52 @@
+CREATE TABLE copy_data (a smallint, b integer, c bigint);
+INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+
+-- No WITH SCHEMA. It installs custom COPY handlers to the current
+-- schema.
+CREATE EXTENSION test_copy_format;
+-- We can find a custom COPY handler without schema.
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format');
+\.
+COPY copy_data TO stdout WITH (FORMAT 'test_copy_format');
+DROP EXTENSION test_copy_format;
+
+
+-- Install custom COPY handlers to a schema that isn't included in
+-- search_path.
+CREATE SCHEMA test_schema;
+CREATE EXTENSION test_copy_format WITH SCHEMA test_schema;
+
+-- We can find a custom COPY handler by qualified name.
+COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_copy_format');
+\.
+COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy_format');
+
+-- We can't find a custom COPY handler without schema when search_path
+-- doesn't include the schema where we installed custom COPY handlers.
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format');
+COPY copy_data TO stdout WITH (FORMAT 'test_copy_format');
+
+-- We can find a custom COPY handler without schema when search_path
+-- includes the schema where we installed custom COPY handlers.
+SET search_path = test_schema,public;
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format');
+\.
+COPY copy_data TO stdout WITH (FORMAT 'test_copy_format');
+RESET search_path;
+
+-- Invalid cases with qualified name.
+
+-- Input type is wrong
+COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type');
+COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_input_type');
+-- Return type is wrong
+COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type');
+COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_type');
+-- Returned value is wrong
+COPY copy_data FROM stdin WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value');
+COPY copy_data TO stdout WITH (FORMAT 'test_schema.test_copy_format_wrong_return_value');
+
+
+DROP TABLE copy_data;
+DROP EXTENSION test_copy_format;
+DROP SCHEMA test_schema;
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
new file mode 100644
index 00000000000..c1a137181f8
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -0,0 +1,24 @@
+/* src/test/modules/test_copy_format/test_copy_format--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_copy_format" to load this file. \quit
+
+CREATE FUNCTION test_copy_format(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
+
+CREATE FUNCTION test_copy_format_wrong_input_type(bool)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
+
+CREATE FUNCTION test_copy_format_wrong_return_type(internal)
+    RETURNS bool
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
+
+CREATE FUNCTION test_copy_format_wrong_return_value(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format_wrong_return_value'
+    LANGUAGE C;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..1d754201336
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,113 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+#include "utils/builtins.h"
+
+PG_MODULE_MAGIC;
+
+static void
+TestCopyFromInFunc(CopyFromState cstate, Oid atttypid,
+                   FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: attribute: %s", format_type_be(atttypid))));
+}
+
+static void
+TestCopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: the number of attributes: %d", tupDesc->natts)));
+}
+
+static bool
+TestCopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+TestCopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .type = T_CopyFromRoutine,
+    .CopyFromInFunc = TestCopyFromInFunc,
+    .CopyFromStart = TestCopyFromStart,
+    .CopyFromOneRow = TestCopyFromOneRow,
+    .CopyFromEnd = TestCopyFromEnd,
+};
+
+static void
+TestCopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    ereport(NOTICE, (errmsg("CopyToOutFunc: attribute: %s", format_type_be(atttypid))));
+}
+
+static void
+TestCopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: the number of attributes: %d", tupDesc->natts)));
+}
+
+static void
+TestCopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: the number of valid values: %u", slot->tts_nvalid)));
+}
+
+static void
+TestCopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .type = T_CopyToRoutine,
+    .CopyToOutFunc = TestCopyToOutFunc,
+    .CopyToStart = TestCopyToStart,
+    .CopyToOneRow = TestCopyToOneRow,
+    .CopyToEnd = TestCopyToEnd,
+};
+
+PG_FUNCTION_INFO_V1(test_copy_format);
+Datum
+test_copy_format(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    ereport(NOTICE,
+            (errmsg("test_copy_format: is_from=%s", is_from ? "true" : "false")));
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
+}
+
+PG_FUNCTION_INFO_V1(test_copy_format_wrong_return_value);
+Datum
+test_copy_format_wrong_return_value(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_CSTRING(pstrdup("is_from=true"));
+    else
+        PG_RETURN_CSTRING(pstrdup("is_from=false"));
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.control b/src/test/modules/test_copy_format/test_copy_format.control
new file mode 100644
index 00000000000..f05a6362358
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.control
@@ -0,0 +1,4 @@
+comment = 'Test code for custom COPY format'
+default_version = '1.0'
+module_pathname = '$libdir/test_copy_format'
+relocatable = true
-- 
2.47.2
From 18618368721678d78934251ff8243705013458f0 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Thu, 27 Mar 2025 11:24:15 +0900
Subject: [PATCH v40 3/6] Add support for implementing custom COPY handler as
 extension
* TO: Add CopyToStateData::opaque that can be used to keep
  data for custom COPY TO handler implementation
* TO: Export CopySendEndOfRow() to send end of row data as
  CopyToStateFlush()
* FROM: Add CopyFromStateData::opaque that can be used to
  keep data for custom COPY FROM handler implementation
* FROM: Export CopyGetData() to get the next data as
  CopyFromStateGetData()
* FROM: Add CopyFromSkipErrorRow() for "ON_ERROR stop" and
  "LOG_VERBOSITY verbose"
COPY FROM extensions must call CopyFromSkipErrorRow() when their
CopyFromOneRow callback reports an error via
errsave(). CopyFromSkipErrorRow() handles the "ON_ERROR stop" and
"LOG_VERBOSITY verbose" cases.
---
 src/backend/commands/copyfromparse.c          | 93 ++++++++++++-------
 src/backend/commands/copyto.c                 | 12 +++
 src/include/commands/copyapi.h                |  6 ++
 src/include/commands/copyfrom_internal.h      |  3 +
 src/include/commands/copyto_internal.h        |  3 +
 .../expected/test_copy_format.out             | 50 ++++++++++
 .../test_copy_format/sql/test_copy_format.sql | 35 +++++++
 .../test_copy_format/test_copy_format.c       | 80 +++++++++++++++-
 8 files changed, 245 insertions(+), 37 deletions(-)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 9f7171d1478..de68b53b000 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -739,6 +739,17 @@ CopyReadBinaryData(CopyFromState cstate, char *dest, int nbytes)
     return copied_bytes;
 }
 
+/*
+ * Export CopyGetData() for extensions. We want to keep CopyGetData() as a
+ * static function for optimization. CopyGetData() calls in this file may be
+ * optimized by a compiler.
+ */
+int
+CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread)
+{
+    return CopyGetData(cstate, dest, minread, maxread);
+}
+
 /*
  * This function is exposed for use by extensions that read raw fields in the
  * next line. See NextCopyFromRawFieldsInternal() for details.
@@ -927,6 +938,51 @@ CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values,
     return CopyFromTextLikeOneRow(cstate, econtext, values, nulls, true);
 }
 
+/*
+ * Call this when you report an error by errsave() in your CopyFromOneRow
+ * callback. This handles "ON_ERROR stop" and "LOG_VERBOSITY verbose" cases
+ * for you.
+ */
+void
+CopyFromSkipErrorRow(CopyFromState cstate)
+{
+    Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+
+    cstate->num_errors++;
+
+    if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+    {
+        /*
+         * Since we emit line number and column info in the below notice
+         * message, we suppress error context information other than the
+         * relation name.
+         */
+        Assert(!cstate->relname_only);
+        cstate->relname_only = true;
+
+        if (cstate->cur_attval)
+        {
+            char       *attval;
+
+            attval = CopyLimitPrintoutLength(cstate->cur_attval);
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
+                           cstate->cur_lineno,
+                           cstate->cur_attname,
+                           attval));
+            pfree(attval);
+        }
+        else
+            ereport(NOTICE,
+                    errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
+                           cstate->cur_lineno,
+                           cstate->cur_attname));
+
+        /* reset relname_only */
+        cstate->relname_only = false;
+    }
+}
+
 /*
  * Workhorse for CopyFromTextOneRow() and CopyFromCSVOneRow().
  *
@@ -1033,42 +1089,7 @@ CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext,
                                         (Node *) cstate->escontext,
                                         &values[m]))
         {
-            Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
-
-            cstate->num_errors++;
-
-            if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
-            {
-                /*
-                 * Since we emit line number and column info in the below
-                 * notice message, we suppress error context information other
-                 * than the relation name.
-                 */
-                Assert(!cstate->relname_only);
-                cstate->relname_only = true;
-
-                if (cstate->cur_attval)
-                {
-                    char       *attval;
-
-                    attval = CopyLimitPrintoutLength(cstate->cur_attval);
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": \"%s\"",
-                                   cstate->cur_lineno,
-                                   cstate->cur_attname,
-                                   attval));
-                    pfree(attval);
-                }
-                else
-                    ereport(NOTICE,
-                            errmsg("skipping row due to data type incompatibility at line %" PRIu64 " for column \"%s\": null input",
-                                   cstate->cur_lineno,
-                                   cstate->cur_attname));
-
-                /* reset relname_only */
-                cstate->relname_only = false;
-            }
-
+            CopyFromSkipErrorRow(cstate);
             return true;
         }
 
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 265b847e255..d6fcfdfb9b1 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -454,6 +454,18 @@ CopySendEndOfRow(CopyToState cstate)
     resetStringInfo(fe_msgbuf);
 }
 
+/*
+ * Export CopySendEndOfRow() for extensions. We want to keep
+ * CopySendEndOfRow() as a static function for
+ * optimization. CopySendEndOfRow() calls in this file may be optimized by a
+ * compiler.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+    CopySendEndOfRow(cstate);
+}
+
 /*
  * Wrapper function of CopySendEndOfRow for text and CSV formats. Sends the
  * line termination and do common appropriate things for the end of row.
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 53ad3337f86..500ece7d5bb 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -56,6 +56,8 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+extern void CopyToStateFlush(CopyToState cstate);
+
 /*
  * API structure for a COPY FROM format implementation. Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
@@ -106,4 +108,8 @@ typedef struct CopyFromRoutine
     void        (*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
+extern int    CopyFromStateGetData(CopyFromState cstate, void *dest, int minread, int maxread);
+
+extern void CopyFromSkipErrorRow(CopyFromState cstate);
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 24157e11a73..f9e27152313 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -181,6 +181,9 @@ typedef struct CopyFromStateData
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
 
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyFromStateData;
 
 extern void ReceiveCopyBegin(CopyFromState cstate);
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index da796131988..3bd9d702bf0 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -78,6 +78,9 @@ typedef struct CopyToStateData
     FmgrInfo   *out_functions;    /* lookup info for output functions */
     MemoryContext rowcontext;    /* per-row evaluation context */
     uint64        bytes_processed;    /* number of bytes processed so far */
+
+    /* For custom format implementation */
+    void       *opaque;            /* private space */
 } CopyToStateData;
 
 #endif                            /* COPYTO_INTERNAL_H */
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index 3916b766615..47a875f0ab1 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -4,6 +4,8 @@ INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 -- schema.
 CREATE EXTENSION test_copy_format;
 -- We can find a custom COPY handler without schema.
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=true
 NOTICE:  CopyFromInFunc: attribute: smallint
@@ -11,7 +13,50 @@ NOTICE:  CopyFromInFunc: attribute: integer
 NOTICE:  CopyFromInFunc: attribute: bigint
 NOTICE:  CopyFromStart: the number of attributes: 3
 NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  invalid value: "6"
+CONTEXT:  COPY copy_data, line 2, column a: "6"
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
 NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  skipping row due to data type incompatibility at line 2 for column "a": "6"
+NOTICE:  CopyFromOneRow
+NOTICE:  1 row was skipped due to data type incompatibility
+NOTICE:  CopyFromEnd
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+NOTICE:  test_copy_format: is_from=true
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromOneRow
+ERROR:  too much lines: 3
+CONTEXT:  COPY copy_data, line 3
 COPY copy_data TO stdout WITH (FORMAT 'test_copy_format');
 NOTICE:  test_copy_format: is_from=false
 NOTICE:  CopyToOutFunc: attribute: smallint
@@ -21,7 +66,12 @@ NOTICE:  CopyToStart: the number of attributes: 3
 NOTICE:  CopyToOneRow: the number of valid values: 3
 NOTICE:  CopyToOneRow: the number of valid values: 3
 NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
 NOTICE:  CopyToEnd
+-- Reset data.
+TRUNCATE copy_data;
+INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 DROP EXTENSION test_copy_format;
 -- Install custom COPY handlers to a schema that isn't included in
 -- search_path.
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index b262794f878..c7beb2fb8ae 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -4,10 +4,45 @@ INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 -- No WITH SCHEMA. It installs custom COPY handlers to the current
 -- schema.
 CREATE EXTENSION test_copy_format;
+
 -- We can find a custom COPY handler without schema.
+
+-- 987 is accepted.
+-- 654 is a hard error because ON_ERROR is stop by default.
 COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format');
+987
+654
 \.
+
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+\.
+
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore, LOG_VERBOSITY verbose);
+987
+654
+\.
+
+-- 987 is accepted.
+-- 654 is a soft error because ON_ERROR is ignore.
+-- 321 is a hard error.
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format', ON_ERROR ignore);
+987
+654
+321
+\.
+
 COPY copy_data TO stdout WITH (FORMAT 'test_copy_format');
+
+-- Reset data.
+TRUNCATE copy_data;
+INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+
 DROP EXTENSION test_copy_format;
 
 
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
index 1d754201336..34ec693a7ec 100644
--- a/src/test/modules/test_copy_format/test_copy_format.c
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 
 #include "commands/copyapi.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 #include "utils/builtins.h"
 
@@ -35,8 +36,85 @@ TestCopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
 static bool
 TestCopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
 {
+    int            n_attributes = list_length(cstate->attnumlist);
+    char       *line;
+    int            line_size = n_attributes + 1;    /* +1 is for new line */
+    int            read_bytes;
+
     ereport(NOTICE, (errmsg("CopyFromOneRow")));
-    return false;
+
+    cstate->cur_lineno++;
+    line = palloc(line_size);
+    read_bytes = CopyFromStateGetData(cstate, line, line_size, line_size);
+    if (read_bytes == 0)
+        return false;
+    if (read_bytes != line_size)
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("one line must be %d bytes: %d",
+                        line_size, read_bytes)));
+
+    if (cstate->cur_lineno == 1)
+    {
+        /* Success */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        ListCell   *cur;
+        int            i = 0;
+
+        foreach(cur, cstate->attnumlist)
+        {
+            int            attnum = lfirst_int(cur);
+            int            m = attnum - 1;
+            Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+            if (att->atttypid == INT2OID)
+            {
+                values[i] = Int16GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT4OID)
+            {
+                values[i] = Int32GetDatum(line[i] - '0');
+            }
+            else if (att->atttypid == INT8OID)
+            {
+                values[i] = Int64GetDatum(line[i] - '0');
+            }
+            nulls[i] = false;
+            i++;
+        }
+    }
+    else if (cstate->cur_lineno == 2)
+    {
+        /* Soft error */
+        TupleDesc    tupDesc = RelationGetDescr(cstate->rel);
+        int            attnum = lfirst_int(list_head(cstate->attnumlist));
+        int            m = attnum - 1;
+        Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+        char        value[2];
+
+        cstate->cur_attname = NameStr(att->attname);
+        value[0] = line[0];
+        value[1] = '\0';
+        cstate->cur_attval = value;
+        errsave((Node *) cstate->escontext,
+                (
+                 errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                 errmsg("invalid value: \"%c\"", line[0])));
+        CopyFromSkipErrorRow(cstate);
+        cstate->cur_attname = NULL;
+        cstate->cur_attval = NULL;
+        return true;
+    }
+    else
+    {
+        /* Hard error */
+        ereport(ERROR,
+                (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                 errmsg("too much lines: %llu",
+                        (unsigned long long) cstate->cur_lineno)));
+    }
+
+    return true;
 }
 
 static void
-- 
2.47.2
From ed454fd1998bca012182b977c227b4a0caa3ccd6 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Thu, 27 Mar 2025 11:56:45 +0900
Subject: [PATCH v40 4/6] Use copy handlers for built-in formats
This adds copy handlers for text, csv and binary. This lets us simplify
Copy{To,From}GetRoutine(). We'll be able to remove
CopyFormatOptions::{binary,csv_mode} when we add more callbacks to
Copy{To,From}Routine and move format specific routines to
Copy{To,From}Routine::*.
---
 src/backend/commands/copy.c                   | 101 ++++++++++++------
 src/backend/commands/copyfrom.c               |  42 ++++----
 src/backend/commands/copyto.c                 |  42 ++++----
 src/include/catalog/pg_proc.dat               |  11 ++
 src/include/commands/copy.h                   |   2 +-
 src/include/commands/copyfrom_internal.h      |   6 +-
 src/include/commands/copyto_internal.h        |   6 +-
 .../expected/test_copy_format.out             |  35 ++++++
 .../test_copy_format/sql/test_copy_format.sql |  32 ++++++
 .../test_copy_format--1.0.sql                 |  15 +++
 10 files changed, 211 insertions(+), 81 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 9515c4d5786..38ed8bccacd 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -22,7 +22,9 @@
 #include "access/table.h"
 #include "access/xact.h"
 #include "catalog/pg_authid.h"
-#include "commands/copy.h"
+#include "commands/copyapi.h"
+#include "commands/copyto_internal.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
@@ -521,43 +523,45 @@ ProcessCopyOptions(ParseState *pstate,
 
         if (strcmp(defel->defname, "format") == 0)
         {
-            char       *fmt = defGetString(defel);
+            char       *format = defGetString(defel);
+            List       *qualified_format;
+            char       *schema;
+            char       *fmt;
+            Oid            arg_types[1];
+            Oid            handler = InvalidOid;
 
             if (format_specified)
                 errorConflictingDefElem(defel, pstate);
             format_specified = true;
-            if (strcmp(fmt, "text") == 0)
-                 /* default format */ ;
-            else if (strcmp(fmt, "csv") == 0)
-                opts_out->csv_mode = true;
-            else if (strcmp(fmt, "binary") == 0)
-                opts_out->binary = true;
-            else
+
+            qualified_format = stringToQualifiedNameList(format, NULL);
+            DeconstructQualifiedName(qualified_format, &schema, &fmt);
+            if (!schema || strcmp(schema, "pg_catalog") == 0)
             {
-                List       *qualified_format;
-                Oid            arg_types[1];
-                Oid            handler = InvalidOid;
-
-                qualified_format = stringToQualifiedNameList(fmt, NULL);
-                arg_types[0] = INTERNALOID;
-                handler = LookupFuncName(qualified_format, 1,
-                                         arg_types, true);
-                if (!OidIsValid(handler))
-                    ereport(ERROR,
-                            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                             errmsg("COPY format \"%s\" not recognized", fmt),
-                             parser_errposition(pstate, defel->location)));
-
-                /* check that handler has correct return type */
-                if (get_func_rettype(handler) != COPY_HANDLEROID)
-                    ereport(ERROR,
-                            (errcode(ERRCODE_WRONG_OBJECT_TYPE),
-                             errmsg("function %s must return type %s",
-                                    fmt, "copy_handler"),
-                             parser_errposition(pstate, defel->location)));
-
-                opts_out->handler = handler;
+                if (strcmp(fmt, "csv") == 0)
+                    opts_out->csv_mode = true;
+                else if (strcmp(fmt, "binary") == 0)
+                    opts_out->binary = true;
             }
+
+            arg_types[0] = INTERNALOID;
+            handler = LookupFuncName(qualified_format, 1,
+                                     arg_types, true);
+            if (!OidIsValid(handler))
+                ereport(ERROR,
+                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                         errmsg("COPY format \"%s\" not recognized", format),
+                         parser_errposition(pstate, defel->location)));
+
+            /* check that handler has correct return type */
+            if (get_func_rettype(handler) != COPY_HANDLEROID)
+                ereport(ERROR,
+                        (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                         errmsg("function %s must return type %s",
+                                format, "copy_handler"),
+                         parser_errposition(pstate, defel->location)));
+
+            opts_out->handler = handler;
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
@@ -1040,3 +1044,36 @@ CopyGetAttnums(TupleDesc tupDesc, Relation rel, List *attnamelist)
 
     return attnums;
 }
+
+Datum
+copy_text_handler(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineText);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineText);
+}
+
+Datum
+copy_csv_handler(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineCSV);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineCSV);
+}
+
+Datum
+copy_binary_handler(PG_FUNCTION_ARGS)
+{
+    bool        is_from = PG_GETARG_BOOL(0);
+
+    if (is_from)
+        PG_RETURN_POINTER(&CopyFromRoutineBinary);
+    else
+        PG_RETURN_POINTER(&CopyToRoutineBinary);
+}
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 3d86e8a8328..74a8051c24c 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -45,6 +45,7 @@
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/portal.h"
@@ -128,7 +129,7 @@ static void CopyFromBinaryEnd(CopyFromState cstate);
  */
 
 /* text format */
-static const CopyFromRoutine CopyFromRoutineText = {
+const CopyFromRoutine CopyFromRoutineText = {
     .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
@@ -137,7 +138,7 @@ static const CopyFromRoutine CopyFromRoutineText = {
 };
 
 /* CSV format */
-static const CopyFromRoutine CopyFromRoutineCSV = {
+const CopyFromRoutine CopyFromRoutineCSV = {
     .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromTextLikeInFunc,
     .CopyFromStart = CopyFromTextLikeStart,
@@ -146,7 +147,7 @@ static const CopyFromRoutine CopyFromRoutineCSV = {
 };
 
 /* binary format */
-static const CopyFromRoutine CopyFromRoutineBinary = {
+const CopyFromRoutine CopyFromRoutineBinary = {
     .type = T_CopyFromRoutine,
     .CopyFromInFunc = CopyFromBinaryInFunc,
     .CopyFromStart = CopyFromBinaryStart,
@@ -158,28 +159,23 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
 static const CopyFromRoutine *
 CopyFromGetRoutine(const CopyFormatOptions *opts)
 {
-    if (OidIsValid(opts->handler))
-    {
-        Datum        datum;
-        Node       *routine;
-
-        datum = OidFunctionCall1(opts->handler, BoolGetDatum(true));
-        routine = (Node *) DatumGetPointer(datum);
-        if (routine == NULL || !IsA(routine, CopyFromRoutine))
-            ereport(ERROR,
-                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                     errmsg("COPY handler function %s.%s did not return CopyFromRoutine struct",
-                            get_namespace_name(get_func_namespace(opts->handler)),
-                            get_func_name(opts->handler))));
-        return castNode(CopyFromRoutine, routine);
-    }
-    else if (opts->csv_mode)
-        return &CopyFromRoutineCSV;
-    else if (opts->binary)
-        return &CopyFromRoutineBinary;
+    Oid            handler = opts->handler;
+    Datum        datum;
+    Node       *routine;
 
     /* default is text */
-    return &CopyFromRoutineText;
+    if (!OidIsValid(handler))
+        handler = F_TEXT_INTERNAL;
+
+    datum = OidFunctionCall1(handler, BoolGetDatum(true));
+    routine = (Node *) DatumGetPointer(datum);
+    if (routine == NULL || !IsA(routine, CopyFromRoutine))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY handler function %s.%s did not return CopyFromRoutine struct",
+                        get_namespace_name(get_func_namespace(handler)),
+                        get_func_name(handler))));
+    return castNode(CopyFromRoutine, routine);
 }
 
 /* Implementation of the start callback for text and CSV formats */
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index d6fcfdfb9b1..4e1b154cad2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -30,6 +30,7 @@
 #include "pgstat.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -87,7 +88,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
  */
 
 /* text format */
-static const CopyToRoutine CopyToRoutineText = {
+const CopyToRoutine CopyToRoutineText = {
     .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
@@ -96,7 +97,7 @@ static const CopyToRoutine CopyToRoutineText = {
 };
 
 /* CSV format */
-static const CopyToRoutine CopyToRoutineCSV = {
+const CopyToRoutine CopyToRoutineCSV = {
     .type = T_CopyToRoutine,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
@@ -105,7 +106,7 @@ static const CopyToRoutine CopyToRoutineCSV = {
 };
 
 /* binary format */
-static const CopyToRoutine CopyToRoutineBinary = {
+const CopyToRoutine CopyToRoutineBinary = {
     .type = T_CopyToRoutine,
     .CopyToStart = CopyToBinaryStart,
     .CopyToOutFunc = CopyToBinaryOutFunc,
@@ -117,28 +118,23 @@ static const CopyToRoutine CopyToRoutineBinary = {
 static const CopyToRoutine *
 CopyToGetRoutine(const CopyFormatOptions *opts)
 {
-    if (OidIsValid(opts->handler))
-    {
-        Datum        datum;
-        Node       *routine;
-
-        datum = OidFunctionCall1(opts->handler, BoolGetDatum(false));
-        routine = (Node *) DatumGetPointer(datum);
-        if (routine == NULL || !IsA(routine, CopyToRoutine))
-            ereport(ERROR,
-                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                     errmsg("COPY handler function %s.%s did not return CopyToRoutine struct",
-                            get_namespace_name(get_func_namespace(opts->handler)),
-                            get_func_name(opts->handler))));
-        return castNode(CopyToRoutine, routine);
-    }
-    else if (opts->csv_mode)
-        return &CopyToRoutineCSV;
-    else if (opts->binary)
-        return &CopyToRoutineBinary;
+    Oid            handler = opts->handler;
+    Datum        datum;
+    Node       *routine;
 
     /* default is text */
-    return &CopyToRoutineText;
+    if (!OidIsValid(handler))
+        handler = F_TEXT_INTERNAL;
+
+    datum = OidFunctionCall1(handler, BoolGetDatum(false));
+    routine = (Node *) DatumGetPointer(datum);
+    if (routine == NULL || !IsA(routine, CopyToRoutine))
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("COPY handler function %s.%s did not return CopyToRoutine struct",
+                        get_namespace_name(get_func_namespace(handler)),
+                        get_func_name(handler))));
+    return castNode(CopyToRoutine, routine);
 }
 
 /* Implementation of the start callback for text and CSV formats */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index ba46bfa48a8..e038157eb74 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -12572,4 +12572,15 @@
   proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}',
   prosrc => 'pg_get_aios' },
 
+# COPY handlers
+{ oid => '8100', descr => 'text COPY FORMAT handler',
+  proname => 'text', provolatile => 'i', prorettype => 'copy_handler',
+  proargtypes => 'internal', prosrc => 'copy_text_handler' },
+{ oid => '8101', descr => 'csv COPY FORMAT handler',
+  proname => 'csv', provolatile => 'i', prorettype => 'copy_handler',
+  proargtypes => 'internal', prosrc => 'copy_csv_handler' },
+{ oid => '8102', descr => 'binary COPY FORMAT handler',
+  proname => 'binary', provolatile => 'i', prorettype => 'copy_handler',
+  proargtypes => 'internal', prosrc => 'copy_binary_handler' },
+
 ]
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 6df1f8a3b9b..4525261fcc4 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -87,7 +87,7 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
-    Oid            handler;        /* handler function for custom format routine */
+    Oid            handler;        /* handler function */
 } CopyFormatOptions;
 
 /* These are private in commands/copy[from|to]_internal.h */
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index f9e27152313..51d181c3ab4 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -14,7 +14,7 @@
 #ifndef COPYFROM_INTERNAL_H
 #define COPYFROM_INTERNAL_H
 
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/trigger.h"
 #include "nodes/miscnodes.h"
 
@@ -197,4 +197,8 @@ extern bool CopyFromCSVOneRow(CopyFromState cstate, ExprContext *econtext,
 extern bool CopyFromBinaryOneRow(CopyFromState cstate, ExprContext *econtext,
                                  Datum *values, bool *nulls);
 
+extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineText;
+extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineCSV;
+extern PGDLLIMPORT const CopyFromRoutine CopyFromRoutineBinary;
+
 #endif                            /* COPYFROM_INTERNAL_H */
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index 3bd9d702bf0..9faf97c718a 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -14,7 +14,7 @@
 #ifndef COPYTO_INTERNAL_H
 #define COPYTO_INTERNAL_H
 
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "executor/execdesc.h"
 #include "executor/tuptable.h"
 #include "nodes/execnodes.h"
@@ -83,4 +83,8 @@ typedef struct CopyToStateData
     void       *opaque;            /* private space */
 } CopyToStateData;
 
+extern PGDLLIMPORT const CopyToRoutine CopyToRoutineText;
+extern PGDLLIMPORT const CopyToRoutine CopyToRoutineCSV;
+extern PGDLLIMPORT const CopyToRoutine CopyToRoutineBinary;
+
 #endif                            /* COPYTO_INTERNAL_H */
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
index 47a875f0ab1..aa51e480b1d 100644
--- a/src/test/modules/test_copy_format/expected/test_copy_format.out
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -72,6 +72,41 @@ NOTICE:  CopyToEnd
 -- Reset data.
 TRUNCATE copy_data;
 INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+-- test_copy_format extension installs text, csv and binary custom
+-- COPY handlers to the public schema but they must not be
+-- used. Builtin COPY handlers must be used.
+-- public.text must not be used
+COPY copy_data FROM stdin WITH (FORMAT text);
+COPY copy_data TO stdout WITH (FORMAT text);
+1    2    3
+12    34    56
+123    456    789
+COPY copy_data FROM stdin WITH (FORMAT 'pg_catalog.text');
+COPY copy_data TO stdout WITH (FORMAT 'pg_catalog.text');
+1    2    3
+12    34    56
+123    456    789
+-- public.csv must not be used
+COPY copy_data FROM stdin WITH (FORMAT csv);
+COPY copy_data TO stdout WITH (FORMAT csv);
+1,2,3
+12,34,56
+123,456,789
+COPY copy_data FROM stdin WITH (FORMAT 'pg_catalog.csv');
+COPY copy_data TO stdout WITH (FORMAT 'pg_catalog.csv');
+1,2,3
+12,34,56
+123,456,789
+-- public.binary must not be used
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/binary.data'
+COPY copy_data TO :'filename' WITH (FORMAT binary);
+COPY copy_data FROM :'filename' WITH (FORMAT binary);
+COPY copy_data TO :'filename' WITH (FORMAT 'pg_catalog.binary');
+COPY copy_data FROM :'filename' WITH (FORMAT 'pg_catalog.binary');
+-- Reset data.
+TRUNCATE copy_data;
+INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 DROP EXTENSION test_copy_format;
 -- Install custom COPY handlers to a schema that isn't included in
 -- search_path.
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
index c7beb2fb8ae..3b7f6e72e13 100644
--- a/src/test/modules/test_copy_format/sql/test_copy_format.sql
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -43,6 +43,38 @@ COPY copy_data TO stdout WITH (FORMAT 'test_copy_format');
 TRUNCATE copy_data;
 INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
 
+-- test_copy_format extension installs text, csv and binary custom
+-- COPY handlers to the public schema but they must not be
+-- used. Builtin COPY handlers must be used.
+
+-- public.text must not be used
+COPY copy_data FROM stdin WITH (FORMAT text);
+\.
+COPY copy_data TO stdout WITH (FORMAT text);
+COPY copy_data FROM stdin WITH (FORMAT 'pg_catalog.text');
+\.
+COPY copy_data TO stdout WITH (FORMAT 'pg_catalog.text');
+
+-- public.csv must not be used
+COPY copy_data FROM stdin WITH (FORMAT csv);
+\.
+COPY copy_data TO stdout WITH (FORMAT csv);
+COPY copy_data FROM stdin WITH (FORMAT 'pg_catalog.csv');
+\.
+COPY copy_data TO stdout WITH (FORMAT 'pg_catalog.csv');
+
+-- public.binary must not be used
+\getenv abs_builddir PG_ABS_BUILDDIR
+\set filename :abs_builddir '/results/binary.data'
+COPY copy_data TO :'filename' WITH (FORMAT binary);
+COPY copy_data FROM :'filename' WITH (FORMAT binary);
+COPY copy_data TO :'filename' WITH (FORMAT 'pg_catalog.binary');
+COPY copy_data FROM :'filename' WITH (FORMAT 'pg_catalog.binary');
+
+-- Reset data.
+TRUNCATE copy_data;
+INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+
 DROP EXTENSION test_copy_format;
 
 
diff --git a/src/test/modules/test_copy_format/test_copy_format--1.0.sql b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
index c1a137181f8..bfa1900e828 100644
--- a/src/test/modules/test_copy_format/test_copy_format--1.0.sql
+++ b/src/test/modules/test_copy_format/test_copy_format--1.0.sql
@@ -22,3 +22,18 @@ CREATE FUNCTION test_copy_format_wrong_return_value(internal)
     RETURNS copy_handler
     AS 'MODULE_PATHNAME', 'test_copy_format_wrong_return_value'
     LANGUAGE C;
+
+CREATE FUNCTION text(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
+
+CREATE FUNCTION csv(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
+
+CREATE FUNCTION binary(internal)
+    RETURNS copy_handler
+    AS 'MODULE_PATHNAME', 'test_copy_format'
+    LANGUAGE C;
-- 
2.47.2
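
Note on patch 4/6: the dispatch it introduces (fall back to the text
handler when no handler OID is set, call the handler with the copy
direction, then check the tag of the returned struct) can be modelled
outside the server. The following is only a stand-alone sketch. Every
Model* name is hypothetical and is not part of the patch or of
PostgreSQL; a function pointer stands in for the handler OID, and a NULL
return stands in for the place where the patch calls ereport(ERROR).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the T_CopyToRoutine / T_CopyFromRoutine tags. */
typedef enum ModelTag
{
    MODEL_TAG_COPY_TO,
    MODEL_TAG_COPY_FROM
} ModelTag;

/* Hypothetical, heavily reduced routine struct: a tag plus a label. */
typedef struct ModelRoutine
{
    ModelTag    type;
    const char *name;
} ModelRoutine;

static const ModelRoutine model_to_text = {MODEL_TAG_COPY_TO, "text (TO)"};
static const ModelRoutine model_from_text = {MODEL_TAG_COPY_FROM, "text (FROM)"};

/* A handler receives the direction and returns the matching routine. */
typedef const ModelRoutine *(*ModelHandler) (bool is_from);

/* Same shape as copy_text_handler() in the patch: pick by direction. */
static const ModelRoutine *
model_text_handler(bool is_from)
{
    return is_from ? &model_from_text : &model_to_text;
}

/* A misbehaving handler that ignores the direction it was asked for. */
static const ModelRoutine *
model_broken_handler(bool is_from)
{
    (void) is_from;
    return &model_to_text;
}

/*
 * Same control flow as CopyFromGetRoutine() in the patch. Returns NULL
 * where the real code raises an error.
 */
static const ModelRoutine *
model_copy_from_get_routine(ModelHandler handler)
{
    const ModelRoutine *routine;

    /* default is text */
    if (handler == NULL)
        handler = model_text_handler;

    routine = handler(true);
    if (routine == NULL || routine->type != MODEL_TAG_COPY_FROM)
        return NULL;

    assert(routine->name != NULL);
    return routine;
}
```

In this model, passing NULL yields the text FROM routine, and
model_broken_handler is rejected because it hands back a TO routine
when a FROM routine was requested.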
From 6e014bf226713a2c9f37da4c4f337128c4392212 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Fri, 25 Apr 2025 18:49:41 +0900
Subject: [PATCH v40 5/6] Remove CopyFormatOptions::{binary,csv_mode}
Because we can compute them from CopyFormatOptions::handler.
---
 src/backend/commands/copy.c          | 61 +++++++++++++---------------
 src/backend/commands/copyfrom.c      |  2 +-
 src/backend/commands/copyfromparse.c |  7 ++--
 src/backend/commands/copyto.c        |  4 +-
 src/include/commands/copy.h          |  2 -
 5 files changed, 36 insertions(+), 40 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 38ed8bccacd..21db5e964cf 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -38,6 +38,7 @@
 #include "parser/parse_relation.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/regproc.h"
 #include "utils/rel.h"
@@ -508,6 +509,8 @@ ProcessCopyOptions(ParseState *pstate,
     bool        on_error_specified = false;
     bool        log_verbosity_specified = false;
     bool        reject_limit_specified = false;
+    bool        binary = false;
+    bool        csv_mode = false;
     ListCell   *option;
 
     /* Support external use for option sanity checking */
@@ -525,8 +528,6 @@ ProcessCopyOptions(ParseState *pstate,
         {
             char       *format = defGetString(defel);
             List       *qualified_format;
-            char       *schema;
-            char       *fmt;
             Oid            arg_types[1];
             Oid            handler = InvalidOid;
 
@@ -535,15 +536,6 @@ ProcessCopyOptions(ParseState *pstate,
             format_specified = true;
 
             qualified_format = stringToQualifiedNameList(format, NULL);
-            DeconstructQualifiedName(qualified_format, &schema, &fmt);
-            if (!schema || strcmp(schema, "pg_catalog") == 0)
-            {
-                if (strcmp(fmt, "csv") == 0)
-                    opts_out->csv_mode = true;
-                else if (strcmp(fmt, "binary") == 0)
-                    opts_out->binary = true;
-            }
-
             arg_types[0] = INTERNALOID;
             handler = LookupFuncName(qualified_format, 1,
                                      arg_types, true);
@@ -562,6 +554,11 @@ ProcessCopyOptions(ParseState *pstate,
                          parser_errposition(pstate, defel->location)));
 
             opts_out->handler = handler;
+            if (opts_out->handler == F_CSV)
+                csv_mode = true;
+            else if (opts_out->handler == F_BINARY)
+                binary = true;
+
         }
         else if (strcmp(defel->defname, "freeze") == 0)
         {
@@ -716,31 +713,31 @@ ProcessCopyOptions(ParseState *pstate,
      * Check for incompatible options (must do these three before inserting
      * defaults)
      */
-    if (opts_out->binary && opts_out->delim)
+    if (binary && opts_out->delim)
         ereport(ERROR,
                 (errcode(ERRCODE_SYNTAX_ERROR),
         /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
                  errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
 
-    if (opts_out->binary && opts_out->null_print)
+    if (binary && opts_out->null_print)
         ereport(ERROR,
                 (errcode(ERRCODE_SYNTAX_ERROR),
                  errmsg("cannot specify %s in BINARY mode", "NULL")));
 
-    if (opts_out->binary && opts_out->default_print)
+    if (binary && opts_out->default_print)
         ereport(ERROR,
                 (errcode(ERRCODE_SYNTAX_ERROR),
                  errmsg("cannot specify %s in BINARY mode", "DEFAULT")));
 
     /* Set defaults for omitted options */
     if (!opts_out->delim)
-        opts_out->delim = opts_out->csv_mode ? "," : "\t";
+        opts_out->delim = csv_mode ? "," : "\t";
 
     if (!opts_out->null_print)
-        opts_out->null_print = opts_out->csv_mode ? "" : "\\N";
+        opts_out->null_print = csv_mode ? "" : "\\N";
     opts_out->null_print_len = strlen(opts_out->null_print);
 
-    if (opts_out->csv_mode)
+    if (csv_mode)
     {
         if (!opts_out->quote)
             opts_out->quote = "\"";
@@ -788,7 +785,7 @@ ProcessCopyOptions(ParseState *pstate,
      * future-proofing.  Likewise we disallow all digits though only octal
      * digits are actually dangerous.
      */
-    if (!opts_out->csv_mode &&
+    if (!csv_mode &&
         strchr("\\.abcdefghijklmnopqrstuvwxyz0123456789",
                opts_out->delim[0]) != NULL)
         ereport(ERROR,
@@ -796,43 +793,43 @@ ProcessCopyOptions(ParseState *pstate,
                  errmsg("COPY delimiter cannot be \"%s\"", opts_out->delim)));
 
     /* Check header */
-    if (opts_out->binary && opts_out->header_line)
+    if (binary && opts_out->header_line)
         ereport(ERROR,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
         /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
                  errmsg("cannot specify %s in BINARY mode", "HEADER")));
 
     /* Check quote */
-    if (!opts_out->csv_mode && opts_out->quote != NULL)
+    if (!csv_mode && opts_out->quote != NULL)
         ereport(ERROR,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
         /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
                  errmsg("COPY %s requires CSV mode", "QUOTE")));
 
-    if (opts_out->csv_mode && strlen(opts_out->quote) != 1)
+    if (csv_mode && strlen(opts_out->quote) != 1)
         ereport(ERROR,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                  errmsg("COPY quote must be a single one-byte character")));
 
-    if (opts_out->csv_mode && opts_out->delim[0] == opts_out->quote[0])
+    if (csv_mode && opts_out->delim[0] == opts_out->quote[0])
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                  errmsg("COPY delimiter and quote must be different")));
 
     /* Check escape */
-    if (!opts_out->csv_mode && opts_out->escape != NULL)
+    if (!csv_mode && opts_out->escape != NULL)
         ereport(ERROR,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
         /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
                  errmsg("COPY %s requires CSV mode", "ESCAPE")));
 
-    if (opts_out->csv_mode && strlen(opts_out->escape) != 1)
+    if (csv_mode && strlen(opts_out->escape) != 1)
         ereport(ERROR,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                  errmsg("COPY escape must be a single one-byte character")));
 
     /* Check force_quote */
-    if (!opts_out->csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
+    if (!csv_mode && (opts_out->force_quote || opts_out->force_quote_all))
         ereport(ERROR,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
         /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -846,8 +843,8 @@ ProcessCopyOptions(ParseState *pstate,
                         "COPY FROM")));
 
     /* Check force_notnull */
-    if (!opts_out->csv_mode && (opts_out->force_notnull != NIL ||
-                                opts_out->force_notnull_all))
+    if (!csv_mode && (opts_out->force_notnull != NIL ||
+                      opts_out->force_notnull_all))
         ereport(ERROR,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
         /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -862,8 +859,8 @@ ProcessCopyOptions(ParseState *pstate,
                         "COPY TO")));
 
     /* Check force_null */
-    if (!opts_out->csv_mode && (opts_out->force_null != NIL ||
-                                opts_out->force_null_all))
+    if (!csv_mode && (opts_out->force_null != NIL ||
+                      opts_out->force_null_all))
         ereport(ERROR,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
         /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
@@ -887,7 +884,7 @@ ProcessCopyOptions(ParseState *pstate,
                         "NULL")));
 
     /* Don't allow the CSV quote char to appear in the null string. */
-    if (opts_out->csv_mode &&
+    if (csv_mode &&
         strchr(opts_out->null_print, opts_out->quote[0]) != NULL)
         ereport(ERROR,
                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -923,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
                             "DEFAULT")));
 
         /* Don't allow the CSV quote char to appear in the default string. */
-        if (opts_out->csv_mode &&
+        if (csv_mode &&
             strchr(opts_out->default_print, opts_out->quote[0]) != NULL)
             ereport(ERROR,
                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -940,7 +937,7 @@ ProcessCopyOptions(ParseState *pstate,
                      errmsg("NULL specification and DEFAULT specification cannot be the same")));
     }
     /* Check on_error */
-    if (opts_out->binary && opts_out->on_error != COPY_ON_ERROR_STOP)
+    if (binary && opts_out->on_error != COPY_ON_ERROR_STOP)
         ereport(ERROR,
                 (errcode(ERRCODE_SYNTAX_ERROR),
                  errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 74a8051c24c..b09b6b3e101 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -275,7 +275,7 @@ CopyFromErrorCallback(void *arg)
                    cstate->cur_relname);
         return;
     }
-    if (cstate->opts.binary)
+    if (cstate->opts.handler == F_BINARY)
     {
         /* can't usefully display the data */
         if (cstate->cur_attname)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index de68b53b000..148fa1f2062 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -73,6 +73,7 @@
 #include "pgstat.h"
 #include "port/pg_bswap.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/rel.h"
 
 #define ISOCTAL(c) (((c) >= '0') && ((c) <= '7'))
@@ -171,7 +172,7 @@ ReceiveCopyBegin(CopyFromState cstate)
 {
     StringInfoData buf;
     int            natts = list_length(cstate->attnumlist);
-    int16        format = (cstate->opts.binary ? 1 : 0);
+    int16        format = (cstate->opts.handler == F_BINARY ? 1 : 0);
     int            i;
 
     pq_beginmessage(&buf, PqMsg_CopyInResponse);
@@ -758,7 +759,7 @@ bool
 NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
 {
     return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
-                                         cstate->opts.csv_mode);
+                                         cstate->opts.handler == F_CSV);
 }
 
 /*
@@ -785,7 +786,7 @@ NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields, int *nfields
     bool        done;
 
     /* only available for text or csv input */
-    Assert(!cstate->opts.binary);
+    Assert(cstate->opts.handler != F_BINARY);
 
     /* on input check that the header line is correct if needed */
     if (cstate->cur_lineno == 0 && cstate->opts.header_line)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 4e1b154cad2..4f8f5813172 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -167,7 +167,7 @@ CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 
             colname = NameStr(TupleDescAttr(tupDesc, attnum - 1)->attname);
 
-            if (cstate->opts.csv_mode)
+            if (cstate->opts.handler == F_CSV)
                 CopyAttributeOutCSV(cstate, colname, false);
             else
                 CopyAttributeOutText(cstate, colname);
@@ -344,7 +344,7 @@ SendCopyBegin(CopyToState cstate)
 {
     StringInfoData buf;
     int            natts = list_length(cstate->attnumlist);
-    int16        format = (cstate->opts.binary ? 1 : 0);
+    int16        format = (cstate->opts.handler == F_BINARY ? 1 : 0);
     int            i;
 
     pq_beginmessage(&buf, PqMsg_CopyOutResponse);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 4525261fcc4..04f8f5ef1b2 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -61,9 +61,7 @@ typedef struct CopyFormatOptions
     /* parameters from the COPY command */
     int            file_encoding;    /* file or remote side's character encoding,
                                  * -1 if not specified */
-    bool        binary;            /* binary format? */
     bool        freeze;            /* freeze rows on loading? */
-    bool        csv_mode;        /* Comma Separated Value format? */
     CopyHeaderChoice header_line;    /* header line? */
     char       *null_print;        /* NULL marker string (server encoding!) */
     int            null_print_len; /* length of same */
-- 
2.47.2
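
Note on patch 5/6: once the binary and csv_mode flags are gone, every
format-dependent default is derived by comparing the handler OID with
the built-in ones. The sketch below models that derivation in isolation.
The Model* names are hypothetical; the OID values are the ones patch 4/6
assigns in pg_proc.dat (8100 text, 8101 csv, 8102 binary) and may change
before commit.

```c
#include <assert.h>
#include <string.h>

typedef unsigned int ModelOid;

/* Hypothetical mirrors of F_TEXT_INTERNAL, F_CSV and F_BINARY. */
#define MODEL_F_TEXT    8100u
#define MODEL_F_CSV     8101u
#define MODEL_F_BINARY  8102u

/* Default DELIMITER: comma for csv, tab for everything else. */
static const char *
model_default_delim(ModelOid handler)
{
    return handler == MODEL_F_CSV ? "," : "\t";
}

/* Default NULL marker: empty string for csv, \N for everything else. */
static const char *
model_default_null_print(ModelOid handler)
{
    return handler == MODEL_F_CSV ? "" : "\\N";
}

/* Overall format code sent in CopyInResponse/CopyOutResponse. */
static int
model_wire_format(ModelOid handler)
{
    return handler == MODEL_F_BINARY ? 1 : 0;
}
```

One consequence visible in this model: a custom handler OID compares
equal to neither MODEL_F_CSV nor MODEL_F_BINARY, so it is given the text
defaults and the textual wire format code.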
From 421c34b76a5e9fe45b49bdbe52ecda4d0f638617 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 19 Mar 2025 11:46:34 +0900
Subject: [PATCH v40 6/6] Add document how to write a COPY handler
This is WIP because we haven't decided our API yet.
Co-authored-by: David G. Johnston <david.g.johnston@gmail.com>
---
 doc/src/sgml/copy-handler.sgml | 394 +++++++++++++++++++++++++++++++++
 doc/src/sgml/filelist.sgml     |   1 +
 doc/src/sgml/postgres.sgml     |   1 +
 src/include/commands/copyapi.h |   9 +-
 4 files changed, 401 insertions(+), 4 deletions(-)
 create mode 100644 doc/src/sgml/copy-handler.sgml
diff --git a/doc/src/sgml/copy-handler.sgml b/doc/src/sgml/copy-handler.sgml
new file mode 100644
index 00000000000..5bc87d16662
--- /dev/null
+++ b/doc/src/sgml/copy-handler.sgml
@@ -0,0 +1,394 @@
+<!-- doc/src/sgml/copy-handler.sgml -->
+
+<chapter id="copy-handler">
+ <title>Writing a Copy Handler</title>
+
+ <indexterm zone="copy-handler">
+  <primary><literal>COPY</literal> handler</primary>
+ </indexterm>
+
+ <para>
+  <productname>PostgreSQL</productname> supports
+  custom <link linkend="sql-copy"><literal>COPY</literal></link> handlers,
+  which add <replaceable>format_name</replaceable> options to
+  the <literal>FORMAT</literal> clause.
+ </para>
+
+ <para>
+  At the SQL level, a copy handler method is represented by a single SQL
+  function (see <xref linkend="sql-createfunction"/>), typically implemented in
+  C, having the signature
+<synopsis>
+<replaceable>format_name</replaceable>(internal) RETURNS <literal>copy_handler</literal>
+</synopsis>
+  The function's name is then accepted as a
+  valid <replaceable>format_name</replaceable>. The return
+  pseudo-type <literal>copy_handler</literal> informs the system that this
+  function needs to be registered as a copy handler.
+  The <type>internal</type> argument is a dummy that prevents this function
+  from being called directly from an SQL command. Because the handler
+  implementation must not change for the lifetime of the server, this SQL
+  function should be marked immutable. The <literal>link_symbol</literal>
+  for this function is the name of the implementation function, described
+  next.
+ </para>
+
+ <para>
+  The implementation function signature expected for the function named
+  in the <literal>link_symbol</literal> is:
+<synopsis>
+Datum
+<replaceable>copy_format_handler</replaceable>(PG_FUNCTION_ARGS)
+</synopsis>
+  The convention for the name is to replace the word
+  <replaceable>format</replaceable> in the placeholder above with the value given
+  to <replaceable>format_name</replaceable> in the SQL function.
+  The first argument is a <type>boolean</type> that indicates whether the handler
+  must provide a pointer to its implementation for <literal>COPY FROM</literal>
+  (a <type>CopyFromRoutine *</type>). If <literal>false</literal>, the handler
+  must provide a pointer to its implementation of <literal>COPY TO</literal>
+  (a <type>CopyToRoutine *</type>). These structs are declared in
+  <filename>src/include/commands/copyapi.h</filename>.
+ </para>
+
+ <para>
+  The structs hold pointers to implementation functions for initializing,
+  starting, processing rows, and ending a copy operation. The specific
+  structures vary a bit between <literal>COPY FROM</literal> and
+  <literal>COPY TO</literal> so the next two sections describe each
+  in detail.
+ </para>
+
+ <sect1 id="copy-handler-from">
+  <title>Copy From Handler</title>
+
+  <para>
+   The opening to this chapter describes how the executor will call the main
+   handler function with, in this case,
+   a <type>boolean</type> <literal>true</literal>, and expect to receive a
+   <type>CopyFromRoutine *</type> <type>Datum</type>. This section describes
+   the components of the <type>CopyFromRoutine</type> struct.
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyFromInFunc(CopyFromState cstate,
+               Oid atttypid,
+               FmgrInfo *finfo,
+               Oid *typioparam);
+</programlisting>
+
+   This sets input function information for the
+   given <literal>atttypid</literal> attribute. This function is called once
+   at the beginning of <literal>COPY FROM</literal>. If
+   this <literal>COPY</literal> handler doesn't use any input functions, this
+   function doesn't need to do anything.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Oid atttypid</literal></term>
+     <listitem>
+      <para>
+       This is the OID of data type used by the relation's attribute.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>FmgrInfo *finfo</literal></term>
+     <listitem>
+      <para>
+       This can be optionally filled to provide the catalog information of
+       the input function.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Oid *typioparam</literal></term>
+     <listitem>
+      <para>
+       This can be optionally filled to define the OID of the type to
+       pass to the input function.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyFromStart(CopyFromState cstate,
+              TupleDesc tupDesc);
+</programlisting>
+
+   This starts a <literal>COPY FROM</literal>. This function is called once at
+   the beginning of <literal>COPY FROM</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>TupleDesc tupDesc</literal></term>
+     <listitem>
+      <para>
+       This is the tuple descriptor of the relation into which the data is to be
+       copied. This can be used for any initialization steps required by a
+       format.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+bool
+CopyFromOneRow(CopyFromState cstate,
+               ExprContext *econtext,
+               Datum *values,
+               bool *nulls);
+</programlisting>
+
+   This reads one row from the source and fills <literal>values</literal>
+   and <literal>nulls</literal>. If there are one or more tuples to be read,
+   this must return <literal>true</literal>. If there are no more tuples to
+   read, this must return <literal>false</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>ExprContext *econtext</literal></term>
+     <listitem>
+      <para>
+       This is used to evaluate the default expression for each column that is
+       either not read from the file or is using
+       the <literal>DEFAULT</literal> option of <literal>COPY
+       FROM</literal>. It is <literal>NULL</literal> if no default values are
+       used.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Datum *values</literal></term>
+     <listitem>
+      <para>
+       This is an output variable that stores the values of the row that was read.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>bool *nulls</literal></term>
+     <listitem>
+      <para>
+       This is an output variable to store whether the read columns
+       are <literal>NULL</literal> or not.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyFromEnd(CopyFromState cstate);
+</programlisting>
+
+   This ends a <literal>COPY FROM</literal>. This function is called once at
+   the end of <literal>COPY FROM</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyFromState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY FROM</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+   TODO: Add CopyFromStateGetData() and CopyFromSkipErrorRow()?
+  </para>
+ </sect1>
+
+ <sect1 id="copy-handler-to">
+  <title>Copy To Handler</title>
+
+  <para>
+   The <literal>COPY</literal> handler function for <literal>COPY
+   TO</literal> returns a <type>CopyToRoutine</type> struct containing
+   pointers to the functions described below. All functions are required.
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToOutFunc(CopyToState cstate,
+              Oid atttypid,
+              FmgrInfo *finfo);
+</programlisting>
+
+   This sets output function information for the
+   given <literal>atttypid</literal> attribute. This function is called once
+   at the beginning of <literal>COPY TO</literal>. If
+   this <literal>COPY</literal> handler doesn't use any output functions, this
+   function doesn't need to do anything.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>Oid atttypid</literal></term>
+     <listitem>
+      <para>
+       This is the OID of data type used by the relation's attribute.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>FmgrInfo *finfo</literal></term>
+     <listitem>
+      <para>
+       This can be optionally filled to provide the catalog information of
+       the output function.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToStart(CopyToState cstate,
+            TupleDesc tupDesc);
+</programlisting>
+
+   This starts a <literal>COPY TO</literal>. This function is called once at
+   the beginning of <literal>COPY TO</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>TupleDesc tupDesc</literal></term>
+     <listitem>
+      <para>
+       This is the tuple descriptor of the relation from which the data is read.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToOneRow(CopyToState cstate,
+             TupleTableSlot *slot);
+</programlisting>
+
+   This writes one row stored in <literal>slot</literal> to the destination.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>TupleTableSlot *slot</literal></term>
+     <listitem>
+      <para>
+       This holds the row to be written.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+<programlisting>
+void
+CopyToEnd(CopyToState cstate);
+</programlisting>
+
+   This ends a <literal>COPY TO</literal>. This function is called once at
+   the end of <literal>COPY TO</literal>.
+
+   <variablelist>
+    <varlistentry>
+     <term><literal>CopyToState cstate</literal></term>
+     <listitem>
+      <para>
+       This is an internal struct that contains all the state variables used
+       throughout a <literal>COPY TO</literal> operation.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
+  <para>
+   TODO: Add CopyToStateFlush()?
+  </para>
+ </sect1>
+</chapter>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index fef9584f908..700cf22b502 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -107,6 +107,7 @@
 <!ENTITY storage    SYSTEM "storage.sgml">
 <!ENTITY transaction     SYSTEM "xact.sgml">
 <!ENTITY tablesample-method SYSTEM "tablesample-method.sgml">
+<!ENTITY copy-handler SYSTEM "copy-handler.sgml">
 <!ENTITY wal-for-extensions SYSTEM "wal-for-extensions.sgml">
 <!ENTITY generic-wal SYSTEM "generic-wal.sgml">
 <!ENTITY custom-rmgr SYSTEM "custom-rmgr.sgml">
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index af476c82fcc..8ba319ae2df 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -254,6 +254,7 @@ break is not needed in a wider output rendering.
   &plhandler;
   &fdwhandler;
   &tablesample-method;
+  &copy-handler;
   &custom-scan;
   &geqo;
   &tableam;
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 500ece7d5bb..24710cb667a 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -28,10 +28,10 @@ typedef struct CopyToRoutine
      * Set output function information. This callback is called once at the
      * beginning of COPY TO.
      *
+     * 'atttypid' is the OID of data type used by the relation's attribute.
+     *
      * 'finfo' can be optionally filled to provide the catalog information of
      * the output function.
-     *
-     * 'atttypid' is the OID of data type used by the relation's attribute.
      */
     void        (*CopyToOutFunc) (CopyToState cstate, Oid atttypid,
                                   FmgrInfo *finfo);
@@ -70,12 +70,13 @@ typedef struct CopyFromRoutine
      * Set input function information. This callback is called once at the
      * beginning of COPY FROM.
      *
+     * 'atttypid' is the OID of data type used by the relation's attribute.
+     *
      * 'finfo' can be optionally filled to provide the catalog information of
      * the input function.
      *
      * 'typioparam' can be optionally filled to define the OID of the type to
-     * pass to the input function.'atttypid' is the OID of data type used by
-     * the relation's attribute.
+     * pass to the input function.
      */
     void        (*CopyFromInFunc) (CopyFromState cstate, Oid atttypid,
                                    FmgrInfo *finfo, Oid *typioparam);
-- 
2.47.2
			
		On Fri, Apr 25, 2025 at 5:45 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> I've updated the patch set. See the attached v40 patch set.
>
> In <CAD21AoAXzwPC7jjPMTcT80hnzmPa2SUJkiqdYHweEY8sZscEMA@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 23 Apr 2025 23:44:55 -0700,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >> Are the following correct?
> >>
> >> 1. Move invalid input patterns in
> >>    src/test/modules/test_copy_format/sql/invalid.sql to
> >>    src/test/regress/sql/copy.sql as much as possible.
> >> 2. Create
> >>    src/test/modules/test_copy_format/sql/test_copy_format.sql
> >>    and move all contents in existing *.sql to the file.
> >> 3. Add comments what the tests expect to
> >>    src/test/modules/test_copy_format/sql/test_copy_format.sql.
> >> 4. Remove CopyFormatOptions::{binary,csv_mode}.
> >
> > Agreed with the above items.
>
> Done except 1. because 1. is removed by 3. in the following
> list:
>
> ----
> >> There are 3 unconfirmed suggested changes for tests in:
> >> https://www.postgresql.org/message-id/20250330.113126.433742864258096312.kou%40clear-code.com
> >>
> >> Here are my opinions for them:
> >>
> >> > 1.: There is no difference between single-quoting and
> >> >     double-quoting here. Because the information what quote
> >> >     was used for the given FORMAT value isn't remained
> >> >     here. Should we update gram.y?
> >> >
> >> > 2.: I don't have a strong opinion for it. If nobody objects
> >> >     it, I'll remove them.
> >> >
> >> > 3.: I don't have a strong opinion for it. If nobody objects
> >> >     it, I'll remove them.
> ----
>
> 0005 is added for 4. Could you squash 0004 ("Use copy
> handler for built-in formats") and 0005 ("Remove
> CopyFormatOptions::{binary,csv_mode}") if needed when you
> push?
>
> >> 6. Use handler OID for detecting the default built-in format
> >>    instead of comparing the given format as string.
>
> Done.
>
> >> 7. Update documentation.
>
> Could someone help with this? 0007 is the draft commit for this.
>
> >> There are 3 unconfirmed suggested changes for tests in:
> >> https://www.postgresql.org/message-id/20250330.113126.433742864258096312.kou%40clear-code.com
> >>
> >> Here are my opinions for them:
> >>
> >> > 1.: There is no difference between single-quoting and
> >> >     double-quoting here. Because the information what quote
> >> >     was used for the given FORMAT value isn't remained
> >> >     here. Should we update gram.y?
> >> >
> >> > 2.: I don't have a strong opinion for it. If nobody objects
> >> >     it, I'll remove them.
> >> >
> >> > 3.: I don't have a strong opinion for it. If nobody objects
> >> >     it, I'll remove them.
> >>
> >> Is the 1. required for "ready for merge"? If so, is there
> >> any suggestion? I don't have a strong opinion for it.
> >>
> >> If there are no more opinions for 2. and 3., I'll remove
> >> them.
> >
> > Agreed.
>
> 1.: I didn't do anything. Because there is no suggestion.
>
> 2., 3.: Done.
Thank you for updating the patches.
One of the primary considerations we need to address is the treatment
of the specified format name. The current patch set utilizes built-in
formats (namely 'csv', 'text', and 'binary') when the format name is
either unqualified or explicitly specified with 'pg_catalog' as the
schema. In all other cases, we search for custom format handler
functions based on the search_path. To be frank, I have reservations
about this interface design, as the dependence of the specified custom
format name on the search_path could potentially confuse users.
In light of these concerns, I've been contemplating alternative
interface designs. One promising approach would involve registering
custom copy formats via a C function during module loading
(specifically, in _PG_init()). This method would require extension
authors to invoke a registration function, say
RegisterCustomCopyFormat(), in _PG_init() as follows:
JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines",
                                             &JsonLinesCopyToRoutine,
                                             &JsonLinesCopyFromRoutine);
The registration function would validate the format name and store it
in TopMemoryContext. It would then return a unique identifier that can
be used subsequently to reference the custom copy format extension.
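As a rough illustration only (this is not code from the patch set), the registration step described above could be sketched as follows. The fixed-size array, the INVALID_COPY_FORMAT_ID return value, LookupCustomCopyFormat(), and the placeholder routine structs are all assumptions made for this example. A real implementation would presumably allocate its entries in TopMemoryContext and raise an error with ereport() instead of returning a sentinel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the real CopyToRoutine / CopyFromRoutine. */
typedef struct CopyToRoutine { int placeholder; } CopyToRoutine;
typedef struct CopyFromRoutine { int placeholder; } CopyFromRoutine;

#define MAX_CUSTOM_COPY_FORMATS 16      /* assumed limit for the sketch */
#define INVALID_COPY_FORMAT_ID  (-1)    /* assumed failure value */

typedef struct CustomCopyFormat
{
    const char *name;           /* assumed to outlive the registry */
    const CopyToRoutine *to_routine;
    const CopyFromRoutine *from_routine;
} CustomCopyFormat;

/* Backend-local registry; stands in for memory in TopMemoryContext. */
static CustomCopyFormat custom_formats[MAX_CUSTOM_COPY_FORMATS];
static int n_custom_formats = 0;

static int
is_builtin_format(const char *name)
{
    return strcmp(name, "text") == 0 ||
        strcmp(name, "csv") == 0 ||
        strcmp(name, "binary") == 0;
}

/* Return the ID registered for 'name', or INVALID_COPY_FORMAT_ID. */
int
LookupCustomCopyFormat(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < n_custom_formats; i++)
    {
        if (strcmp(custom_formats[i].name, name) == 0)
            return i;
    }
    return INVALID_COPY_FORMAT_ID;
}

/* Validate the name, store the routines, and return a unique ID. */
int
RegisterCustomCopyFormat(const char *name,
                         const CopyToRoutine *to_routine,
                         const CopyFromRoutine *from_routine)
{
    assert(name != NULL);

    /* Reject empty names and names that shadow built-in formats. */
    if (name[0] == '\0' || is_builtin_format(name))
        return INVALID_COPY_FORMAT_ID;
    /* Reject duplicates and a full registry. */
    if (LookupCustomCopyFormat(name) != INVALID_COPY_FORMAT_ID)
        return INVALID_COPY_FORMAT_ID;
    if (n_custom_formats >= MAX_CUSTOM_COPY_FORMATS)
        return INVALID_COPY_FORMAT_ID;

    custom_formats[n_custom_formats].name = name;
    custom_formats[n_custom_formats].to_routine = to_routine;
    custom_formats[n_custom_formats].from_routine = from_routine;
    return n_custom_formats++;
}
```

Because the name is resolved through this registry rather than through a function lookup, the result does not depend on search_path.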
Custom copy format modules could be loaded through
shared_preload_libraries, session_preload_libraries, or the LOAD
command. Extensions could register their own options within this
framework, for example:
RegisterCustomCopyFormatOption(JsonLinesFormatId,
    "custom_option",
    custom_option_handler);
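To make the option idea a little more concrete, here is a minimal self-contained sketch. The handler signature (a function that receives the option value as a string and returns whether it is acceptable), the fixed-size table, ApplyCustomCopyFormatOption(), and the example handler are all hypothetical; nothing here is taken from the patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler type: returns true if 'value' is acceptable. */
typedef bool (*CopyFormatOptionHandler) (const char *value);

typedef struct CopyFormatOption
{
    int         format_id;  /* ID returned when the format was registered */
    const char *name;       /* option name, assumed to outlive the table */
    CopyFormatOptionHandler handler;
} CopyFormatOption;

#define MAX_COPY_FORMAT_OPTIONS 32      /* assumed limit for the sketch */

static CopyFormatOption copy_format_options[MAX_COPY_FORMAT_OPTIONS];
static int n_copy_format_options = 0;

/* Register an option for one format; false on duplicate or full table. */
bool
RegisterCustomCopyFormatOption(int format_id, const char *name,
                               CopyFormatOptionHandler handler)
{
    assert(name != NULL && handler != NULL);

    if (n_copy_format_options >= MAX_COPY_FORMAT_OPTIONS)
        return false;
    for (int i = 0; i < n_copy_format_options; i++)
    {
        if (copy_format_options[i].format_id == format_id &&
            strcmp(copy_format_options[i].name, name) == 0)
            return false;
    }
    copy_format_options[n_copy_format_options].format_id = format_id;
    copy_format_options[n_copy_format_options].name = name;
    copy_format_options[n_copy_format_options].handler = handler;
    n_copy_format_options++;
    return true;
}

/*
 * Dispatch an option given in a COPY command to its handler.  Returns
 * false if the option is unknown for this format or the handler rejects
 * the value.
 */
bool
ApplyCustomCopyFormatOption(int format_id, const char *name,
                            const char *value)
{
    assert(name != NULL && value != NULL);

    for (int i = 0; i < n_copy_format_options; i++)
    {
        if (copy_format_options[i].format_id == format_id &&
            strcmp(copy_format_options[i].name, name) == 0)
            return copy_format_options[i].handler(value);
    }
    return false;
}

/* Example handler that accepts only "on" or "off". */
bool
example_bool_option_handler(const char *value)
{
    return strcmp(value, "on") == 0 || strcmp(value, "off") == 0;
}
```

Keying the table on the format ID keeps options of different formats from colliding even when they share a name.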
This approach offers several advantages: it would eliminate the
search_path issue, provide greater flexibility, and potentially
simplify the overall interface for users and developers alike. We
might be able to provide a view showing the registered custom COPY
format in the future. Also, these interfaces align with other
customizable functionalities such as custom rmgr, custom lwlock,
custom waitevent, and custom EXPLAIN option etc.
Feedback is very welcome.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		On Thu, May 01, 2025 at 12:15:30PM -0700, Masahiko Sawada wrote:
> In light of these concerns, I've been contemplating alternative
> interface designs. One promising approach would involve registering
> custom copy formats via a C function during module loading
> (specifically, in _PG_init()). This method would require extension
> authors to invoke a registration function, say
> RegisterCustomCopyFormat(), in _PG_init() as follows:
>
> JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines",
>                                              &JsonLinesCopyToRoutine,
>                                              &JsonLinesCopyFromRoutine);
>
> The registration function would validate the format name and store it
> in TopMemoryContext. It would then return a unique identifier that can
> be used subsequently to reference the custom copy format extension.
Hmm.  How much should we care about the observability of the COPY
format used by a given backend?  Storing this information in a
backend's TopMemoryContext is OK to get the extensibility basics to
work, but could it make sense to use some shmem state to allocate a
uint32 ID that could be shared by all backends?  Contrary to EXPLAIN,
COPY commands usually run for a very long time, so I am wondering if
these APIs should be designed so that it would be possible to monitor
the format used.  One layer where the format information could be made
available is the progress reporting view for COPY, for example.  I can
also imagine a pgstats kind where we do COPY stats aggregates, with a
per-format pgstats kind, and sharing a fixed ID across multiple
backends is relevant (when flushing the stats at shutdown, we would
use a name/ID mapping like replication slots).
I don't think that this needs to be relevant for the option part, just
for the format where, I suspect, we should store in a shmem array
based on the ID allocated the name of the format, the library of the
callback and the function name fed to load_external_function().
Note that custom LWLock and wait events use a shmem state for
monitoring purposes, where we are able to do ID->format name lookups
as much as format->ID lookups.  Perhaps it's OK not to do that for
COPY, but I am wondering if we'd better design things from scratch
with state kept in shmem, knowing that COPY is a long-running
operation, and that if one mixes multiple formats they would most
likely want to know which formats are bottlenecks, through SQL.  Cloud
providers would love that.
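A minimal sketch of what such a shared mapping could look like, with a plain static struct standing in for the shared-memory segment. All names, sizes, and the UINT32_MAX sentinel are assumptions for illustration. A real implementation would have to allocate the state in shared memory and protect it with a lock, both of which are omitted here. Strings are stored inline in fixed-size buffers because pointers into one backend's private memory would be meaningless to the others.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define COPY_FORMAT_NAME_LEN          64    /* assumed buffer size */
#define MAX_SHARED_COPY_FORMATS       128   /* assumed array size */
#define INVALID_SHARED_COPY_FORMAT_ID UINT32_MAX

typedef struct SharedCopyFormatEntry
{
    char        name[COPY_FORMAT_NAME_LEN];     /* format name */
    char        library[COPY_FORMAT_NAME_LEN];  /* library of the callback */
    char        function[COPY_FORMAT_NAME_LEN]; /* function name to load */
} SharedCopyFormatEntry;

typedef struct SharedCopyFormatState
{
    uint32_t    nentries;
    SharedCopyFormatEntry entries[MAX_SHARED_COPY_FORMATS];
} SharedCopyFormatState;

/* Stand-in for a struct that would live in shared memory. */
static SharedCopyFormatState shared_copy_formats;

/* format name -> ID */
uint32_t
SharedCopyFormatLookupId(const char *name)
{
    assert(name != NULL);
    for (uint32_t i = 0; i < shared_copy_formats.nentries; i++)
    {
        if (strcmp(shared_copy_formats.entries[i].name, name) == 0)
            return i;
    }
    return INVALID_SHARED_COPY_FORMAT_ID;
}

/* ID -> format name, or NULL if the ID was never allocated */
const char *
SharedCopyFormatLookupName(uint32_t id)
{
    if (id >= shared_copy_formats.nentries)
        return NULL;
    return shared_copy_formats.entries[id].name;
}

/* Allocate an ID for the format, reusing an existing one if present. */
uint32_t
SharedCopyFormatAllocate(const char *name, const char *library,
                         const char *function)
{
    SharedCopyFormatEntry *entry;
    uint32_t    existing;

    assert(name != NULL && library != NULL && function != NULL);

    /* Another backend may have registered the same format already. */
    existing = SharedCopyFormatLookupId(name);
    if (existing != INVALID_SHARED_COPY_FORMAT_ID)
        return existing;

    if (shared_copy_formats.nentries >= MAX_SHARED_COPY_FORMATS ||
        strlen(name) >= COPY_FORMAT_NAME_LEN ||
        strlen(library) >= COPY_FORMAT_NAME_LEN ||
        strlen(function) >= COPY_FORMAT_NAME_LEN)
        return INVALID_SHARED_COPY_FORMAT_ID;

    entry = &shared_copy_formats.entries[shared_copy_formats.nentries];
    snprintf(entry->name, sizeof(entry->name), "%s", name);
    snprintf(entry->library, sizeof(entry->library), "%s", library);
    snprintf(entry->function, sizeof(entry->function), "%s", function);
    return shared_copy_formats.nentries++;
}
```

With IDs that are stable across backends, a progress view or a statistics kind could report the ID and resolve it to a name through SharedCopyFormatLookupName().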
> This approach offers several advantages: it would eliminate the
> search_path issue, provide greater flexibility, and potentially
> simplify the overall interface for users and developers alike. We
> might be able to provide a view showing the registered custom COPY
> format in the future. Also, these interfaces align with other
> customizable functionalities such as custom rmgr, custom lwlock,
> custom waitevent, and custom EXPLAIN option etc.
Yeah, agreed with the search_path concerns.  We are getting better at
making areas of Postgres more pluggable lately, and having a loading path
where we don't have any of these potential issues by design matters.
--
Michael
			
		Attachments
On Thu, May 1, 2025 at 4:04 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, May 01, 2025 at 12:15:30PM -0700, Masahiko Sawada wrote:
> > In light of these concerns, I've been contemplating alternative
> > interface designs. One promising approach would involve registering
> > custom copy formats via a C function during module loading
> > (specifically, in _PG_init()). This method would require extension
> > authors to invoke a registration function, say
> > RegisterCustomCopyFormat(), in _PG_init() as follows:
> >
> > JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines",
> >                                              &JsonLinesCopyToRoutine,
> >                                              &JsonLinesCopyFromRoutine);
> >
> > The registration function would validate the format name and store it
> > in TopMemoryContext. It would then return a unique identifier that can
> > be used subsequently to reference the custom copy format extension.
>
> Hmm.  How much should we care about the observability of the COPY
> format used by a given backend?  Storing this information in a
> backend's TopMemoryContext is OK to get the extensibility basics to
> work, but could it make sense to use some shmem state to allocate a
> uint32 ID that could be shared by all backends.  Contrary to EXPLAIN,
> COPY commands usually run for a very long time, so I am wondering if
> these APIs should be designed so as it would be possible to monitor
> the format used.  One layer where the format information could be made
> available is the progress reporting view for COPY, for example.  I can
> also imagine a pgstats kind where we do COPY stats aggregates, with a
> per-format pgstats kind, and sharing a fixed ID across multiple
> backends is relevant (when flushing the stats at shutdown, we would
> use a name/ID mapping like replication slots).
>
> I don't think that this needs to be relevant for the option part, just
> for the format where, I suspect, we should store in a shmem array
> based on the ID allocated the name of the format, the library of the
> callback and the function name fed to load_external_function().
>
> Note that custom LWLock and wait events use a shmem state for
> monitoring purposes, where we are able to do ID->format name lookups
> as much as format->ID lookups.  Perhaps it's OK not to do that for
> COPY, but I am wondering if we'd better design things from scratch
> with states in shmem state knowing that COPY is a long-running
> operation, and that if one mixes multiple formats they would most
> likely want to know which formats are bottlenecks, through SQL.  Cloud
> providers would love that.
Good point. It would make sense to have such information as a map on
shmem. It might be better to use dshash here since a custom copy
format module can be loaded at runtime. Or we can use dynahash with
large enough elements.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		Hi,
In <CAD21AoBuEqcz2_+dpA3WTiDUF=FgudPBKwM+nvH+qHT-k4p5mA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 1 May 2025 12:15:30 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> One of the primary considerations we need to address is the treatment
> of the specified format name. The current patch set utilizes built-in
> formats (namely 'csv', 'text', and 'binary') when the format name is
> either unqualified or explicitly specified with 'pg_catalog' as the
> schema. In all other cases, we search for custom format handler
> functions based on the search_path. To be frank, I have reservations
> about this interface design, as the dependence of the specified custom
> format name on the search_path could potentially confuse users.
How about requiring schema for all custom formats?
Valid:
  COPY ... TO ... (FORMAT 'text');
  COPY ... TO ... (FORMAT 'my_schema.jsonlines');
Invalid:
  COPY ... TO ... (FORMAT 'jsonlines'); -- no schema
  COPY ... TO ... (FORMAT 'pg_catalog.text'); -- needless schema
If we require "schema" for all custom formats, we don't need
to depend on search_path.
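The rule can be sketched as a small classifier. This is only an illustration under simplifying assumptions: the function and enum names are hypothetical, the name is split at the first dot, and quoted identifiers are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum CopyFormatNameKind
{
    COPY_FORMAT_BUILTIN,    /* unqualified built-in: text, csv, binary */
    COPY_FORMAT_CUSTOM,     /* schema-qualified custom format */
    COPY_FORMAT_INVALID     /* rejected under the proposed rule */
} CopyFormatNameKind;

static int
is_builtin_format(const char *name)
{
    return strcmp(name, "text") == 0 ||
        strcmp(name, "csv") == 0 ||
        strcmp(name, "binary") == 0;
}

CopyFormatNameKind
ClassifyCopyFormatName(const char *format)
{
    const char *dot;
    size_t      schema_len;

    assert(format != NULL);

    dot = strchr(format, '.');
    if (dot == NULL)
    {
        /* Unqualified names may only refer to built-in formats. */
        return is_builtin_format(format) ?
            COPY_FORMAT_BUILTIN : COPY_FORMAT_INVALID;
    }

    /* Reject an empty schema part or an empty format part. */
    schema_len = (size_t) (dot - format);
    if (schema_len == 0 || dot[1] == '\0')
        return COPY_FORMAT_INVALID;

    /* Built-in formats must not be schema-qualified. */
    if (schema_len == strlen("pg_catalog") &&
        strncmp(format, "pg_catalog", schema_len) == 0)
        return COPY_FORMAT_INVALID;

    return COPY_FORMAT_CUSTOM;
}
```

Under this rule the four examples above classify as built-in, custom, invalid, and invalid, in that order, without consulting search_path.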
> In light of these concerns, I've been contemplating alternative
> interface designs. One promising approach would involve registering
> custom copy formats via a C function during module loading
> (specifically, in _PG_init()). This method would require extension
> authors to invoke a registration function, say
> RegisterCustomCopyFormat(), in _PG_init() as follows:
> 
> JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines",
>                                              &JsonLinesCopyToRoutine,
>                                              &JsonLinesCopyFromRoutine);
> 
> The registration function would validate the format name and store it
> in TopMemoryContext. It would then return a unique identifier that can
> be used subsequently to reference the custom copy format extension.
I don't object to the suggested interface because I don't have
a strong opinion on how to implement this feature.
Why do we need to assign a unique ID? For performance? For
RegisterCustomCopyFormatOption()?
I think that we don't need to use it much in COPY. We
don't need the format name or the assigned ID after we
retrieve the corresponding Copy{To,From}Routine, because all
needed information is in Copy{To,From}Routine.
>          Extensions could register their own options within this
> framework, for example:
> 
> RegisterCustomCopyFormatOption(JsonLinesFormatId,
>     "custom_option",
>     custom_option_handler);
Can we defer discussing how to add support for custom
options while we focus on the first implementation? Earlier
patch sets with the current approach had custom options
support, but it was removed for the first implementation.
(BTW, I think that it's not a good API because we want
COPY FROM-only options and COPY TO-only options, such as
"compression level".)
> This approach offers several advantages: it would eliminate the
> search_path issue, provide greater flexibility, and potentially
> simplify the overall interface for users and developers alike.
What contributes to the "flexibility"? Developers can call
multiple Register* functions in _PG_init(), right?
Thanks,
-- 
kou
			
		Hi,
In <CAD21AoB82+MoP_RJ=zzhO9KaHK4LbfGjORkre34C7g-xsCdegQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 May 2025 15:52:49 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Hmm. How much should we care about the observability of the COPY
>> format used by a given backend? Storing this information in a
>> backend's TopMemoryContext is OK to get the extensibility basics to
>> work, but could it make sense to use some shmem state to allocate a
>> uint32 ID that could be shared by all backends. Contrary to EXPLAIN,
>> COPY commands usually run for a very long time, so I am wondering if
>> these APIs should be designed so as it would be possible to monitor
>> the format used. One layer where the format information could be made
>> available is the progress reporting view for COPY, for example. I can
>> also imagine a pgstats kind where we do COPY stats aggregates, with a
>> per-format pgstats kind, and sharing a fixed ID across multiple
>> backends is relevant (when flushing the stats at shutdown, we would
>> use a name/ID mapping like replication slots).
>>
>> I don't think that this needs to be relevant for the option part, just
>> for the format where, I suspect, we should store in a shmem array
>> based on the ID allocated the name of the format, the library of the
>> callback and the function name fed to load_external_function().
>>
>> Note that custom LWLock and wait events use a shmem state for
>> monitoring purposes, where we are able to do ID->format name lookups
>> as much as format->ID lookups. Perhaps it's OK not to do that for
>> COPY, but I am wondering if we'd better design things from scratch
>> with states in shmem state knowing that COPY is a long-running
>> operation, and that if one mixes multiple formats they would most
>> likely want to know which formats are bottlenecks, through SQL. Cloud
>> providers would love that.
>
> Good point. It would make sense to have such information as a map on
> shmem. It might be better to use dshash here since a custom copy
> format module can be loaded at runtime. Or we can use dynahash with
> large enough elements.
If we don't need to assign an ID for each format, can we
avoid it? If we implement it, is this approach more complex
than the current table sampling method like approach?
Thanks,
-- 
kou
On Fri, May 2, 2025 at 7:20 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoBuEqcz2_+dpA3WTiDUF=FgudPBKwM+nvH+qHT-k4p5mA@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 1 May 2025 12:15:30 -0700,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > One of the primary considerations we need to address is the treatment
> > of the specified format name. The current patch set utilizes built-in
> > formats (namely 'csv', 'text', and 'binary') when the format name is
> > either unqualified or explicitly specified with 'pg_catalog' as the
> > schema. In all other cases, we search for custom format handler
> > functions based on the search_path. To be frank, I have reservations
> > about this interface design, as the dependence of the specified custom
> > format name on the search_path could potentially confuse users.
>
> How about requiring schema for all custom formats?
>
> Valid:
>
>   COPY ... TO ... (FORMAT 'text');
>   COPY ... TO ... (FORMAT 'my_schema.jsonlines');
>
> Invalid:
>
>   COPY ... TO ... (FORMAT 'jsonlines'); -- no schema
>   COPY ... TO ... (FORMAT 'pg_catalog.text'); -- needless schema
>
> If we require "schema" for all custom formats, we don't need
> to depend on search_path.
I'm concerned that users cannot use the same format name in the FORMAT
option depending on which schema the handler function is created.
>
> > In light of these concerns, I've been contemplating alternative
> > interface designs. One promising approach would involve registering
> > custom copy formats via a C function during module loading
> > (specifically, in _PG_init()). This method would require extension
> > authors to invoke a registration function, say
> > RegisterCustomCopyFormat(), in _PG_init() as follows:
> >
> > JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines",
> >                                              &JsonLinesCopyToRoutine,
> >                                              &JsonLinesCopyFromRoutine);
> >
> > The registration function would validate the format name and store it
> > in TopMemoryContext. It would then return a unique identifier that can
> > be used subsequently to reference the custom copy format extension.
>
> I don't object the suggested interface because I don't have
> a strong opinion how to implement this feature.
>
> Why do we need to assign a unique ID? For performance? For
> RegisterCustomCopyFormatOption()?
I think it's required for monitoring purposes for example. For
instance, we can set the format ID in the progress information and the
progress view can fetch the format name by the ID so that users can
see what format is being used in the COPY command.
>
> >          Extensions could register their own options within this
> > framework, for example:
> >
> > RegisterCustomCopyFormatOption(JsonLinesFormatId,
> >     "custom_option",
> >     custom_option_handler);
>
> Can we defer to discuss how to add support for custom
> options while we focus on the first implementation? Earlier
> patch sets with the current approach had custom options
> support but it's removed in the first implementation.
I think we can skip the custom option patch for the first
implementation but still need to discuss how we will be able to
implement it to understand the big picture of this feature. Otherwise
we could end up going the wrong direction.
>
> (BTW, I think that it's not a good API because we want COPY
> FROM only options and COPY TO only options something like
> "compression level".)
Why does this matter in terms of API? I think that even with this API
we can pass is_from to the option handler function so that it
validates the option based on it.
>
> > This approach offers several advantages: it would eliminate the
> > search_path issue, provide greater flexibility, and potentially
> > simplify the overall interface for users and developers alike.
>
> What contributes to the "flexibility"? Developers can call
> multiple Register* functions in _PG_init(), right?
I think that with a tablesample-like approach we need to do everything
based on one handler function and callbacks returned from it whereas
there is no such limitation with C API style.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
Hi,
In <CAD21AoBGRFStdVbHUcxL0QB8wn92J3Sn-6x=RhsSMuhepRH0NQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 May 2025 21:38:32 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> How about requiring schema for all custom formats?
>>
>> Valid:
>>
>>   COPY ... TO ... (FORMAT 'text');
>>   COPY ... TO ... (FORMAT 'my_schema.jsonlines');
>>
>> Invalid:
>>
>>   COPY ... TO ... (FORMAT 'jsonlines'); -- no schema
>>   COPY ... TO ... (FORMAT 'pg_catalog.text'); -- needless schema
>>
>> If we require "schema" for all custom formats, we don't need
>> to depend on search_path.
>
> I'm concerned that users cannot use the same format name in the FORMAT
> option depending on which schema the handler function is created.
I'm not sure that it's a problem or not. If users want to
use the same format name, they can install the handler
function to the same schema.
>> Why do we need to assign a unique ID? For performance? For
>> RegisterCustomCopyFormatOption()?
>
> I think it's required for monitoring purposes for example. For
> instance, we can set the format ID in the progress information and the
> progress view can fetch the format name by the ID so that users can
> see what format is being used in the COPY command.
How about setting the format name instead of the format ID
in the progress information?
> I think we can skip the custom option patch for the first
> implementation but still need to discuss how we will be able to
> implement it to understand the big picture of this feature. Otherwise
> we could end up going the wrong direction.
I think that we don't need to discuss it deeply because we
have many options with this approach. We can call C
functions in _PG_init(). I think that this feature will not
be a blocker of this approach.
>> (BTW, I think that it's not a good API because we want COPY
>> FROM only options and COPY TO only options something like
>> "compression level".)
>
> Why does this matter in terms of API? I think that even with this API
> we can pass is_from to the option handler function so that it
> validates the option based on it.
If we choose the API, each custom format developer needs to
handle the case in handler function. For example, if we pass
information whether this option is only for TO to
PostgreSQL, ProcessCopyOptions() not handler functions can
handle it.
Anyway, I think that we don't need to discuss this deeply
for now.
>> What contributes to the "flexibility"? Developers can call
>> multiple Register* functions in _PG_init(), right?
>
> I think that with a tablesample-like approach we need to do everything
> based on one handler function and callbacks returned from it whereas
> there is no such limitation with C API style.
Thanks for clarifying it. It seems that my understanding is
correct.
I hope that the flexibility is needed flexibility and too
much flexibility doesn't introduce too much complexity.
Thanks,
-- 
kou
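The two positions on direction-specific options can be sketched side by side. This is only an illustration under assumed names, not code from the patch set: compression_level_handler() shows the "pass is_from to the handler" idea, while CopyOptionDef and copy_option_allowed() show the alternative where the extension declares the allowed direction and core code (for example ProcessCopyOptions()) performs the check. All of these identifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Variant 1: the handler receives is_from and validates by itself. */
typedef bool (*CopyOptionHandler) (const char *value, bool is_from);

/* Hypothetical handler: "compression_level" is accepted for COPY TO only. */
bool
compression_level_handler(const char *value, bool is_from)
{
    if (is_from)
        return false;           /* option makes no sense for COPY FROM */
    if (value == NULL || strlen(value) != 1)
        return false;           /* accept a single digit 0-9 only */
    return value[0] >= '0' && value[0] <= '9';
}

/* Variant 2: the extension declares directions, core does the check. */
typedef enum CopyOptionDirection
{
    COPY_OPT_TO = 1 << 0,
    COPY_OPT_FROM = 1 << 1
} CopyOptionDirection;

typedef struct CopyOptionDef
{
    const char *name;
    int         directions;     /* bitmask of CopyOptionDirection */
} CopyOptionDef;

/* What a core-side check could look like before calling any handler. */
bool
copy_option_allowed(const CopyOptionDef *def, bool is_from)
{
    int         wanted = is_from ? COPY_OPT_FROM : COPY_OPT_TO;

    return (def->directions & wanted) != 0;
}
```

With variant 2, every format gets the same rejection behavior for a misdirected option without each handler repeating the check; with variant 1, the API surface stays smaller.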
Re: Make COPY format extendable: Extract COPY TO format implementations
From: "David G. Johnston"
On Thursday, May 1, 2025, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> In light of these concerns, I've been contemplating alternative
> interface designs. One promising approach would involve registering
> custom copy formats via a C function during module loading
> (specifically, in _PG_init()). This method would require extension
> authors to invoke a registration function, say
> RegisterCustomCopyFormat(), in _PG_init() as follows:
>
> JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines",
>                                              &JsonLinesCopyToRoutine,
>                                              &JsonLinesCopyFromRoutine);
>
> The registration function would validate the format name and store it
> in TopMemoryContext. It would then return a unique identifier that can
> be used subsequently to reference the custom copy format extension.
How does this fix the search_path concern?  Are query writers supposed to put JsonLinesFormatId into their queries?  Or are you just prohibiting a DBA from ever installing an extension that wants to register a format name that is already registered so that no namespace is ever required?
ISTM accommodating a namespace for formats is required just like we do for virtually every other named object in the system.  At least, if we want to play nice with extension authors.  It doesn’t have to be within the existing pg_proc scope, we can create a new scope if desired, but abolishing it seems unwise.
It would be more consistent with established policy if we didn’t make exceptions for text/csv/binary - if the DBA permits a text format to exist in a different schema and that schema appears first in the search_path, unqualified references to text would resolve to the non-core handler.  We already protect ourselves with safe search_paths.  This is really no different than if someone wanted to implement a now() function and people are putting pg_catalog in front of existing usage.  It’s the DBA’s problem, not ours.
David J.
On Fri, May 2, 2025 at 10:36 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Thursday, May 1, 2025, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>>
>> In light of these concerns, I've been contemplating alternative
>> interface designs. One promising approach would involve registering
>> custom copy formats via a C function during module loading
>> (specifically, in _PG_init()). This method would require extension
>> authors to invoke a registration function, say
>> RegisterCustomCopyFormat(), in _PG_init() as follows:
>>
>> JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines",
>>                                              &JsonLinesCopyToRoutine,
>>                                              &JsonLinesCopyFromRoutine);
>>
>> The registration function would validate the format name and store it
>> in TopMemoryContext. It would then return a unique identifier that can
>> be used subsequently to reference the custom copy format extension.
>
>
> How does this fix the search_path concern?  Are query writers supposed to put JsonLinesFormatId into their queries?
> Or are you just prohibiting a DBA from ever installing an extension that wants to register a format name that is already
> registered so that no namespace is ever required?
Users can specify "jsonlines", passed in the first argument to the
register function, to the COPY FORMAT option in this case.  While
JsonLinesFormatId is reserved for internal operations such as module
processing and monitoring, any attempt to load another custom COPY
format module named 'jsonlines' will result in an error.
> ISTM accommodating a namespace for formats is required just like we do for virtually every other named object in the
> system. At least, if we want to play nice with extension authors.  It doesn’t have to be within the existing pg_proc
> scope, we can create a new scope if desired, but abolishing it seems unwise.
>
> It would be more consistent with established policy if we didn’t make exceptions for text/csv/binary - if the DBA
> permits a text format to exist in a different schema and that schema appears first in the search_path, unqualified
> references to text would resolve to the non-core handler.  We already protect ourselves with safe search_paths.  This is
> really no different than if someone wanted to implement a now() function and people are putting pg_catalog in front of
> existing usage.  It’s the DBA’s problem, not ours.
I'm concerned about allowing multiple 'text' format implementations
with identical names within the database, as this could lead to
considerable confusion. When users specify 'text', it would be more
logical to guarantee that the built-in 'text' format is consistently
used. This principle aligns with other customizable components, such
as custom resource managers, wait events, lightweight locks, and
custom scans. These components maintain their built-in data/types and
explicitly prevent the registration of duplicate names.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
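A minimal sketch of the registration semantics described in this message, not the actual patch: a per-process table that rejects empty names, the built-in names, and duplicates, and hands back a small integer ID. The names register_custom_copy_format(), lookup_copy_format_id(), copy_format_name() and the fixed-size array are assumptions for illustration; a real implementation would presumably raise an error with ereport() instead of returning a code, and would keep its state in TopMemoryContext.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_COPY_FORMATS 16
#define INVALID_COPY_FORMAT_ID (-1)

/* Names reserved for the built-in formats; extensions may not claim them. */
static const char *const builtin_formats[] = {"text", "csv", "binary"};

/* Per-process registry (stands in for state kept in TopMemoryContext). */
static const char *registered_names[MAX_COPY_FORMATS];
static int  num_registered = 0;

/* Return the ID registered for name, or INVALID_COPY_FORMAT_ID. */
int
lookup_copy_format_id(const char *name)
{
    for (int i = 0; i < num_registered; i++)
        if (strcmp(registered_names[i], name) == 0)
            return i;
    return INVALID_COPY_FORMAT_ID;
}

/*
 * Hypothetical stand-in for RegisterCustomCopyFormat().  The caller must
 * pass a string that outlives the registry (e.g. a string literal).
 * Returns the new ID, or INVALID_COPY_FORMAT_ID where the real function
 * would report an error.
 */
int
register_custom_copy_format(const char *name)
{
    size_t      nbuiltin = sizeof(builtin_formats) / sizeof(builtin_formats[0]);

    if (name == NULL || name[0] == '\0')
        return INVALID_COPY_FORMAT_ID;
    for (size_t i = 0; i < nbuiltin; i++)
        if (strcmp(builtin_formats[i], name) == 0)
            return INVALID_COPY_FORMAT_ID;  /* built-in names are protected */
    if (lookup_copy_format_id(name) != INVALID_COPY_FORMAT_ID)
        return INVALID_COPY_FORMAT_ID;      /* duplicate registration */
    if (num_registered >= MAX_COPY_FORMATS)
        return INVALID_COPY_FORMAT_ID;      /* table full */
    registered_names[num_registered] = name;
    return num_registered++;
}

/* ID -> name lookup, as a monitoring function would need. */
const char *
copy_format_name(int id)
{
    if (id < 0 || id >= num_registered)
        return NULL;
    return registered_names[id];
}
```

Because the table is private to one process, two backends that load the same module independently are not guaranteed to agree on IDs; that gap is what the shared-memory suggestion earlier in the thread is about.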
			
On Fri, May 2, 2025 at 9:56 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoBGRFStdVbHUcxL0QB8wn92J3Sn-6x=RhsSMuhepRH0NQ@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 May 2025 21:38:32 -0700,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >> How about requiring schema for all custom formats?
> >>
> >> Valid:
> >>
> >>   COPY ... TO ... (FORMAT 'text');
> >>   COPY ... TO ... (FORMAT 'my_schema.jsonlines');
> >>
> >> Invalid:
> >>
> >>   COPY ... TO ... (FORMAT 'jsonlines'); -- no schema
> >>   COPY ... TO ... (FORMAT 'pg_catalog.text'); -- needless schema
> >>
> >> If we require "schema" for all custom formats, we don't need
> >> to depend on search_path.
> >
> > I'm concerned that users cannot use the same format name in the FORMAT
> > option depending on which schema the handler function is created.
>
> I'm not sure that it's a problem or not. If users want to
> use the same format name, they can install the handler
> function to the same schema.
>
> >> Why do we need to assign a unique ID? For performance? For
> >> RegisterCustomCopyFormatOption()?
> >
> > I think it's required for monitoring purposes for example. For
> > instance, we can set the format ID in the progress information and the
> > progress view can fetch the format name by the ID so that users can
> > see what format is being used in the COPY command.
>
> How about setting the format name instead of the format ID
> in the progress information?
The progress view can know only numbers. We need to extend the
progress view infrastructure so that we can pass other data types.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi,
In <CAD21AoDnY2fhC7tp7jpn24AuwkeW-0YjFEtZbEfPwg8YcH6bAw@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 May 2025 23:02:25 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> The progress view can know only numbers. We need to extend the
> progress view infrastructure so that we can pass other data types.
Sorry. Could you tell me what APIs referred here?
pgstat_progress_*() functions in
src/include/utils/backend_progress.h?
Thanks,
-- 
kou
Re: Make COPY format extendable: Extract COPY TO format implementations
From: "David G. Johnston"
On Friday, May 2, 2025, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I'm concerned about allowing multiple 'text' format implementations
> with identical names within the database, as this could lead to
> considerable confusion. When users specify 'text', it would be more
> logical to guarantee that the built-in 'text' format is consistently
> used.
Do you want to only give text/csv/binary this special treatment or also any future format name we ever decide to implement in core?  If an extension takes up “xml” and we try to do that in core do we fail an upgrade because of the conflict, and make it impossible to actually use said extension?
> This principle aligns with other customizable components, such
> as custom resource managers, wait events, lightweight locks, and
> custom scans. These components maintain their built-in data/types and
> explicitly prevent the registration of duplicate names.
I am totally lost on how any of those resemble this feature.
I’m all for registration to enable additional options and features - but am against moving away from turning format into a namespaced identifier.  This is a query-facing feature where namespaces are common and fundamentally required.  I have some sympathy for the fact that until now one could not prefix text/binary/csv with pg_catalog to be fully safe, but in reality DBAs/query authors either put pg_catalog first in their search_path or make an informed decision when they deviate.  That is the established precedent relevant to this feature.  The power, and responsibility for education, lies with the user.
David J.
On Fri, May 2, 2025 at 11:20 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoDnY2fhC7tp7jpn24AuwkeW-0YjFEtZbEfPwg8YcH6bAw@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 May 2025 23:02:25 -0700,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > The progress view can know only numbers. We need to extend the
> > progress view infrastructure so that we can pass other data types.
>
> Sorry. Could you tell me what APIs referred here?
> pgstat_progress_*() functions in
> src/include/utils/backend_progress.h?
The progress information is stored in PgBackendStatus defined in
backend_status.h:
    /*
     * Command progress reporting.  Any command which wishes can advertise
     * that it is running by setting st_progress_command,
     * st_progress_command_target, and st_progress_param[].
     * st_progress_command_target should be the OID of the relation which the
     * command targets (we assume there's just one, as this is meant for
     * utility commands), but the meaning of each element in the
     * st_progress_param array is command-specific.
     */
    ProgressCommandType st_progress_command;
    Oid         st_progress_command_target;
    int64       st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
Then the progress view maps the numbers to the corresponding strings:
CREATE VIEW pg_stat_progress_copy AS
    SELECT
        S.pid AS pid, S.datid AS datid, D.datname AS datname,
        S.relid AS relid,
        CASE S.param5 WHEN 1 THEN 'COPY FROM'
                      WHEN 2 THEN 'COPY TO'
                      END AS command,
        CASE S.param6 WHEN 1 THEN 'FILE'
                      WHEN 2 THEN 'PROGRAM'
                      WHEN 3 THEN 'PIPE'
                      WHEN 4 THEN 'CALLBACK'
                      END AS "type",
        S.param1 AS bytes_processed,
        S.param2 AS bytes_total,
        S.param3 AS tuples_processed,
        S.param4 AS tuples_excluded,
        S.param7 AS tuples_skipped
    FROM pg_stat_get_progress_info('COPY') AS S
        LEFT JOIN pg_database D ON S.datid = D.oid;
So the idea is that the backend process sets the format ID somewhere
in st_progress_param, and then the progress view calls a SQL function,
say pg_stat_get_copy_format_name(), with the format ID that returns
the corresponding format name.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
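The numeric-only constraint and the proposed ID-to-name lookup can be sketched as follows. This is a simplified model, not PostgreSQL code: BackendProgress stands in for the st_progress_param array of PgBackendStatus, PROGRESS_COPY_FORMAT_ID is a hypothetical slot, and copy_format_name_from_progress() plays the role of the suggested pg_stat_get_copy_format_name() SQL function. The name table, including the choice to give built-in formats IDs as well, is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PROGRESS_PARAM 20       /* stand-in for PGSTAT_NUM_PROGRESS_PARAM */
#define PROGRESS_COPY_FORMAT_ID 7   /* hypothetical slot, would show up as S.param8 */

/* Simplified model of the per-backend progress slots: integers only. */
typedef struct BackendProgress
{
    int64_t     params[NUM_PROGRESS_PARAM];
} BackendProgress;

/* Hypothetical ID -> name table that every backend can consult. */
static const char *const format_names[] = {"text", "csv", "binary", "jsonlines"};

/* The backend running COPY can only publish a number. */
void
progress_set_format(BackendProgress *p, int64_t format_id)
{
    p->params[PROGRESS_COPY_FORMAT_ID] = format_id;
}

/* The view side turns the number back into a name, or NULL if unknown. */
const char *
copy_format_name_from_progress(const BackendProgress *p)
{
    int64_t     id = p->params[PROGRESS_COPY_FORMAT_ID];
    int64_t     n = (int64_t) (sizeof(format_names) / sizeof(format_names[0]));

    if (id < 0 || id >= n)
        return NULL;
    return format_names[id];
}
```

The lookup only works if the reader and the writer share the same table, which is why a per-backend registry is not enough for this use.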
			
On Fri, May 2, 2025 at 11:37 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Friday, May 2, 2025, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>>
>> I'm concerned about allowing multiple 'text' format implementations
>> with identical names within the database, as this could lead to
>> considerable confusion. When users specify 'text', it would be more
>> logical to guarantee that the built-in 'text' format is consistently
>> used.
>
>
> Do you want to only give text/csv/binary this special treatment or also any future format name we ever decide to
> implement in core. If an extension takes up “xml” and we try to do that in core do we fail an upgrade because of the
> conflict, and make it impossible to actually use said extension?
I guess that's an extension author's responsibility to upgrade its
extension so as to work with the new PostgreSQL version, or carefully
choose the format name. They can even name
'[extension_name].[format_name]' as a format name.
Even with the current patch design (i.e., search_path affects handler
function lookups), users would end up using the built-in 'xml' format
without notice after upgrade, no? I guess that could introduce another
problem.
I think that we need to ensure that if users specify text/csv/binary
the built-in formats are always used, to keep backward compatibility.
>
>> This principle aligns with other customizable components, such
>> as custom resource managers, wait events, lightweight locks, and
>> custom scans. These components maintain their built-in data/types and
>> explicitly prevent the registration of duplicate names.
>
>
> I am totally lost on how any of those resemble this feature.
>
> I’m all for registration to enable additional options and features - but am against moving away from turning format
> into a namespaced identifier. This is a query-facing feature where namespaces are common and fundamentally required.
That's a fair concern. But isn't the format name ultimately just an
option value, but not like a database object?
As I mentioned above, I think we need to keep backward compatibility
but treating the built-in formats special seems inconsistent with
common name resolution behavior.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Re: Make COPY format extendable: Extract COPY TO format implementations
From: "David G. Johnston"
On Saturday, May 3, 2025, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I think that we need to ensure that if users specify text/csv/binary
> the built-in formats are always used, to keep backward compatibility.
That was my original thinking, but it’s inconsistent with how functions behave today.  We don’t promise that installing extensions won’t cause existing code to change.
> > I’m all for registration to enable additional options and features - but am against moving away from turning format into a namespaced identifier. This is a query-facing feature where namespaces are common and fundamentally required.
> That's a fair concern. But isn't the format name ultimately just an
> option value, but not like a database object?
We get to decide that.  And deciding in favor of “extensible database object in a namespace” makes more sense - leveraging all that pre-existing design to play more nicely with extensions and give DBAs control.  The SQL command to add one is “create function” instead of “create copy format”.
David J.
On Sat, May 3, 2025 at 7:42 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Saturday, May 3, 2025, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>>
>> I think that we need to ensure that if users specify text/csv/binary
>> the built-in formats are always used, to keep backward compatibility.
>
>
> That was my original thinking, but it’s inconsistent with how functions behave today. We don’t promise that
> installing extensions won’t cause existing code to change.
I'm skeptical about whether that's an acceptable backward
compatibility breakage.
>> > I’m all for registration to enable additional options and features - but am against moving away from turning format
>> > into a namespaced identifier. This is a query-facing feature where namespaces are common and fundamentally required.
>>
>> That's a fair concern. But isn't the format name ultimately just an
>> option value, but not like a database object?
>
>
> We get to decide that. And deciding in favor of “extensible database object in a namespace” makes more sense -
> leveraging all that pre-existing design to play more nicely with extensions and give DBAs control. The SQL command to
> add one is “create function” instead of “create copy format”.
I still don't fully understand why the FORMAT value alone needs to be
treated like a schema-qualified object. If the concern is about name
conflict with future built-in formats, I would argue that the same
concern applies to custom EXPLAIN options and logical decoding
plugins.
To me, the benefit of treating the COPY FORMAT value as a
schema-qualified object seems limited. Meanwhile, the risk of not
protecting built-in formats like 'text', 'csv', and 'binary' is
significant. If those names can be shadowed by extension via
search_path, we lose backward compatibility.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Re: Make COPY format extendable: Extract COPY TO format implementations
From: "David G. Johnston"
On Saturday, May 3, 2025, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Sat, May 3, 2025 at 7:42 AM David G. Johnston
> <david.g.johnston@gmail.com> wrote:
> >
> > On Saturday, May 3, 2025, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> >>
> >> I think that we need to ensure that if users specify text/csv/binary
> >> the built-in formats are always used, to keep backward compatibility.
> >
> >
> > That was my original thinking, but it’s inconsistent with how functions behave today. We don’t promise that installing extensions won’t cause existing code to change.
> I'm skeptical about whether that's an acceptable backward
> compatibility breakage.
I’m skeptical you are correctly defining what backward-compatibility requires.
Well, the only potential breakage is that we are searching for a matching function by signature without first limiting the mandated return type.  But that is solve-able should anyone else see the problem as well.
The global format name has its merits but neither it nor the namespaced format option suffer from breaking compatibility or policy.
> I still don't fully understand why the FORMAT value alone needs to be
> treated like a schema-qualified object. If the concern is about name
> conflict with future built-in formats, I would argue that the same
> concern applies to custom EXPLAIN options and logical decoding
> plugins.
Then maybe we have the same “problem” in those places.
> To me, the benefit of treating the COPY FORMAT value as a
> schema-qualified object seems limited. Meanwhile, the risk of not
> protecting built-in formats like 'text', 'csv', and 'binary' is
> significant.
Really? You think lots of extensions are going to choose to use these values even if they are permitted?  Or are you concerned about attack surfaces?
> If those names can be shadowed by extension via
> search_path, we lose backward compatibility.
This is not a definition of backward-compatibility that I am familiar with.
If anything the ability for a DBA to arrange for such shadowing would be a feature enhancement.  They can drop-in a more efficient or desirable implementation without having to change query code.
In any case, I’m doubtful either of us can make a convincing enough argument to sway the other fully.  Both options are plausible, IMO.  Others need to chime in.
David J.
Hi,
In <CAD21AoD9CBjh4u6jdiE0tG-jvejw-GJN8fUPoQSVhKh36HW2NQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 May 2025 23:37:46 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> The progress information is stored in PgBackendStatus defined in
> backend_status.h:
> 
>     /*
>      * Command progress reporting.  Any command which wishes can advertise
>      * that it is running by setting st_progress_command,
>      * st_progress_command_target, and st_progress_param[].
>      * st_progress_command_target should be the OID of the relation which the
>      * command targets (we assume there's just one, as this is meant for
>      * utility commands), but the meaning of each element in the
>      * st_progress_param array is command-specific.
>      */
>     ProgressCommandType st_progress_command;
>     Oid         st_progress_command_target;
>     int64       st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
> 
> Then the progress view maps the numbers to the corresponding strings:
> 
> CREATE VIEW pg_stat_progress_copy AS
>     SELECT
>         S.pid AS pid, S.datid AS datid, D.datname AS datname,
>         S.relid AS relid,
>         CASE S.param5 WHEN 1 THEN 'COPY FROM'
>                       WHEN 2 THEN 'COPY TO'
>                       END AS command,
>         CASE S.param6 WHEN 1 THEN 'FILE'
>                       WHEN 2 THEN 'PROGRAM'
>                       WHEN 3 THEN 'PIPE'
>                       WHEN 4 THEN 'CALLBACK'
>                       END AS "type",
>         S.param1 AS bytes_processed,
>         S.param2 AS bytes_total,
>         S.param3 AS tuples_processed,
>         S.param4 AS tuples_excluded,
>         S.param7 AS tuples_skipped
>     FROM pg_stat_get_progress_info('COPY') AS S
>         LEFT JOIN pg_database D ON S.datid = D.oid;
Thanks. I didn't know how pg_stat_progress_copy is
implemented.
> So the idea is that the backend process sets the format ID somewhere
> in st_progress_param, and then the progress view calls a SQL function,
> say pg_stat_get_copy_format_name(), with the format ID that returns
> the corresponding format name.
Does it work when we use session_preload_libraries or the
LOAD command? If we have 2 sessions and both of them load
"jsonlines" COPY FORMAT extensions, what will happen?
For example:
1. Session 1: Register "jsonlines"
2. Session 2: Register "jsonlines"
              (Should global format ID <-> format name mapping
              be updated?)
3. Session 2: Close this session.
              Unregister "jsonlines".
              (Can we unregister COPY FORMAT extension?)
              (Should global format ID <-> format name mapping
              be updated?)
4. Session 1: Close this session.
              Unregister "jsonlines".
              (Can we unregister COPY FORMAT extension?)
              (Should global format ID <-> format name mapping
              be updated?)
Thanks,
-- 
kou
			
		Hi,
In <CAKFQuwaRDXANaL+QcT6LZRAem4rwkSwv9v+viv_mcR+Rex3quA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Sat, 3 May 2025 22:27:36 -0700,
  "David G. Johnston" <david.g.johnston@gmail.com> wrote:
> In any case, I’m doubtful either of us can make a convincing enough
> argument to sway the other fully.  Both options are plausible, IMO.  Others
> need to chime in.
I may misunderstand but here is the current summary, right?
Proposed approaches to register custom COPY formats:
a. Create a function that has the same name of custom COPY
   format
b. Call a register function from _PG_init()
FYI: I proposed c. approach that uses a. but it always
requires schema name for format name in other e-mail.
Users can register the same format name:
a. Yes
   * Users can distinguish the same format name by schema name
   * If format name doesn't have schema name, the used
     format depends on search_path
     * Pros:
       * Using schema for it is consistent with other
         PostgreSQL mechanisms
       * Custom format never conflict with built-in
         format. For example, an extension register "xml" and
         PostgreSQL adds "xml" later, they are never
         conflicted because PostgreSQL's "xml" is registered
         to pg_catalog.
     * Cons: Different format may be used with the same
       input. For example, "jsonlines" may choose
       "jsonlines" implemented by extension X or implemented
       by extension Y when search_path is different.
b. No
   * Users can use "${schema}.${name}" for format name
     that mimics PostgreSQL's builtin schema (but it's just
     a string)
Built-in formats (text/csv/binary) should be able to be
overwritten by extensions:
a. (The current patch is no but David's answer is) Yes
   * Pros: Users can use drop-in replacement faster
     implementation without changing input
   * Cons: Users may overwrite them accidentally.
     It may break pg_dump result.
     (This is called as "backward incompatibility.")
b. No
Are there any missing or wrong items? If we can summarize
the current discussion here correctly, others will be able
to chime in this discussion. (At least I can do it.)
Thanks,
-- 
kou
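The "Cons" of approach a. (the same unqualified name resolving to different handlers) can be sketched with a toy resolver. This is an illustration only: the handler table, the schema names ext_x and ext_y, and resolve_copy_format() are made up, and the sketch ignores details of real name resolution such as pg_catalog being searched implicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One registered handler function: schema-qualified name plus its origin. */
typedef struct CopyFormatHandler
{
    const char *schema;
    const char *name;
    const char *provider;
} CopyFormatHandler;

/* Hypothetical catalog contents: two extensions both provide "jsonlines". */
static const CopyFormatHandler handlers[] = {
    {"pg_catalog", "text", "core"},
    {"ext_x", "jsonlines", "extension X"},
    {"ext_y", "jsonlines", "extension Y"},
};

/*
 * Resolve an unqualified format name: walk the search path in order and
 * return the first handler found, or NULL if no schema provides it.
 */
const CopyFormatHandler *
resolve_copy_format(const char *name,
                    const char *const *search_path, size_t npath)
{
    size_t      nhandlers = sizeof(handlers) / sizeof(handlers[0]);

    for (size_t s = 0; s < npath; s++)
        for (size_t h = 0; h < nhandlers; h++)
            if (strcmp(handlers[h].schema, search_path[s]) == 0 &&
                strcmp(handlers[h].name, name) == 0)
                return &handlers[h];
    return NULL;
}
```

Swapping the order of ext_x and ext_y in the path changes which implementation FORMAT 'jsonlines' selects, with no change to the COPY statement itself. Approach b. avoids this by allowing only one registration per name.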
			
		On Fri, May 9, 2025 at 2:41 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAKFQuwaRDXANaL+QcT6LZRAem4rwkSwv9v+viv_mcR+Rex3quA@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Sat, 3 May 2025 22:27:36 -0700,
>   "David G. Johnston" <david.g.johnston@gmail.com> wrote:
>
> > In any case, I’m doubtful either of us can make a convincing enough
> > argument to sway the other fully.  Both options are plausible, IMO.  Others
> > need to chime in.
>
> I may misunderstand but here is the current summary, right?
Thank you for summarizing the discussion.
>
> Proposed approaches to register custom COPY formats:
> a. Create a function that has the same name of custom COPY
>    format
> b. Call a register function from _PG_init()
>
> FYI: I proposed c. approach that uses a. but it always
> requires schema name for format name in other e-mail.
With approach (c), do you mean that we require users to change all
FORMAT option values, for example from 'text' to 'pg_catalog.text',
after the upgrade? Or do we exempt the built-in formats?
>
> Users can register the same format name:
> a. Yes
>    * Users can distinct the same format name by schema name
>    * If format name doesn't have schema name, the used
>      format depends on search_path
>      * Pros:
>        * Using schema for it is consistent with other
>          PostgreSQL mechanisms
>        * Custom format never conflict with built-in
>          format. For example, an extension register "xml" and
>          PostgreSQL adds "xml" later, they are never
>          conflicted because PostgreSQL's "xml" is registered
>          to pg_catalog.
>      * Cons: Different format may be used with the same
>        input. For example, "jsonlines" may choose
>        "jsonlines" implemented by extension X or implemented
>        by extension Y when search_path is different.
> b. No
>    * Users can use "${schema}.${name}" for format name
>      that mimics PostgreSQL's builtin schema (but it's just
>      a string)
>
>
> Built-in formats (text/csv/binary) should be able to
> overwritten by extensions:
> a. (The current patch is no but David's answer is) Yes
>    * Pros: Users can use drop-in replacement faster
>      implementation without changing input
>    * Cons: Users may overwrite them accidentally.
>      It may break pg_dump result.
>      (This is called as "backward incompatibility.")
> b. No
The summary matches my understanding. I think the second point is
important. If we go with a tablesample-like API, I agree with David's
point that all FORMAT values including the built-in formats should
depend on the search_path value. While it provides a similar user
experience to other database objects, there is a possibility that a
COPY with built-in format could work differently on v19 than v18 or
earlier depending on the search_path value.
> Are there any missing or wrong items?
I think the approach (b) provides more flexibility than (a) in terms
of API design as with (a) we need to do everything based on one
handler function and callbacks.
> If we can summarize
> the current discussion here correctly, others will be able
> to chime in this discussion. (At least I can do it.)
+1
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		On Fri, May 9, 2025 at 1:51 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoD9CBjh4u6jdiE0tG-jvejw-GJN8fUPoQSVhKh36HW2NQ@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 May 2025 23:37:46 -0700,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > The progress information is stored in PgBackendStatus defined in
> > backend_status.h:
> >
> >     /*
> >      * Command progress reporting.  Any command which wishes can advertise
> >      * that it is running by setting st_progress_command,
> >      * st_progress_command_target, and st_progress_param[].
> >      * st_progress_command_target should be the OID of the relation which the
> >      * command targets (we assume there's just one, as this is meant for
> >      * utility commands), but the meaning of each element in the
> >      * st_progress_param array is command-specific.
> >      */
> >     ProgressCommandType st_progress_command;
> >     Oid         st_progress_command_target;
> >     int64       st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
> >
> > Then the progress view maps the numbers to the corresponding strings:
> >
> > CREATE VIEW pg_stat_progress_copy AS
> >     SELECT
> >         S.pid AS pid, S.datid AS datid, D.datname AS datname,
> >         S.relid AS relid,
> >         CASE S.param5 WHEN 1 THEN 'COPY FROM'
> >                       WHEN 2 THEN 'COPY TO'
> >                       END AS command,
> >         CASE S.param6 WHEN 1 THEN 'FILE'
> >                       WHEN 2 THEN 'PROGRAM'
> >                       WHEN 3 THEN 'PIPE'
> >                       WHEN 4 THEN 'CALLBACK'
> >                       END AS "type",
> >         S.param1 AS bytes_processed,
> >         S.param2 AS bytes_total,
> >         S.param3 AS tuples_processed,
> >         S.param4 AS tuples_excluded,
> >         S.param7 AS tuples_skipped
> >     FROM pg_stat_get_progress_info('COPY') AS S
> >         LEFT JOIN pg_database D ON S.datid = D.oid;
>
> Thanks. I didn't know about how to implement
> pg_stat_progress_copy.
>
> > So the idea is that the backend process sets the format ID somewhere
> > in st_progress_param, and then the progress view calls a SQL function,
> > say pg_stat_get_copy_format_name(), with the format ID that returns
> > the corresponding format name.
>
> Does it work when we use session_preload_libraries or the
> LOAD command? If we have 2 sessions and both of them load
> "jsonlines" COPY FORMAT extensions, what will be happened?
>
> For example:
>
> 1. Session 1: Register "jsonlines"
> 2. Session 2: Register "jsonlines"
>               (Should global format ID <-> format name mapping
>               be updated?)
> 3. Session 2: Close this session.
>               Unregister "jsonlines".
>               (Can we unregister COPY FORMAT extension?)
>               (Should global format ID <-> format name mapping
>               be updated?)
> 4. Session 1: Close this session.
>               Unregister "jsonlines".
>               (Can we unregister COPY FORMAT extension?)
>               (Should global format ID <-> format name mapping
>               be updated?)
I imagine that, for progress reporting purposes only, sessions 1
and 2 can have different format IDs for the same 'jsonlines' if they
load it with the LOAD command. They can advertise the format IDs in
shared memory, and we can also provide a SQL function for the progress
view that returns the format name for a given format ID.
Considering the possibility that we might want to use the format ID
also in the cumulative statistics, we might want to strictly provide
a unique format ID for each custom format, as the format IDs are
serialized to the pgstat file. One possible way to implement it is
to manage the custom format IDs in a wiki page like we do for
custom cumulative statistics and custom RMGR[1][2]. That is, a custom
format extension registers the format name along with the format ID
that is pre-registered in the wiki page, or with the format ID (e.g. 128)
that indicates it is under development. If either the format name or
the format ID conflicts with an already registered custom format
extension, the registration function raises an error. And we
preallocate enough format IDs for built-in formats.
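The registration rule described above (reject a registration when either the format name or the format ID is already taken, and offer an ID-to-name lookup for the progress view) can be illustrated with a small standalone sketch. This is not code from any patch in this thread: `register_copy_format_id()`, `copy_format_name_by_id()`, the fixed-size table, and the assumption that IDs below 128 are reserved for built-in formats are all hypothetical, and a real backend would raise the conflict with ereport(ERROR) instead of returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limits, chosen only for this sketch. */
#define MAX_CUSTOM_FORMATS      16
#define FIRST_CUSTOM_FORMAT_ID  128 /* assumption: lower IDs are reserved for built-ins */

typedef struct CopyFormatEntry
{
    int         id;
    const char *name;
} CopyFormatEntry;

static CopyFormatEntry format_registry[MAX_CUSTOM_FORMATS];
static int  n_formats = 0;

/*
 * Register a custom format under a fixed ID.  Returns false if the ID is
 * in the reserved range, the table is full, or either the ID or the name
 * is already registered.
 */
static bool
register_copy_format_id(int id, const char *name)
{
    assert(name != NULL);

    if (id < FIRST_CUSTOM_FORMAT_ID || n_formats >= MAX_CUSTOM_FORMATS)
        return false;

    for (int i = 0; i < n_formats; i++)
    {
        if (format_registry[i].id == id ||
            strcmp(format_registry[i].name, name) == 0)
            return false;       /* conflict on ID or on name */
    }

    format_registry[n_formats].id = id;
    format_registry[n_formats].name = name;
    n_formats++;
    return true;
}

/* Map a format ID back to its name; returns NULL for unknown IDs. */
static const char *
copy_format_name_by_id(int id)
{
    for (int i = 0; i < n_formats; i++)
    {
        if (format_registry[i].id == id)
            return format_registry[i].name;
    }
    return NULL;
}
```

In this sketch the lookup only knows formats that were registered in the same process, which is the same limitation raised later in the thread: a wiki page can reserve IDs, but the name still has to be registered by the extension itself.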
As for unregistration, I think that even if we provide an
unregistration API, it ultimately depends on whether or not custom
format extensions call it in _PG_fini().
Regards,
[1] https://wiki.postgresql.org/wiki/CustomCumulativeStats
[2] https://wiki.postgresql.org/wiki/CustomWALResourceManagers
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		Hi,
In <CAD21AoBrSTmPyDai_QVR-XOe7PL722Dazm70A+FpvGy2hfSV9g@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 9 May 2025 17:57:35 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Proposed approaches to register custom COPY formats:
>> a. Create a function that has the same name of custom COPY
>>    format
>> b. Call a register function from _PG_init()
>>
>> FYI: I proposed c. approach that uses a. but it always
>> requires schema name for format name in other e-mail.
> 
> With approach (c), do you mean that we require users to change all
> FORMAT option values like from 'text' to 'pg_catalog.text' after the
> upgrade? Or are we exempt the built-in formats?
The latter. 'text' must be accepted because existing pg_dump
results use 'text'. If we reject 'text', it's a big
incompatibility. (We can't dump on old PostgreSQL and
restore to new PostgreSQL.)
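To make the rule for approach (c) concrete, here is a minimal sketch of the name check it implies: unqualified built-in names stay valid, and every other format name must be schema-qualified. The helper names are hypothetical, the check is a plain string test (no catalog lookup, no case folding, no quoting rules), and it is not taken from any patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True for the three formats that COPY supports today. */
static bool
toy_is_builtin_copy_format(const char *name)
{
    static const char *const builtins[] = {"text", "csv", "binary"};

    for (size_t i = 0; i < sizeof(builtins) / sizeof(builtins[0]); i++)
    {
        if (strcmp(name, builtins[i]) == 0)
            return true;
    }
    return false;
}

/*
 * Hypothetical acceptance rule for approach (c): a built-in name is
 * accepted as-is; anything else needs the form "schema.name" with both
 * parts non-empty.
 */
static bool
toy_copy_format_name_is_acceptable(const char *name)
{
    const char *dot;

    assert(name != NULL);

    if (toy_is_builtin_copy_format(name))
        return true;

    dot = strchr(name, '.');
    return dot != NULL && dot != name && dot[1] != '\0';
}
```

Under this rule, existing dumps that say FORMAT 'text' keep working, while an unqualified 'jsonlines' would be rejected.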
>> Users can register the same format name:
>> a. Yes
>>    * Users can distinct the same format name by schema name
>>    * If format name doesn't have schema name, the used
>>      format depends on search_path
>>      * Pros:
>>        * Using schema for it is consistent with other
>>          PostgreSQL mechanisms
>>        * Custom format never conflict with built-in
>>          format. For example, an extension register "xml" and
>>          PostgreSQL adds "xml" later, they are never
>>          conflicted because PostgreSQL's "xml" is registered
>>          to pg_catalog.
>>      * Cons: Different format may be used with the same
>>        input. For example, "jsonlines" may choose
>>        "jsonlines" implemented by extension X or implemented
>>        by extension Y when search_path is different.
>> b. No
>>    * Users can use "${schema}.${name}" for format name
>>      that mimics PostgreSQL's builtin schema (but it's just
>>      a string)
>>
>>
>> Built-in formats (text/csv/binary) should be able to
>> overwritten by extensions:
>> a. (The current patch is no but David's answer is) Yes
>>    * Pros: Users can use drop-in replacement faster
>>      implementation without changing input
>>    * Cons: Users may overwrite them accidentally.
>>      It may break pg_dump result.
>>      (This is called as "backward incompatibility.")
>> b. No
> 
> The summary matches my understanding. I think the second point is
> important. If we go with a tablesample-like API, I agree with David's
> point that all FORMAT values including the built-in formats should
> depend on the search_path value. While it provides a similar user
> experience to other database objects, there is a possibility that a
> COPY with built-in format could work differently on v19 than v18 or
> earlier depending on the search_path value.
Thanks for sharing additional points.
David said that the additional case you point out is the
responsibility of the DBA, not PostgreSQL, right?
As I already said, I don't have a strong opinion on which
approach is better. My opinion for the (important) second
point is no. I feel that the pros of a. aren't realistic. If
users want to improve text/csv/binary performance (or
something), they should improve PostgreSQL itself instead of
replacing it with an extension. (Or they should create another
custom copy format such as "faster_text", not "text".)
So I'm OK with the approach b.
>> Are there any missing or wrong items?
> 
> I think the approach (b) provides more flexibility than (a) in terms
> of API design as with (a) we need to do everything based on one
> handler function and callbacks.
Thanks for sharing this missing point.
I have a concern that the flexibility may introduce needless
complexity. If it's not a real concern, I'm OK with the
approach b.
>> If we can summarize
>> the current discussion here correctly, others will be able
>> to chime in this discussion. (At least I can do it.)
> 
> +1
Are there any more people who are interested in the custom COPY
FORMAT implementation design? If not, let's decide it among
ourselves.
Thanks,
-- 
kou
			
Hi,

In <CAD21AoAY_h-9nuhs14e3cyO_A2rH7==zuq+NPHkn9ggwyaXnPQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 9 May 2025 21:29:23 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:

>> > So the idea is that the backend process sets the format ID somewhere
>> > in st_progress_param, and then the progress view calls a SQL function,
>> > say pg_stat_get_copy_format_name(), with the format ID that returns
>> > the corresponding format name.
>>
>> Does it work when we use session_preload_libraries or the
>> LOAD command? If we have 2 sessions and both of them load
>> "jsonlines" COPY FORMAT extensions, what will be happened?
>>
>> For example:
>>
>> 1. Session 1: Register "jsonlines"
>> 2. Session 2: Register "jsonlines"
>>               (Should global format ID <-> format name mapping
>>               be updated?)
>> 3. Session 2: Close this session.
>>               Unregister "jsonlines".
>>               (Can we unregister COPY FORMAT extension?)
>>               (Should global format ID <-> format name mapping
>>               be updated?)
>> 4. Session 1: Close this session.
>>               Unregister "jsonlines".
>>               (Can we unregister COPY FORMAT extension?)
>>               (Should global format ID <-> format name mapping
>>               be updated?)
>
> I imagine that only for progress reporting purposes, I think session 1
> and 2 can have different format IDs for the same 'jsonlines' if they
> load it by LOAD command. They can advertise the format IDs on the
> shmem and we can also provide a SQL function for the progress view
> that can get the format name by the format ID.
>
> Considering the possibility that we might want to use the format ID
> also in the cumulative statistics, we might want to strictly provide
> the unique format ID for each custom format as the format IDs are
> serialized to the pgstat file. One possible way to implement it is
> that we manage the custom format IDs in a wiki page like we do for
> custom cumulative statistics and custom RMGR[1][2]. That is, a custom
> format extension registers the format name along with the format ID
> that is pre-registered in the wiki page or the format ID (e.g. 128)
> indicating under development. If either the format name or format ID
> conflict with an already registered custom format extension, the
> registration function raises an error. And we preallocate enough
> format IDs for built-in formats.
>
> As for unregistration, I think that even if we provide an
> unregisteration API, it ultimately depends on whether or not custom
> format extensions call it in _PG_fini().

Thanks for sharing your idea.

With the former ID issuing approach, it seems that we need a
global format ID <-> name mapping and a per session
registered format name list. The custom COPY FORMAT register
function rejects the same format name, right? If we support
both of shared_preload_libraries and
session_preload_libraries/LOAD, we have different life time
custom formats. It may introduce a complexity with the ID
issuing approach.

With the latter static ID approach, how to implement a
function that converts format ID to format name? PostgreSQL
itself doesn't know ID <-> name mapping in the Wiki
page. It seems that custom COPY FORMAT implementation needs
to register its name to PostgreSQL by itself.

Thanks,
-- 
kou
On Mon, May 26, 2025 at 10:04:05AM +0900, Sutou Kouhei wrote:
> As I already said, I don't have a strong opinion on which
> approach is better. My opinion for the (important) second
> point is no. I feel that the pros of a. isn't realistic. If
> users want to improve text/csv/binary performance (or
> something), they should improve PostgreSQL itself instead of
> replacing it as an extension. (Or they should create another
> custom copy format such as "faster_text" not "text".)

Patches welcome.  Andres may have a TODO board regarding that, I
think.

> So I'm OK with the approach b.

Here is an opinion.

Approach (b), which uses a function called from _PG_init() to register
a custom format, has the merit of being simple to implement and secure
by "design", because it depends only on the fact that we can do a
lookup based on the string defined in one or more DefElems.  Adding a
dependency on search_path as you say could lead to surprising results.

Using a shared ID when a COPY method is registered (like extension
wait events) or an ID that's static in a backend (like EXPLAIN
extensibility does) is an implementation difference that can be useful
for monitoring, and only that AFAIK.  If you want to implement
method-based statistics for COPY, you will want to allocate one stats
kind for each COPY method, because the stats stored will be aggregates
of the COPY methods.  The stats kind ID is something that should not
be linked to the COPY method ID, because the stats kind ID is
registered in its own dedicated path, and it would be hardcoded in the
library where the COPY callbacks are defined.  So you could have a
stats kind with a fixed ID, and a COPY method ID that's linked to each
backend like EXPLAIN does.

One factor to take into account is how much freedom we are OK with
giving to users when it comes to the deployment of custom COPY
methods, and how popular these would be.  Cloud is popular these days,
so folks may want to be able to define pointers to functions that are
run in something else than C, as long as the language is trusted.  My
take on this part is that we are not going to see many formats out
there that would benefit from these callbacks, so asking for people to
deploy a .so on disk that can only be LOAD'ed or registered with one
of the preloading GUCs should be enough to satisfy most users, even if
the barrier to entry to get that on a cloud instance like RDS or Azure
is higher.  This also has the benefit of giving more control on the
COPY internals to cloud providers, as they are the ones who would be
in charge of saying if they're OK with a dedicated .so or not.  Not
the users themselves.  We've had a lot of bad PR and false CVEs in the
past with COPY FROM/TO PROGRAM and the fact that it requires
superusers.  Having something in this area that gives more freedom to
the user with something like approach (a) (SQL functions allowed to
define the callback) will, I suspect, bite us back hard.

So, my opinion is to rely on _PG_init(), with a shared ID if you want
to expose the method used somewhere for monitoring tools.  You could
as well implement the simpler set of APIs that allocates IDs local to
each backend, like EXPLAIN, then consider later if shared IDs are
really needed.  The registration APIs don't have to be fixed in time
across releases, they can be always improved in steps as required.
What matters is ABI compatibility in the same major version once it is
released.
--
Michael
On Wed, Jun 11, 2025 at 7:34 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, May 26, 2025 at 10:04:05AM +0900, Sutou Kouhei wrote:
> > As I already said, I don't have a strong opinion on which
> > approach is better. My opinion for the (important) second
> > point is no. I feel that the pros of a. isn't realistic. If
> > users want to improve text/csv/binary performance (or
> > something), they should improve PostgreSQL itself instead of
> > replacing it as an extension. (Or they should create another
> > custom copy format such as "faster_text" not "text".)
>
> Patches welcome.  Andres may have a TODO board regarding that, I
> think.
>
> > So I'm OK with the approach b.
>
> Here is an opinion.

Thank you for the comments.

>
> Approach (b), that uses _PG_init() a function to register a custom
> format has the merit to be simple to implement and secure by "design",
> because it depends only on the fact that we can do a lookup based on
> the string defined in one or more DefElems.  Adding a dependendy to
> search_path as you say could lead to surprising results.
>
> Using a shared ID when a COPY method is registered (like extension
> wait events) or an ID that's static in a backend (like EXPLAIN
> extensibility does) is an implementation difference that can be useful
> for monitoring, and only that AFAIK.  If you want to implement
> method-based statistics for COPY, you will want to allocate one stats
> kind for each COPY method, because the stats stored will be aggregates
> of the COPY methods.  The stats kind ID is something that should not
> be linked to the COPY method ID, because the stats kind ID is
> registered in its own dedicated path, and it would be hardcoded in the
> library where the COPY callbacks are defined.  So you could have a
> stats kind with a fixed ID, and a COPY method ID that's linked to each
> backend like EXPLAIN does.

Good point.

>
> One factor to take into account is how much freedom we are OK with
> giving to users when it comes to the deployment of custom COPY
> methods, and how popular these would be.  Cloud is popular these days,
> so folks may want to be able to define pointers to functions that are
> run in something else than C, as long as the language is trusted.  My
> take on this part is that we are not going to see many formats out
> there that would benefit from these callbacks, so asking for people to
> deploy a .so on disk that can only be LOAD'ed or registered with one
> of the preloading GUCs should be enough to satisfy most users, even if
> the barrier entry to get that only a cloud instead like RDS or Azure
> is higher.  This has also the benefit in giving more control on the
> COPY internals to cloud providers, as they are the ones who would be
> in charge of saying if they're OK with a dedicated .so or not.  Not
> the users themselves.  We've had a lot of bad PR and false CVEs in the
> past with COPY FROM/TO PROGRAM and the fact that it requires
> superusers.  Having something in this area that gives more freedom to
> the user with something like approach (a) (SQL functions allowed to
> define the callback) will, I suspect, bite us back hard.

That's a valid point and I agree.

>
> So, my opinion is to rely on _PG_init(), with a shared ID if you want
> to expose the method used somewhere for monitoring tools.  You could
> as well implement the simpler set of APIs that allocates IDs local to
> each backend, like EXPLAIN, then consider later if shared IDs are
> really needed.  The registration APIs don't have to be fixed in time
> across releases, they can be always improved in steps as required.
> What matters is ABI compatibility in the same major version once it is
> released.

+1 to start with a simpler set of APIs.

Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi,

In <CAD21AoBwxgfkMYxgPWyrLG-r8-ptVKjd=jhncY_QAaVJYhQQdw@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 12 Jun 2025 10:00:12 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:

>> So, my opinion is to rely on _PG_init(), with a shared ID if you want
>> to expose the method used somewhere for monitoring tools.  You could
>> as well implement the simpler set of APIs that allocates IDs local to
>> each backend, like EXPLAIN, then consider later if shared IDs are
>> really needed.  The registration APIs don't have to be fixed in time
>> across releases, they can be always improved in steps as required.
>> What matters is ABI compatibility in the same major version once it is
>> released.
>
> +1 to start with a simpler set of APIs.

OK. I'll implement the initial version with this
design. (Allocating IDs local not shared.)

Thanks,
-- 
kou
On Tue, Jun 17, 2025 at 08:50:37AM +0900, Sutou Kouhei wrote:
> OK. I'll implement the initial version with this
> design. (Allocating IDs local not shared.)

Sounds good to me.  Thanks Sutou-san!
--
Michael
Hi,
In <aFC5HmZHU5NCPuTL@paquier.xyz>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 17 Jun 2025 09:38:54 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Jun 17, 2025 at 08:50:37AM +0900, Sutou Kouhei wrote:
>> OK. I'll implement the initial version with this
>> design. (Allocating IDs local not shared.)
> 
> Sounds good to me.  Thanks Sutou-san!
I've attached the v41 patch set that uses the C API approach
with local (not shared) COPY routine management.
0001: This is the same as 0001 in the v40 patch set. It just
      cleans up the CopySource and CopyDest enums.
0002: This is the initial version of this approach.
Here are some discussion points:
1. This provides 2 registration APIs
   (RegisterCopy{From,To}Routine(name, routine)) instead of
   1 registration API (RegisterCopyFormat(name,
   from_routine, to_routine)).
   This keeps the implementation simple and makes the APIs
   easy to extend without breaking them in the future. (And
   some formats may provide only a FROM routine or only a TO
   routine.)
   Is this design acceptable?
   FYI: RegisterCopy{From,To}Routine() uses the same logic
   as RegisterExtensionExplainOption().
2. This allocates IDs internally but doesn't provide APIs
   to get them, because they aren't needed for now.
   We can provide a GetExplainExtensionId()-like API when we
   need it.
   Is this design acceptable?
3. I want to register the built-in COPY {FROM,TO} routines
   in the PostgreSQL initialization phase. Where should we
   do it? In 0002, it's done in InitPostgres() but I'm not
   sure whether it's a suitable location or not.
4. 0002 adds CopyFormatOptions::routine as a union:
   @@ -87,9 +91,14 @@ typedef struct CopyFormatOptions
           CopyLogVerbosityChoice log_verbosity;   /* verbosity of logged messages */
           int64           reject_limit;   /* maximum tolerable number of errors */
           List       *convert_select; /* list of column names (can be NIL) */
   +       union
   +       {
   +               const struct CopyFromRoutine *from; /* for COPY FROM */
   +               const struct CopyToRoutine *to; /* for COPY TO */
   +       }                       routine;                /* routine to process the specified format */
    } CopyFormatOptions;
   It's a union because only one of Copy{From,To}Routine is
   needed at a time. Is this union usage strange in PostgreSQL?
5. 0002 adds InitializeCopy{From,To}Routines() and
   GetCopy{From,To}Routine(), which aren't used by COPY
   {FROM,TO} routine implementations, to copyapi.h. Should we
   move them to another .h? If so, which .h should be used for
   them?
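To illustrate discussion points 1 and 2, here is a standalone sketch of a name-based registry for TO routines whose IDs are local to the process. It is not the code in 0002: the `Toy*`/`toy_*` names, the single `describe` callback standing in for the real routine callbacks, the use of realloc() instead of a PostgreSQL memory context, and the choice to reject duplicate names (rather than replace the earlier routine) are all assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a COPY TO routine; the real struct has several callbacks. */
typedef struct ToyCopyToRoutine
{
    const char *(*describe) (void);
} ToyCopyToRoutine;

typedef struct ToyCopyToEntry
{
    const char *name;
    const ToyCopyToRoutine *routine;
} ToyCopyToEntry;

/* Per-process registry: the array index doubles as the local ID. */
static ToyCopyToEntry *toy_entries = NULL;
static int  toy_assigned = 0;
static int  toy_allocated = 0;

/*
 * Register a routine under a format name.  Returns the local ID, or -1
 * if the name is already registered or memory cannot be allocated.
 */
static int
toy_register_copy_to_routine(const char *name, const ToyCopyToRoutine *routine)
{
    assert(name != NULL && routine != NULL);

    for (int i = 0; i < toy_assigned; i++)
    {
        if (strcmp(toy_entries[i].name, name) == 0)
            return -1;          /* assumption: duplicates are rejected */
    }

    if (toy_assigned >= toy_allocated)
    {
        int         new_cap = (toy_allocated == 0) ? 4 : toy_allocated * 2;
        ToyCopyToEntry *grown;

        grown = realloc(toy_entries, (size_t) new_cap * sizeof(ToyCopyToEntry));
        if (grown == NULL)
            return -1;
        toy_entries = grown;
        toy_allocated = new_cap;
    }

    toy_entries[toy_assigned].name = name;
    toy_entries[toy_assigned].routine = routine;
    return toy_assigned++;
}

/* Look up a routine by format name; returns NULL if not registered. */
static const ToyCopyToRoutine *
toy_get_copy_to_routine(const char *name)
{
    for (int i = 0; i < toy_assigned; i++)
    {
        if (strcmp(toy_entries[i].name, name) == 0)
            return toy_entries[i].routine;
    }
    return NULL;
}

/* Example routine an extension might register from its _PG_init(). */
static const char *
toy_jsonlines_describe(void)
{
    return "jsonlines";
}

static const ToyCopyToRoutine toy_jsonlines_routine = {toy_jsonlines_describe};
```

Because the table lives in process-local memory, two sessions that LOAD the same library may end up with different IDs for the same name, which is acceptable as long as the IDs are not exposed outside the backend.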
Thanks,
-- 
kou
From 78b0c3897e3c78988239dd149753ab55336d060c Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 13:58:33 +0900
Subject: [PATCH v41 1/2] Export CopyDest as private data
This is preparation for exporting CopyToStateData as private data.
CopyToStateData depends on CopyDest, so we need to export CopyDest
too.
But CopyDest and CopySource use the same enum value names, so we
can't export CopyDest as-is.
This uses the COPY_DEST_ prefix for CopyDest enum values. CopySource
uses the COPY_SOURCE_ prefix for consistency.
---
 src/backend/commands/copyfrom.c          |  4 ++--
 src/backend/commands/copyfromparse.c     | 10 ++++----
 src/backend/commands/copyto.c            | 30 ++++++++----------------
 src/include/commands/copyfrom_internal.h |  8 +++----
 src/include/commands/copyto_internal.h   | 28 ++++++++++++++++++++++
 5 files changed, 49 insertions(+), 31 deletions(-)
 create mode 100644 src/include/commands/copyto_internal.h
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..b4dad744547 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1709,7 +1709,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;    /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;    /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1837,7 +1837,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5fc346e201..9f7171d1478 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -180,7 +180,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows it can send. */
     pq_flush();
@@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
     switch (cstate->copy_src)
     {
-        case COPY_FILE:
+        case COPY_SOURCE_FILE:
             bytesread = fread(databuf, 1, maxread, cstate->copy_file);
             if (ferror(cstate->copy_file))
                 ereport(ERROR,
@@ -257,7 +257,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
             if (bytesread == 0)
                 cstate->raw_reached_eof = true;
             break;
-        case COPY_FRONTEND:
+        case COPY_SOURCE_FRONTEND:
             while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
             {
                 int            avail;
@@ -340,7 +340,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
                 bytesread += avail;
             }
             break;
-        case COPY_CALLBACK:
+        case COPY_SOURCE_CALLBACK:
             bytesread = cstate->data_source_cb(databuf, minread, maxread);
             break;
     }
@@ -1172,7 +1172,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
          * after \. up to the protocol end of copy data.  (XXX maybe better
          * not to treat \. as special?)
          */
-        if (cstate->copy_src == COPY_FRONTEND)
+        if (cstate->copy_src == COPY_SOURCE_FRONTEND)
         {
             int            inbytes;
 
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ea6f18f2c80..99aec9c4c48 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -20,6 +20,7 @@
 
 #include "access/tableam.h"
 #include "commands/copyapi.h"
+#include "commands/copyto_internal.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -36,17 +37,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                    /* to file (or a piped program) */
-    COPY_FRONTEND,                /* to frontend */
-    COPY_CALLBACK,                /* to callback function */
-} CopyDest;
-
 /*
  * This struct contains all the state variables used throughout a COPY TO
  * operation.
@@ -69,7 +59,7 @@ typedef struct CopyToStateData
 
     /* low-level state data */
     CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
     StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
 
     int            file_encoding;    /* file or remote side's character encoding */
@@ -401,7 +391,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -448,7 +438,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -482,11 +472,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -507,7 +497,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -515,7 +505,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -902,12 +892,12 @@ BeginCopyTo(ParseState *pstate,
     /* See Multibyte encoding comment above */
     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..24157e11a73 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -24,9 +24,9 @@
  */
 typedef enum CopySource
 {
-    COPY_FILE,                    /* from file (or a piped program) */
-    COPY_FRONTEND,                /* from frontend */
-    COPY_CALLBACK,                /* from callback function */
+    COPY_SOURCE_FILE,            /* from file (or a piped program) */
+    COPY_SOURCE_FRONTEND,        /* from frontend */
+    COPY_SOURCE_CALLBACK,        /* from callback function */
 } CopySource;
 
 /*
@@ -64,7 +64,7 @@ typedef struct CopyFromStateData
     /* low-level state data */
     CopySource    copy_src;        /* type of copy source */
-    FILE       *copy_file;        /* used if copy_src == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_FRONTEND */
+    FILE       *copy_file;        /* used if copy_src == COPY_SOURCE_FILE */
+    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_SOURCE_FRONTEND */
 
     EolType        eol_type;        /* EOL type of input */
     int            file_encoding;    /* file or remote side's character encoding */
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
new file mode 100644
index 00000000000..42ddb37a8a2
--- /dev/null
+++ b/src/include/commands/copyto_internal.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyto_internal.h
+ *      Internal definitions for COPY TO command.
+ *
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyto_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYTO_INTERNAL_H
+#define COPYTO_INTERNAL_H
+
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+    COPY_DEST_FILE,                /* to file (or a piped program) */
+    COPY_DEST_FRONTEND,            /* to frontend */
+    COPY_DEST_CALLBACK,            /* to callback function */
+} CopyDest;
+
+#endif                            /* COPYTO_INTERNAL_H */
-- 
2.49.0
From bdd45f68d7026fae757bcd7d6aec8f2b6644a846 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 18 Jun 2025 11:43:09 +0900
Subject: [PATCH v41 2/2] Add support for registering COPY {FROM,TO} routines
This uses the C API approach, like custom EXPLAIN options. Parts of it
are based on the custom EXPLAIN option implementation.
This approach provides a C API to register COPY {FROM,TO} routines:
    void RegisterCopyFromRoutine(const char *name,
                                 const CopyFromRoutine *routine);
    void RegisterCopyToRoutine(const char *name,
                               const CopyToRoutine *routine);
(They are based on RegisterExtensionExplainOption().)
They assign an ID to each name internally, but the current API set
doesn't expose it to users because it's not needed for now. If it
becomes necessary, we can provide APIs for it, like
GetExplainExtensionId() for custom EXPLAIN options.
Registered COPY {FROM,TO} routines are managed per process. They aren't
shared among multiple processes because that's not needed for now. We
may revisit this when we need it.
---
 src/backend/commands/copy.c                   |  16 +-
 src/backend/commands/copyfrom.c               | 108 ++++++++++--
 src/backend/commands/copyto.c                 | 161 +++++++++++-------
 src/backend/utils/init/postinit.c             |   5 +
 src/include/commands/copy.h                   |  11 +-
 src/include/commands/copyapi.h                |  13 ++
 src/include/commands/copyto_internal.h        |  55 ++++++
 src/test/modules/Makefile                     |   1 +
 src/test/modules/meson.build                  |   1 +
 src/test/modules/test_copy_format/.gitignore  |   4 +
 src/test/modules/test_copy_format/Makefile    |  21 +++
 .../expected/test_copy_format.out             |  19 +++
 src/test/modules/test_copy_format/meson.build |  29 ++++
 .../test_copy_format/sql/test_copy_format.sql |   8 +
 .../test_copy_format/test_copy_format.c       |  91 ++++++++++
 .../test_copy_format/test_copy_format.conf    |   1 +
 16 files changed, 466 insertions(+), 78 deletions(-)
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.conf
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 74ae42b19a7..787a3bdf8a4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -22,7 +22,7 @@
 #include "access/table.h"
 #include "access/xact.h"
 #include "catalog/pg_authid.h"
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/defrem.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
@@ -524,13 +524,21 @@ ProcessCopyOptions(ParseState *pstate,
             if (format_specified)
                 errorConflictingDefElem(defel, pstate);
             format_specified = true;
-            if (strcmp(fmt, "text") == 0)
-                 /* default format */ ;
-            else if (strcmp(fmt, "csv") == 0)
+            if (strcmp(fmt, "csv") == 0)
                 opts_out->csv_mode = true;
             else if (strcmp(fmt, "binary") == 0)
                 opts_out->binary = true;
+
+            if (is_from)
+                opts_out->routine.from = GetCopyFromRoutine(fmt);
             else
+                opts_out->routine.to = GetCopyToRoutine(fmt);
+
+            /*
+             * We can use either opts_out->routine.from or .to here to check
+             * the nonexistent routine case.
+             */
+            if (!opts_out->routine.from)
                 ereport(ERROR,
                         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                          errmsg("COPY format \"%s\" not recognized", fmt),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index b4dad744547..72c96fc6ff6 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -103,6 +103,22 @@ typedef struct CopyMultiInsertInfo
 } CopyMultiInsertInfo;
 
 
+/*
+ * Registered COPY FROM routines are managed per process. They aren't shared
+ * among processes for now. We may revisit that later when it's needed.
+ */
+
+typedef struct
+{
+    const char *name;
+    const CopyFromRoutine *routine;
+}            CopyFromRoutineEntry;
+
+static CopyFromRoutineEntry * CopyFromRoutineEntries = NULL;
+static int    CopyFromRoutineEntriesAssigned = 0;
+static int    CopyFromRoutineEntriesAllocated = 0;
+
+
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
 
@@ -151,17 +167,87 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
     .CopyFromEnd = CopyFromBinaryEnd,
 };
 
-/* Return a COPY FROM routine for the given options */
-static const CopyFromRoutine *
-CopyFromGetRoutine(const CopyFormatOptions *opts)
+/*
+ * Register a new COPY FROM routine.
+ *
+ * When name is used as a COPY FROM format name, routine will be used to
+ * process the COPY FROM request. See CopyFromRoutine for how to implement a
+ * COPY FROM routine.
+ *
+ * If name is already registered, the registered routine is replaced with the
+ * given routine.
+ *
+ * name is assumed to be a constant string or allocated in storage that will
+ * never be freed.
+ *
+ * routine is assumed to be allocated in storage that will never be freed.
+ */
+void
+RegisterCopyFromRoutine(const char *name, const CopyFromRoutine *routine)
 {
-    if (opts->csv_mode)
-        return &CopyFromRoutineCSV;
-    else if (opts->binary)
-        return &CopyFromRoutineBinary;
+    CopyFromRoutineEntry *entry;
 
-    /* default is text */
-    return &CopyFromRoutineText;
+    /* Search for an existing routine by this name; if found, update handler. */
+    for (int i = 0; i < CopyFromRoutineEntriesAssigned; ++i)
+    {
+        if (strcmp(CopyFromRoutineEntries[i].name, name) == 0)
+        {
+            CopyFromRoutineEntries[i].routine = routine;
+            return;
+        }
+    }
+
+    /* If there is no array yet, create one. */
+    if (!CopyFromRoutineEntries)
+    {
+        CopyFromRoutineEntriesAllocated = 16;
+        CopyFromRoutineEntries =
+            MemoryContextAlloc(TopMemoryContext,
+                               sizeof(CopyFromRoutineEntry) * CopyFromRoutineEntriesAllocated);
+    }
+
+    /* If there's an array but it's currently full, expand it. */
+    if (CopyFromRoutineEntriesAssigned >= CopyFromRoutineEntriesAllocated)
+    {
+        int            i = pg_nextpower2_32(CopyFromRoutineEntriesAssigned + 1);
+
+        CopyFromRoutineEntries =
+            repalloc(CopyFromRoutineEntries, sizeof(CopyFromRoutineEntry) * i);
+        CopyFromRoutineEntriesAllocated = i;
+    }
+
+    /* Assign new ID. */
+    entry = &CopyFromRoutineEntries[CopyFromRoutineEntriesAssigned++];
+    entry->name = name;
+    entry->routine = routine;
+}
+
+/*
+ * Register built-in COPY FROM routines.
+ *
+ * This must be called only once in the initialization process.
+ */
+void
+InitializeCopyFromRoutines(void)
+{
+    RegisterCopyFromRoutine("text", &CopyFromRoutineText);
+    RegisterCopyFromRoutine("csv", &CopyFromRoutineCSV);
+    RegisterCopyFromRoutine("binary", &CopyFromRoutineBinary);
+}
+
+/* Return the COPY FROM routine registered under name, or NULL if none */
+const CopyFromRoutine *
+GetCopyFromRoutine(const char *name)
+{
+    for (int i = 0; i < CopyFromRoutineEntriesAssigned; ++i)
+    {
+        if (strcmp(CopyFromRoutineEntries[i].name, name) == 0)
+        {
+            return CopyFromRoutineEntries[i].routine;
+        }
+    }
+
+    return NULL;
 }
 
 /* Implementation of the start callback for text and CSV formats */
@@ -1574,7 +1660,9 @@ BeginCopyFrom(ParseState *pstate,
     ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
     /* Set the format routine */
-    cstate->routine = CopyFromGetRoutine(&cstate->opts);
+    cstate->routine = cstate->opts.routine.from;
+    if (!cstate->routine)
+        cstate->routine = &CopyFromRoutineText; /* default is text */
 
     /* Process the target relation */
     cstate->rel = rel;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 99aec9c4c48..7ef690981cb 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,12 +19,9 @@
 #include <sys/stat.h>
 
 #include "access/tableam.h"
-#include "commands/copyapi.h"
 #include "commands/copyto_internal.h"
 #include "commands/progress.h"
-#include "executor/execdesc.h"
 #include "executor/executor.h"
-#include "executor/tuptable.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -37,56 +34,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-    /* format-specific routines */
-    const CopyToRoutine *routine;
-
-    /* low-level state data */
-    CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
-    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
-
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
-    MemoryContext rowcontext;    /* per-row evaluation context */
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -99,6 +46,22 @@ typedef struct
 static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
+/*
+ * Registered COPY TO routines are managed per process. They aren't shared
+ * among processes for now. We may revisit that later when it's needed.
+ */
+
+typedef struct
+{
+    const char *name;
+    const CopyToRoutine *routine;
+}            CopyToRoutineEntry;
+
+static CopyToRoutineEntry * CopyToRoutineEntries = NULL;
+static int    CopyToRoutineEntriesAssigned = 0;
+static int    CopyToRoutineEntriesAllocated = 0;
+
+
 /* non-export function prototypes */
 static void EndCopy(CopyToState cstate);
 static void ClosePipeToProgram(CopyToState cstate);
@@ -131,6 +94,7 @@ static void CopySendTextLikeEndOfRow(CopyToState cstate);
 static void CopySendInt32(CopyToState cstate, int32 val);
 static void CopySendInt16(CopyToState cstate, int16 val);
 
+
 /*
  * COPY TO routines for built-in formats.
  *
@@ -162,17 +126,86 @@ static const CopyToRoutine CopyToRoutineBinary = {
     .CopyToEnd = CopyToBinaryEnd,
 };
 
-/* Return a COPY TO routine for the given options */
-static const CopyToRoutine *
-CopyToGetRoutine(const CopyFormatOptions *opts)
+/*
+ * Register a new COPY TO routine.
+ *
+ * When name is used as a COPY TO format name, routine will be used to process
+ * the COPY TO request. See CopyToRoutine for how to implement such a routine.
+ *
+ * If name is already registered, the registered routine is replaced with the
+ * given routine.
+ *
+ * name is assumed to be a constant string or allocated in storage that will
+ * never be freed.
+ *
+ * routine is assumed to be allocated in storage that will never be freed.
+ */
+void
+RegisterCopyToRoutine(const char *name, const CopyToRoutine *routine)
 {
-    if (opts->csv_mode)
-        return &CopyToRoutineCSV;
-    else if (opts->binary)
-        return &CopyToRoutineBinary;
+    CopyToRoutineEntry *entry;
 
-    /* default is text */
-    return &CopyToRoutineText;
+    /* Search for an existing routine by this name; if found, update handler. */
+    for (int i = 0; i < CopyToRoutineEntriesAssigned; ++i)
+    {
+        if (strcmp(CopyToRoutineEntries[i].name, name) == 0)
+        {
+            CopyToRoutineEntries[i].routine = routine;
+            return;
+        }
+    }
+
+    /* If there is no array yet, create one. */
+    if (!CopyToRoutineEntries)
+    {
+        CopyToRoutineEntriesAllocated = 16;
+        CopyToRoutineEntries =
+            MemoryContextAlloc(TopMemoryContext,
+                               sizeof(CopyToRoutineEntry) * CopyToRoutineEntriesAllocated);
+    }
+
+    /* If there's an array but it's currently full, expand it. */
+    if (CopyToRoutineEntriesAssigned >= CopyToRoutineEntriesAllocated)
+    {
+        int            i = pg_nextpower2_32(CopyToRoutineEntriesAssigned + 1);
+
+        CopyToRoutineEntries =
+            repalloc(CopyToRoutineEntries, sizeof(CopyToRoutineEntry) * i);
+        CopyToRoutineEntriesAllocated = i;
+    }
+
+    /* Assign new ID. */
+    entry = &CopyToRoutineEntries[CopyToRoutineEntriesAssigned++];
+    entry->name = name;
+    entry->routine = routine;
+}
+
+/*
+ * Register built-in COPY TO routines.
+ *
+ * This must be called only once in the initialization process.
+ */
+void
+InitializeCopyToRoutines(void)
+{
+    RegisterCopyToRoutine("text", &CopyToRoutineText);
+    RegisterCopyToRoutine("csv", &CopyToRoutineCSV);
+    RegisterCopyToRoutine("binary", &CopyToRoutineBinary);
+}
+
+/* Return the COPY TO routine registered under name, or NULL if none */
+const CopyToRoutine *
+GetCopyToRoutine(const char *name)
+{
+    for (int i = 0; i < CopyToRoutineEntriesAssigned; ++i)
+    {
+        if (strcmp(CopyToRoutineEntries[i].name, name) == 0)
+        {
+            return CopyToRoutineEntries[i].routine;
+        }
+    }
+
+    return NULL;
 }
 
 /* Implementation of the start callback for text and CSV formats */
@@ -693,7 +726,9 @@ BeginCopyTo(ParseState *pstate,
     ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
 
     /* Set format routine */
-    cstate->routine = CopyToGetRoutine(&cstate->opts);
+    cstate->routine = cstate->opts.routine.to;
+    if (!cstate->routine)
+        cstate->routine = &CopyToRoutineText;    /* default is text */
 
     /* Process the source/target relation or query */
     if (rel)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index c86ceefda94..d566f542d62 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -33,6 +33,7 @@
 #include "catalog/pg_database.h"
 #include "catalog/pg_db_role_setting.h"
 #include "catalog/pg_tablespace.h"
+#include "commands/copyapi.h"
 #include "libpq/auth.h"
 #include "libpq/libpq-be.h"
 #include "mb/pg_wchar.h"
@@ -1217,6 +1218,10 @@ InitPostgres(const char *in_dbname, Oid dboid,
     /* Initialize this backend's session state. */
     InitializeSession();
 
+    /* Initialize COPY routines. */
+    InitializeCopyFromRoutines();
+    InitializeCopyToRoutines();
+
     /*
      * If this is an interactive session, load any libraries that should be
      * preloaded at backend start.  Since those are determined by GUCs, this
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..88fa0703d0a 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,10 @@ typedef enum CopyLogVerbosityChoice
     COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
 } CopyLogVerbosityChoice;
 
+/* These are in commands/copyapi.h */
+struct CopyFromRoutine;
+struct CopyToRoutine;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -87,9 +91,14 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
+    union
+    {
+        const struct CopyFromRoutine *from; /* for COPY FROM */
+        const struct CopyToRoutine *to; /* for COPY TO */
+    }            routine;        /* routine to process the specified format */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
+/* These are private in commands/copy[from|to]_internal.h */
 typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 2a2d2f9876b..39253d616d7 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -54,6 +54,13 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+extern void RegisterCopyToRoutine(const char *name, const CopyToRoutine *routine);
+
+/* TODO: Should we move the following to another .h because they are not for
+ * custom COPY TO format extensions? */
+extern void InitializeCopyToRoutines(void);
+extern const CopyToRoutine *GetCopyToRoutine(const char *name);
+
 /*
  * API structure for a COPY FROM format implementation. Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
@@ -102,4 +109,10 @@ typedef struct CopyFromRoutine
     void        (*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
+extern void RegisterCopyFromRoutine(const char *name, const CopyFromRoutine *routine);
+/* TODO: Should we move the following to another .h because they are not for
+ * custom COPY FROM format extensions? */
+extern void InitializeCopyFromRoutines(void);
+extern const CopyFromRoutine *GetCopyFromRoutine(const char *name);
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index 42ddb37a8a2..9dbbbc592b7 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -14,6 +14,11 @@
 #ifndef COPYTO_INTERNAL_H
 #define COPYTO_INTERNAL_H
 
+#include "commands/copyapi.h"
+#include "executor/execdesc.h"
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
 /*
  * Represents the different dest cases we need to worry about at
  * the bottom level
@@ -25,4 +30,54 @@ typedef enum CopyDest
     COPY_DEST_CALLBACK,            /* to callback function */
 } CopyDest;
 
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+    /* format-specific routines */
+    const CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;        /* type of copy source/destination */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
+    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
+
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+    MemoryContext rowcontext;    /* per-row evaluation context */
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
 #endif                            /* COPYTO_INTERNAL_H */
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index aa1d27bbed3..9bf5d58cdae 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -17,6 +17,7 @@ SUBDIRS = \
           test_aio \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 9de0057bd1d..5fd06de2737 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -16,6 +16,7 @@ subdir('ssl_passphrase_callback')
 subdir('test_aio')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 00000000000..85dce14ebb3
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,21 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+REGRESS = test_copy_format
+REGRESS_OPTS = --temp-config $(top_srcdir)/src/test/modules/test_copy_format/test_copy_format.conf
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 00000000000..163ff94fa41
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,19 @@
+CREATE TABLE copy_data (a smallint, b integer, c bigint);
+INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format');
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
+COPY copy_data TO stdout WITH (FORMAT 'test_copy_format');
+NOTICE:  CopyToOutFunc: attribute: smallint
+NOTICE:  CopyToOutFunc: attribute: integer
+NOTICE:  CopyToOutFunc: attribute: bigint
+NOTICE:  CopyToStart: the number of attributes: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToEnd
+DROP TABLE copy_data;
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 00000000000..723c51d3f45
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,29 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'test_copy_format',
+    ],
+    'regress_args': ['--temp-config', files('test_copy_format.conf')],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 00000000000..6d60a493e0e
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,8 @@
+CREATE TABLE copy_data (a smallint, b integer, c bigint);
+INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format');
+\.
+COPY copy_data TO stdout WITH (FORMAT 'test_copy_format');
+
+DROP TABLE copy_data;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..70b7a308d8a
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,91 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+#include "utils/builtins.h"
+
+PG_MODULE_MAGIC;
+
+static void
+TestCopyFromInFunc(CopyFromState cstate, Oid atttypid,
+                   FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: attribute: %s", format_type_be(atttypid))));
+}
+
+static void
+TestCopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: the number of attributes: %d", tupDesc->natts)));
+}
+
+static bool
+TestCopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+TestCopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .CopyFromInFunc = TestCopyFromInFunc,
+    .CopyFromStart = TestCopyFromStart,
+    .CopyFromOneRow = TestCopyFromOneRow,
+    .CopyFromEnd = TestCopyFromEnd,
+};
+
+static void
+TestCopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    ereport(NOTICE, (errmsg("CopyToOutFunc: attribute: %s", format_type_be(atttypid))));
+}
+
+static void
+TestCopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: the number of attributes: %d", tupDesc->natts)));
+}
+
+static void
+TestCopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: the number of valid values: %u", slot->tts_nvalid)));
+}
+
+static void
+TestCopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .CopyToOutFunc = TestCopyToOutFunc,
+    .CopyToStart = TestCopyToStart,
+    .CopyToOneRow = TestCopyToOneRow,
+    .CopyToEnd = TestCopyToEnd,
+};
+
+void
+_PG_init(void)
+{
+    RegisterCopyFromRoutine("test_copy_format", &CopyFromRoutineTestCopyFormat);
+    RegisterCopyToRoutine("test_copy_format", &CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.conf b/src/test/modules/test_copy_format/test_copy_format.conf
new file mode 100644
index 00000000000..743a02ac92a
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.conf
@@ -0,0 +1 @@
+shared_preload_libraries = 'test_copy_format'
-- 
2.49.0
			
		On Wed, Jun 18, 2025 at 12:59 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <aFC5HmZHU5NCPuTL@paquier.xyz>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 17 Jun 2025 09:38:54 +0900,
>   Michael Paquier <michael@paquier.xyz> wrote:
>
> > On Tue, Jun 17, 2025 at 08:50:37AM +0900, Sutou Kouhei wrote:
> >> OK. I'll implement the initial version with this
> >> design. (Allocating IDs local not shared.)
> >
> > Sounds good to me.  Thanks Sutou-san!
>
> I've attached the v41 patch set that uses the C API approach
> with local (not shared) COPY routine management.
>
> 0001: This is same as 0001 in the v40 patch set. It just
>       cleans up CopySource and CopyDest enums.
> 0002: This is the initial version of this approach.
Thank you for updating the patches!
> Here are some discussion points:
>
> 1. This provides 2 registration APIs
>    (RegisterCopy{From,To}Routine(name, routine)) instead of
>    1 registration API (RegisterCopyFormat(name,
>    from_routine, to_routine)).
>
>    It's for simple implementation and easy to extend without
>    breaking APIs in the future. (And some formats may
>    provide only FROM routine or TO routine.)
>
>    Is this design acceptable?
With the single registration API idea, a custom format that supports
only the FROM routine could be registered like this:
RegisterCopyRoutine("new-format", NewFormatFromRoutine, NULL);
Compared to this approach, in what respects are two separate
registration APIs preferable in terms of extendability and API
compatibility? I think it would be rather confusing if, for example,
the COPY TO routine and the COPY FROM routine for the same format name
were registered by different extensions.
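A minimal sketch of the single-call idea, written as a standalone C
mock rather than against the real backend. The struct bodies, the fixed
table size, the duplicate-name rule and the return codes are all
assumptions made for illustration; only the function names come from
this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real structs hold callback pointers. */
typedef struct CopyFromRoutine { int placeholder; } CopyFromRoutine;
typedef struct CopyToRoutine { int placeholder; } CopyToRoutine;

typedef struct CopyFormatEntry
{
    const char *name;
    const CopyFromRoutine *from;    /* NULL: COPY FROM is not supported */
    const CopyToRoutine *to;        /* NULL: COPY TO is not supported */
} CopyFormatEntry;

#define MAX_COPY_FORMATS 16         /* arbitrary limit for this sketch */

static CopyFormatEntry copy_formats[MAX_COPY_FORMATS];
static int  n_copy_formats = 0;

static const CopyFormatEntry *
FindCopyFormat(const char *name)
{
    for (int i = 0; i < n_copy_formats; i++)
    {
        if (strcmp(copy_formats[i].name, name) == 0)
            return &copy_formats[i];
    }
    return NULL;
}

/* Returns 0 on success, -1 if the name is taken or the table is full. */
static int
RegisterCopyRoutine(const char *name,
                    const CopyFromRoutine *from,
                    const CopyToRoutine *to)
{
    assert(name != NULL);
    assert(from != NULL || to != NULL); /* at least one direction */

    if (FindCopyFormat(name) != NULL || n_copy_formats >= MAX_COPY_FORMATS)
        return -1;

    copy_formats[n_copy_formats].name = name;
    copy_formats[n_copy_formats].from = from;
    copy_formats[n_copy_formats].to = to;
    n_copy_formats++;
    return 0;
}

static const CopyFromRoutine *
GetCopyFromRoutine(const char *name)
{
    const CopyFormatEntry *entry = FindCopyFormat(name);

    return entry ? entry->from : NULL;
}

static const CopyToRoutine *
GetCopyToRoutine(const char *name)
{
    const CopyFormatEntry *entry = FindCopyFormat(name);

    return entry ? entry->to : NULL;
}
```

In this sketch a format name is owned by whichever call registers it
first, so a FROM-only format simply has no TO routine to look up.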
>    FYI: RegisterCopy{From,To}Routine() uses the same logic
>    as RegisterExtensionExplainOption().
I'm concerned that the patch has duplicated logic for the
registration of COPY FROM and COPY TO.
>
> 2. This allocates IDs internally but doesn't provide APIs
>    that get them. Because it's not needed for now.
>
>    We can provide GetExplainExtensionId() like API when we
>    need it.
>
>    Is this design acceptable?
+1
>
> 3. I want to register the built-in COPY {FROM,TO} routines
>    in the PostgreSQL initialization phase. Where should we
>    do it? In 0002, it's done in InitPostgres() but I'm not
>    sure whether it's a suitable location or not.
InitPostgres() is not the right place, as it's a process
initialization function. We probably don't need to register the
built-in formats in the same way as custom formats. A simple solution
would be to have separate arrays for built-in formats and custom
formats, and have GetCopy[To|From]Routine() search both arrays
(built-in array first).
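A rough standalone sketch of the two-array lookup, for the COPY TO side
only. The label field, the array names and the fixed custom-array size
are hypothetical, and the real routines are structs of callbacks rather
than labels.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: a label instead of the real callbacks. */
typedef struct CopyToRoutine { const char *label; } CopyToRoutine;

typedef struct CopyToEntry
{
    const char *name;
    const CopyToRoutine *routine;
} CopyToEntry;

static const CopyToRoutine builtin_text = {"built-in text"};
static const CopyToRoutine builtin_csv = {"built-in csv"};
static const CopyToRoutine builtin_binary = {"built-in binary"};

/* Built-in formats live in a constant array; nothing is registered. */
static const CopyToEntry builtin_to_routines[] = {
    {"text", &builtin_text},
    {"csv", &builtin_csv},
    {"binary", &builtin_binary},
};

#define MAX_CUSTOM_TO_ROUTINES 16   /* arbitrary limit for this sketch */

static CopyToEntry custom_to_routines[MAX_CUSTOM_TO_ROUTINES];
static int  n_custom_to_routines = 0;

/* Returns 0 on success, -1 if the custom array is full. */
static int
RegisterCopyToRoutine(const char *name, const CopyToRoutine *routine)
{
    assert(name != NULL && routine != NULL);

    if (n_custom_to_routines >= MAX_CUSTOM_TO_ROUTINES)
        return -1;

    custom_to_routines[n_custom_to_routines].name = name;
    custom_to_routines[n_custom_to_routines].routine = routine;
    n_custom_to_routines++;
    return 0;
}

static const CopyToRoutine *
GetCopyToRoutine(const char *name)
{
    size_t      n_builtin = sizeof(builtin_to_routines) /
        sizeof(builtin_to_routines[0]);

    /* Built-in array first, so custom entries cannot shadow it. */
    for (size_t i = 0; i < n_builtin; i++)
    {
        if (strcmp(builtin_to_routines[i].name, name) == 0)
            return builtin_to_routines[i].routine;
    }

    for (int i = 0; i < n_custom_to_routines; i++)
    {
        if (strcmp(custom_to_routines[i].name, name) == 0)
            return custom_to_routines[i].routine;
    }

    return NULL;
}
```

Because the built-in array is searched first, an extension that
registers "csv" in this sketch never replaces the built-in csv routine.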
> 4. 0002 adds CopyFormatOptions::routine as union:
>
>    @@ -87,9 +91,14 @@ typedef struct CopyFormatOptions
>            CopyLogVerbosityChoice log_verbosity;   /* verbosity of logged messages */
>            int64           reject_limit;   /* maximum tolerable number of errors */
>            List       *convert_select; /* list of column names (can be NIL) */
>    +       union
>    +       {
>    +               const struct CopyFromRoutine *from; /* for COPY FROM */
>    +               const struct CopyToRoutine *to; /* for COPY TO */
>    +       }                       routine;                /* routine to process the specified format */
>     } CopyFormatOptions;
>
>    Because one of Copy{From,To}Routine is only needed at
>    once. Is this union usage strange in PostgreSQL?
I think we can live with having two fields as there are other options
that are used only in either COPY FROM or COPY TO.
>
> 5. 0002 adds InitializeCopy{From,To}Routines() and
>    GetCopy{From,To}Routine() that aren't used by COPY
>    {FROM,TO} routine implementations to copyapi.h. Should we
>    move them to other .h? If so, which .h should be used for
>    them?
As I commented on point 3, I think it's better to avoid dynamically
registering the built-in formats.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		Hi,
In <CAD21AoA57owo6qYTPTxOtCjDmcuj1tGL1aGs95cvEnoLQvwF0A@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 24 Jun 2025 11:59:17 +0900,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> 1. This provides 2 registration APIs
>>    (RegisterCopy{From,To}Routine(name, routine)) instead of
>>    1 registration API (RegisterCopyFormat(name,
>>    from_routine, to_routine)).
>>
>>    It's for simple implementation and easy to extend without
>>    breaking APIs in the future. (And some formats may
>>    provide only FROM routine or TO routine.)
>>
>>    Is this design acceptable?
> 
> With the single registration API idea, we can register the custom
> format routine that supports only FROM routine using the API like:
> 
> RegisterCopyRoutine("new-format", NewFormatFromRoutine, NULL);
> 
> Compared to this approach, what points do you think having separate
> two registration APIs is preferable in terms of extendability and API
> compatibility?
With this approach, adding more related APIs later is natural,
because each registration API provides one feature per
operation. If we use RegisterCopyRoutine() to register both
the FROM and TO routines, adding more related APIs is less
natural: some APIs would provide multiple features per
operation while others would provide a single feature per
operation, and developers may be confused by that. For
example, developers may wonder "what does NULL mean here?"
or "can we use NULL here?" when they see
"RegisterCopyRoutine("new-format", NewFormatFromRoutine,
NULL)".
>                I think it would be rather confusing for example if
> each COPY TO routine and COPY FROM routine is registered by different
> extensions with the same format name.
Hmm, I don't think so. Who would be confused by this case?
DBAs? Users who run COPY? Why would they be confused?
Do you mean the case where the same name is used for
different formats? For example, different extensions use
"json" for JSON Lines in COPY FROM and for normal JSON in
COPY TO.
>>    FYI: RegisterCopy{From,To}Routine() uses the same logic
>>    as RegisterExtensionExplainOption().
> 
> I'm concerned that the patch has duplicated logics for the
> registration of COPY FROM and COPY TO.
We can implement a convenience routine that can be shared by
RegisterExtensionExplainOption() and
RegisterCopy{From,To}Routine() if it's needed.
>> 3. I want to register the built-in COPY {FROM,TO} routines
>>    in the PostgreSQL initialization phase. Where should we
>>    do it? In 0002, it's done in InitPostgres() but I'm not
>>    sure whether it's a suitable location or not.
> 
> InitPostgres() is not a correct function as it's a process
> initialization function. Probably we don't necessarily need to
> register the built-in formats in the same way as custom formats. A
> simple solution would be to have separate arrays for built-in formats
> and custom formats and have the GetCopy[To|From]Routine() search both
> arrays (built-in array first).
We had a discussion that we should dog-food APIs:
https://www.postgresql.org/message-id/flat/CAKFQuwaCHhrS%2BRE4p_OO6d7WEskd9b86-2cYcvChNkrP%2B7PJ7A%40mail.gmail.com#e6d1cdd04dac53eafe34b784ac47b68b
> We should (and usually do) dog-food APIs when reasonable
> and this situation seems quite reasonable.
In this case, we don't need to dog-food APIs, right?
>> 4. 0002 adds CopyFormatOptions::routine as union:
>>
>>    @@ -87,9 +91,14 @@ typedef struct CopyFormatOptions
>>            CopyLogVerbosityChoice log_verbosity;   /* verbosity of logged messages */
>>            int64           reject_limit;   /* maximum tolerable number of errors */
>>            List       *convert_select; /* list of column names (can be NIL) */
>>    +       union
>>    +       {
>>    +               const struct CopyFromRoutine *from; /* for COPY FROM */
>>    +               const struct CopyToRoutine *to; /* for COPY TO */
>>    +       }                       routine;                /* routine to process the specified format */
>>     } CopyFormatOptions;
>>
>>    Because one of Copy{From,To}Routine is only needed at
>>    once. Is this union usage strange in PostgreSQL?
> 
> I think we can live with having two fields as there are other options
> that are used only in either COPY FROM or COPY TO.
OK.
Thanks,
-- 
kou
			
		On Tue, Jun 24, 2025 at 2:11 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoA57owo6qYTPTxOtCjDmcuj1tGL1aGs95cvEnoLQvwF0A@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 24 Jun 2025 11:59:17 +0900,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >> 1. This provides 2 registration APIs
> >>    (RegisterCopy{From,To}Routine(name, routine)) instead of
> >>    1 registration API (RegisterCopyFormat(name,
> >>    from_routine, to_routine)).
> >>
> >>    It's for simple implementation and easy to extend without
> >>    breaking APIs in the future. (And some formats may
> >>    provide only FROM routine or TO routine.)
> >>
> >>    Is this design acceptable?
> >
> > With the single registration API idea, we can register the custom
> > format routine that supports only FROM routine using the API like:
> >
> > RegisterCopyRoutine("new-format", NewFormatFromRoutine, NULL);
> >
> > Compared to this approach, what points do you think having separate
> > two registration APIs is preferable in terms of extendability and API
> > compatibility?
>
> It's natural to add more related APIs with this
> approach. The single registration API provides one feature
> by one operation. If we use the RegisterCopyRoutine() for
> FROM and TO formats API, it's not natural that we add more
> related APIs. In this case, some APIs may provide multiple
> features by one operation and other APIs may provide single
> feature by one operation. Developers may be confused with
> the API. For example, developers may think "what does mean
> NULL here?" or "can we use NULL here?" for
> "RegisterCopyRoutine("new-format", NewFormatFromRoutine,
> NULL)".
We can document it in the comment for the registration function.
>
>
> >                I think it would be rather confusing for example if
> > each COPY TO routine and COPY FROM routine is registered by different
> > extensions with the same format name.
>
> Hmm, I don't think so. Who is confused by the case? DBA?
> Users who use COPY? Why is it confused?
>
> Do you assume the case that the same name is used for
> different format? For example, "json" is used for JSON Lines
> for COPY FROM and and normal JSON for COPY TO by different
> extensions.
Suppose that extension-A implements only CopyToRoutine for
custom-format-X with the format name 'myformat', and extension-B
implements only CopyFromRoutine for custom-format-Y with the same
name. If users load both extension-A and extension-B, it seems to me
that extension-A registers custom-format-X as 'myformat' with only
CopyToRoutine, and extension-B then overwrites the 'myformat'
registration by adding custom-format-Y's CopyFromRoutine. However, if
users load extension-C, which implements both routines with the
format name 'myformat', they can load neither extension-A nor
extension-B, which suggests that we don't allow overwriting the
registration in this case.
I think the core issue is the internal management of custom format
entries, but the current patch does allow overwriting the registration
in the former case (the extension-A and extension-B case).
>
> >>    FYI: RegisterCopy{From,To}Routine() uses the same logic
> >>    as RegisterExtensionExplainOption().
> >
> > I'm concerned that the patch has duplicated logics for the
> > registration of COPY FROM and COPY TO.
>
> We can implement a convenient routine that can be used for
> RegisterExtensionExplainOption() and
> RegisterCopy{From,To}Routine() if it's needed.
I meant that there is duplicated code between COPY FROM and COPY TO.
For instance, RegisterCopyFromRoutine() and RegisterCopyToRoutine()
have the same logic.
>
> >> 3. I want to register the built-in COPY {FROM,TO} routines
> >>    in the PostgreSQL initialization phase. Where should we
> >>    do it? In 0002, it's done in InitPostgres() but I'm not
> >>    sure whether it's a suitable location or not.
> >
> > InitPostgres() is not a correct function as it's a process
> > initialization function. Probably we don't necessarily need to
> > register the built-in formats in the same way as custom formats. A
> > simple solution would be to have separate arrays for built-in formats
> > and custom formats and have the GetCopy[To|From]Routine() search both
> > arrays (built-in array first).
>
> We had a discussion that we should dog-food APIs:
>
> https://www.postgresql.org/message-id/flat/CAKFQuwaCHhrS%2BRE4p_OO6d7WEskd9b86-2cYcvChNkrP%2B7PJ7A%40mail.gmail.com#e6d1cdd04dac53eafe34b784ac47b68b
>
> > We should (and usually do) dog-food APIs when reasonable
> > and this situation seems quite reasonable.
>
> In this case, we don't need to dog-food APIs, right?
Yes, I think so.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		Hi,
In <CAD21AoC8-d=GF-hOvGqUyq2xFg=QGpYfCiWJbcp4wcn0UidrPw@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 24 Jun 2025 15:24:23 +0900,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> It's natural to add more related APIs with this
>> approach. The single registration API provides one feature
>> by one operation. If we use the RegisterCopyRoutine() for
>> FROM and TO formats API, it's not natural that we add more
>> related APIs. In this case, some APIs may provide multiple
>> features by one operation and other APIs may provide single
>> feature by one operation. Developers may be confused with
>> the API. For example, developers may think "what does mean
>> NULL here?" or "can we use NULL here?" for
>> "RegisterCopyRoutine("new-format", NewFormatFromRoutine,
>> NULL)".
> 
> We can document it in the comment for the registration function.
I think an API that can be understood without additional
notes is better than one that needs them.
Why do you suggest the RegisterCopyRoutine("new-format",
NewFormatFromRoutine, NewFormatToRoutine) API? You want to
remove the duplicated code in
RegisterCopy{From,To}Routine(), right? I think we can do
that by creating a convenience function that holds the code
shared by RegisterCopy{From,To}Routine() and
RegisterExtensionExplainOption().
BTW, what do you think about my answer (a
one-feature-per-operation API is more extendable) to your
question (extendability and API compatibility)?
> Suppose that extension-A implements only CopyToRoutine for the
> custom-format-X with the format name 'myformat' and extension-B
> implements only CopyFromRoutine for the custom-format-Y with the same
> name, if users load both extension-A and extension-B, it seems to me
> that extension-A registers the custom-format-X format as 'myformat'
> only with CopyToRoutine, and extension-B overwrites the 'myformat'
> registration by adding custom-format-Y's CopyFromRoutine. However, if
> users register extension-C that implements both routines with the
> format name 'myformat', they can register neither extension-A nor
> extension-B, which seems to me that we don't allow overwriting the
> registration in this case.
Do you assume that users use extension-A, extension-B and
extension-C without reading their documentation? If users
read the documentation before using them, they will know
that all of them use the same format name 'myformat' and
which extension provides which Copy{From,To}Routine.
Users who don't read the documentation will be confused by
the RegisterCopyRoutine("new-format", NewFormatFromRoutine,
NewFormatToRoutine) API too. Do we really need to care about
this case?
> I think the core issue appears to be the internal management of custom
> format entries but the current patch does enable registration
> overwriting in the former case (extension-A and extension-B case).
This is the same behavior as the existing custom EXPLAIN
option implementation. Should we use different behavior here?
>> >>    FYI: RegisterCopy{From,To}Routine() uses the same logic
>> >>    as RegisterExtensionExplainOption().
>> >
>> > I'm concerned that the patch has duplicated logics for the
>> > registration of COPY FROM and COPY TO.
>>
>> We can implement a convenient routine that can be used for
>> RegisterExtensionExplainOption() and
>> RegisterCopy{From,To}Routine() if it's needed.
> 
> I meant there are duplicated codes in COPY FROM and COPY TO. For
> instance, RegisterCopyFromRoutine() and RegisterCopyToRoutine() have
> the same logic.
Yes, I understand it. I wanted to say that we can remove the
duplicated code by introducing a RegisterSomething()
function that can be used by
RegisterExtensionExplainOption() and
RegisterCopy{From,To}Routine():
void
RegisterSomething(...)
{
  /* Common codes in RegisterExtensionExplainOption() and
     RegisterCopy{From,To}Routine()
     ...
   */
}
void
RegisterExtensionExplainOption(...)
{
  RegisterSomething(...);
}
void
RegisterCopyFromRoutine(...)
{
  RegisterSomething(...);
}
void
RegisterCopyToRoutine(...)
{
  RegisterSomething(...);
}
You think that this approach can't remove the duplicated
code, right?
>> > InitPostgres() is not a correct function as it's a process
>> > initialization function. Probably we don't necessarily need to
>> > register the built-in formats in the same way as custom formats. A
>> > simple solution would be to have separate arrays for built-in formats
>> > and custom formats and have the GetCopy[To|From]Routine() search both
>> > arrays (built-in array first).
>>
>> We had a discussion that we should dog-food APIs:
>>
>> https://www.postgresql.org/message-id/flat/CAKFQuwaCHhrS%2BRE4p_OO6d7WEskd9b86-2cYcvChNkrP%2B7PJ7A%40mail.gmail.com#e6d1cdd04dac53eafe34b784ac47b68b
>>
>> > We should (and usually do) dog-food APIs when reasonable
>> > and this situation seems quite reasonable.
>>
>> In this case, we don't need to dog-food APIs, right?
> 
> Yes, I think so.
OK. I don't have a strong opinion on it. If nobody objects,
I'll do it when I update the patch set.
Thanks,
-- 
kou
			
		On Tue, Jun 24, 2025 at 4:10 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoC8-d=GF-hOvGqUyq2xFg=QGpYfCiWJbcp4wcn0UidrPw@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 24 Jun 2025 15:24:23 +0900,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >> It's natural to add more related APIs with this
> >> approach. The single registration API provides one feature
> >> by one operation. If we use the RegisterCopyRoutine() for
> >> FROM and TO formats API, it's not natural that we add more
> >> related APIs. In this case, some APIs may provide multiple
> >> features by one operation and other APIs may provide single
> >> feature by one operation. Developers may be confused with
> >> the API. For example, developers may think "what does mean
> >> NULL here?" or "can we use NULL here?" for
> >> "RegisterCopyRoutine("new-format", NewFormatFromRoutine,
> >> NULL)".
> >
> > We can document it in the comment for the registration function.
>
> I think that API that can be understandable without the
> additional note is better API than API that needs some
> notes.
I don't see much difference in this case.
>
> Why do you suggest the RegisterCopyRoutine("new-format",
> NewFormatFromRoutine, NewFormatToRoutine) API? You want to
> remove the duplicated codes in
> RegisterCopy{From,To}Routine(), right?
No. I think that if extensions are likely to support both
CopyToRoutine and CopyFromRoutine in most cases, it would be simpler
to register the custom format using a single API. Registering
CopyToRoutine and CopyFromRoutine separately seems redundant to me.
>
> BTW, what do you think about my answer (one feature by one
> operation API is more extendable API) for your question
> (extendability and API compatibility)?
Could you provide some examples? It seems to me that even if we
provide the single API for the registration we can provide other APIs
differently. For example, if we want to provide an API to register a
custom option, we can provide RegisterCopyToOption() and
RegisterCopyFromOption().
>
> > Suppose that extension-A implements only CopyToRoutine for the
> > custom-format-X with the format name 'myformat' and extension-B
> > implements only CopyFromRoutine for the custom-format-Y with the same
> > name, if users load both extension-A and extension-B, it seems to me
> > that extension-A registers the custom-format-X format as 'myformat'
> > only with CopyToRoutine, and extension-B overwrites the 'myformat'
> > registration by adding custom-format-Y's CopyFromRoutine. However, if
> > users register extension-C that implements both routines with the
> > format name 'myformat', they can register neither extension-A nor
> > extension-B, which seems to me that we don't allow overwriting the
> > registration in this case.
>
> Do you assume that users use extension-A, extension-B and
> extension-C without reading their documentation? If users
> read their documentation before users use them, users can
> know all of them use the same format name 'myformat' and
> which extension provides Copy{From,To}Routine.
>
> In this case, these users (who don't read documentation)
> will be confused with the RegisterCopyRoutine("new-format",
> NewFormatFromRoutine, NewFormatToRoutine) API too. Do we
> really need to care about this case?
My point is about the consistency of registration behavior. I think
that we should raise an error if the custom format name that an
extension tries to register already exists. So I'm not sure why
installing extension-A+B should be okay while installing
extension-C+A or extension-C+B is not. We can regard not implementing
CopyFromRoutine for the 'myformat' format as extension-A's choice, so
extension-B should not change it.
>
> > I think the core issue appears to be the internal management of custom
> > format entries but the current patch does enable registration
> > overwriting in the former case (extension-A and extension-B case).
>
> This is the same behavior as existing custom EXPLAIN option
> implementation. Should we use different behavior here?
I think that unlike custom EXPLAIN options, it's better to raise an
error or a warning if the custom format name (or combination of format
name and COPY direction) that an extension tries to register already
exists.
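To make the two conflict rules under discussion concrete, here is a
standalone sketch that only tracks which directions a name has been
registered for. Every type and function name in it is hypothetical, and
a real implementation would presumably report the conflict with
ereport() rather than return false.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Bit flags so one registration can cover both directions. */
typedef enum CopyDirection
{
    COPY_DIRECTION_FROM = 1,
    COPY_DIRECTION_TO = 2
} CopyDirection;

typedef enum ConflictPolicy
{
    CONFLICT_BY_NAME,               /* a name can be registered once */
    CONFLICT_BY_NAME_AND_DIRECTION  /* once per name and direction */
} ConflictPolicy;

typedef struct FormatRegistration
{
    const char *name;
    int         directions;         /* OR of CopyDirection flags */
} FormatRegistration;

#define MAX_REGISTRATIONS 16        /* arbitrary limit for this sketch */

typedef struct FormatRegistry
{
    ConflictPolicy policy;
    FormatRegistration entries[MAX_REGISTRATIONS];
    int         nentries;
} FormatRegistry;

/* Returns false where the backend would raise an error or a warning. */
static bool
RegistryAdd(FormatRegistry *registry, const char *name, int directions)
{
    assert(registry != NULL && name != NULL && directions != 0);

    for (int i = 0; i < registry->nentries; i++)
    {
        FormatRegistration *entry = &registry->entries[i];

        if (strcmp(entry->name, name) != 0)
            continue;
        if (registry->policy == CONFLICT_BY_NAME)
            return false;           /* the name is already taken */
        if (entry->directions & directions)
            return false;           /* this direction is already taken */
        entry->directions |= directions;
        return true;
    }

    if (registry->nentries >= MAX_REGISTRATIONS)
        return false;

    registry->entries[registry->nentries].name = name;
    registry->entries[registry->nentries].directions = directions;
    registry->nentries++;
    return true;
}
```

Under CONFLICT_BY_NAME the second of extension-A and extension-B is
rejected, while under CONFLICT_BY_NAME_AND_DIRECTION both are accepted
and only a later extension-C that claims both directions is rejected.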
> >> >>    FYI: RegisterCopy{From,To}Routine() uses the same logic
> >> >>    as RegisterExtensionExplainOption().
> >> >
> >> > I'm concerned that the patch has duplicated logics for the
> >> > registration of COPY FROM and COPY TO.
> >>
> >> We can implement a convenient routine that can be used for
> >> RegisterExtensionExplainOption() and
> >> RegisterCopy{From,To}Routine() if it's needed.
> >
> > I meant there are duplicated codes in COPY FROM and COPY TO. For
> > instance, RegisterCopyFromRoutine() and RegisterCopyToRoutine() have
> > the same logic.
>
> Yes, I understand it. I wanted to say that we can remove the
> duplicated codes by introducing a RegisterSomething()
> function that can be used by
> RegisterExtensionExplainOption() and
> RegisterCopy{From,To}Routine():
>
> void
> RegisterSomething(...)
> {
>   /* Common codes in RegisterExtensionExplainOption() and
>      RegisterCopy{From,To}Routine()
>      ...
>    */
> }
>
> void
> RegisterExtensionExplainOption(...)
> {
>   RegisterSomething(...);
> }
>
> void
> RegisterCopyFromRoutine(...)
> {
>   RegisterSomething(...);
> }
>
> void
> RegisterCopyToRoutine(...)
> {
>   RegisterSomething(...);
> }
>
> You think that this approach can't remove the duplicated
> codes, right?
Well, no, I just meant we don't need to do that. Custom EXPLAIN
options and custom COPY formats are different features and have
different requirements. While we don't need to remove the duplication
between them, at least at this stage, I think we do need to remove the
duplication between the COPY FROM registration code and the COPY TO
one.
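One possible shape for that, sketched as standalone C: both public
functions forward to a single internal helper that works on a generic
name-to-pointer table. The helper name, the table layout, the
duplicate-name rule and the use of realloc() are assumptions for
illustration; backend code would use its own memory-context allocation
and error reporting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; the real structs hold callback pointers. */
typedef struct CopyFromRoutine { int placeholder; } CopyFromRoutine;
typedef struct CopyToRoutine { int placeholder; } CopyToRoutine;

typedef struct CopyRoutineEntry
{
    const char *name;
    const void *routine;        /* CopyFromRoutine or CopyToRoutine */
} CopyRoutineEntry;

typedef struct CopyRoutineTable
{
    CopyRoutineEntry *entries;
    int         nentries;
    int         capacity;
} CopyRoutineTable;

static CopyRoutineTable copy_from_table;
static CopyRoutineTable copy_to_table;

/* The logic shared by both directions lives here, once. */
static bool
RegisterCopyRoutineInternal(CopyRoutineTable *table, const char *name,
                            const void *routine)
{
    assert(table != NULL && name != NULL && routine != NULL);

    for (int i = 0; i < table->nentries; i++)
    {
        if (strcmp(table->entries[i].name, name) == 0)
            return false;       /* already registered for this direction */
    }

    if (table->nentries == table->capacity)
    {
        int         new_capacity = table->capacity ? table->capacity * 2 : 8;
        CopyRoutineEntry *grown;

        grown = realloc(table->entries,
                        (size_t) new_capacity * sizeof(CopyRoutineEntry));
        if (grown == NULL)
            return false;
        table->entries = grown;
        table->capacity = new_capacity;
    }

    table->entries[table->nentries].name = name;
    table->entries[table->nentries].routine = routine;
    table->nentries++;
    return true;
}

static bool
RegisterCopyFromRoutine(const char *name, const CopyFromRoutine *routine)
{
    return RegisterCopyRoutineInternal(&copy_from_table, name, routine);
}

static bool
RegisterCopyToRoutine(const char *name, const CopyToRoutine *routine)
{
    return RegisterCopyRoutineInternal(&copy_to_table, name, routine);
}
```

The typed wrappers keep the public signatures direction-specific, and
only the untyped helper touches the tables.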
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		Hi,
In <CAD21AoC19fV5Ujs-1r24MNU+hwTQUeZMEnaJDjSFwHLMMdFi0Q@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 25 Jun 2025 00:48:46 +0900,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> >> It's natural to add more related APIs with this
>> >> approach. The single registration API provides one feature
>> >> by one operation. If we use the RegisterCopyRoutine() for
>> >> FROM and TO formats API, it's not natural that we add more
>> >> related APIs. In this case, some APIs may provide multiple
>> >> features by one operation and other APIs may provide single
>> >> feature by one operation. Developers may be confused with
>> >> the API. For example, developers may think "what does mean
>> >> NULL here?" or "can we use NULL here?" for
>> >> "RegisterCopyRoutine("new-format", NewFormatFromRoutine,
>> >> NULL)".
>> >
>> > We can document it in the comment for the registration function.
>>
>> I think that API that can be understandable without the
>> additional note is better API than API that needs some
>> notes.
> 
> I don't see much difference in this case.
OK. It seems that we can't agree on which API is better.
I've implemented your idea as the v42 patch set. Can we
proceed with this proposal using this approach? What is the
next step?
> No. I think that if extensions are likely to support both
> CopyToRoutine and CopyFromRoutine in most cases, it would be simpler
> to register the custom format using a single API. Registering
> CopyToRoutine and CopyFromRoutine separately seems redundant to me.
I don't think so. In general, extensions are implemented
step by step. Extension developers will not implement
CopyToRoutine and CopyFromRoutine at once, even if the
extension eventually implements both.
> Could you provide some examples? It seems to me that even if we
> provide the single API for the registration we can provide other APIs
> differently. For example, if we want to provide an API to register a
> custom option, we can provide RegisterCopyToOption() and
> RegisterCopyFromOption().
Yes. We can mix different API styles. In general,
consistently styled APIs are easier to use than mixed-style
APIs. If that's not an important point in PostgreSQL API
design, my point is moot. (Sorry, I'm not familiar with
PostgreSQL API design.)
> My point is about the consistency of registration behavior. I think
> that we should raise an error if the custom format name that an
> extension tries to register already exists. Therefore I'm not sure why
> installing extension-A+B is okay but installing extension-C+A or
> extension-C+B is not okay? We can think that's an extension-A's choice
> not to implement CopyFromRoutine for the 'myformat' format so
> extension-B should not change it.
I think that it's the users' responsibility. I think it's
more convenient for users to be able to mix extension-A+B (A
provides only the TO format and B provides only the FROM
format) than not. I don't think extension-A wants to
prohibit the FROM format in this case. Extension-A just
doesn't care about the FROM format.
FYI: Both extension-C+A and extension-C+B are OK if we
change the code not to raise an error for an existing format.
Thanks,
-- 
kou
From 48b99b69b4be26bf6a4e2525d3de28a96e2b241a Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Mon, 25 Nov 2024 13:58:33 +0900
Subject: [PATCH v42 1/2] Export CopyDest as private data
This is a preparation to export CopyToStateData as private data.
CopyToStateData depends on CopyDest. So we need to export CopyDest
too.
But CopyDest and CopySource have the same value names. So we can't
export CopyDest as-is.
This uses the COPY_DEST_ prefix for CopyDest enum values. CopySource
uses the COPY_SOURCE_ prefix for consistency.
---
 src/backend/commands/copyfrom.c          |  4 ++--
 src/backend/commands/copyfromparse.c     | 10 ++++----
 src/backend/commands/copyto.c            | 30 ++++++++----------------
 src/include/commands/copyfrom_internal.h |  8 +++----
 src/include/commands/copyto_internal.h   | 28 ++++++++++++++++++++++
 5 files changed, 49 insertions(+), 31 deletions(-)
 create mode 100644 src/include/commands/copyto_internal.h
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index fbbbc09a97b..b4dad744547 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1709,7 +1709,7 @@ BeginCopyFrom(ParseState *pstate,
                             pg_encoding_to_char(GetDatabaseEncoding()))));
     }
 
-    cstate->copy_src = COPY_FILE;    /* default */
+    cstate->copy_src = COPY_SOURCE_FILE;    /* default */
 
     cstate->whereClause = whereClause;
 
@@ -1837,7 +1837,7 @@ BeginCopyFrom(ParseState *pstate,
     if (data_source_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_src = COPY_CALLBACK;
+        cstate->copy_src = COPY_SOURCE_CALLBACK;
         cstate->data_source_cb = data_source_cb;
     }
     else if (pipe)
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index f5fc346e201..9f7171d1478 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -180,7 +180,7 @@ ReceiveCopyBegin(CopyFromState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_src = COPY_FRONTEND;
+    cstate->copy_src = COPY_SOURCE_FRONTEND;
     cstate->fe_msgbuf = makeStringInfo();
     /* We *must* flush here to ensure FE knows it can send. */
     pq_flush();
@@ -248,7 +248,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
 
     switch (cstate->copy_src)
     {
-        case COPY_FILE:
+        case COPY_SOURCE_FILE:
             bytesread = fread(databuf, 1, maxread, cstate->copy_file);
             if (ferror(cstate->copy_file))
                 ereport(ERROR,
@@ -257,7 +257,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
             if (bytesread == 0)
                 cstate->raw_reached_eof = true;
             break;
-        case COPY_FRONTEND:
+        case COPY_SOURCE_FRONTEND:
             while (maxread > 0 && bytesread < minread && !cstate->raw_reached_eof)
             {
                 int            avail;
@@ -340,7 +340,7 @@ CopyGetData(CopyFromState cstate, void *databuf, int minread, int maxread)
                 bytesread += avail;
             }
             break;
-        case COPY_CALLBACK:
+        case COPY_SOURCE_CALLBACK:
             bytesread = cstate->data_source_cb(databuf, minread, maxread);
             break;
     }
@@ -1172,7 +1172,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv)
          * after \. up to the protocol end of copy data.  (XXX maybe better
          * not to treat \. as special?)
          */
-        if (cstate->copy_src == COPY_FRONTEND)
+        if (cstate->copy_src == COPY_SOURCE_FRONTEND)
         {
             int            inbytes;
 
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index ea6f18f2c80..99aec9c4c48 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -20,6 +20,7 @@
 
 #include "access/tableam.h"
 #include "commands/copyapi.h"
+#include "commands/copyto_internal.h"
 #include "commands/progress.h"
 #include "executor/execdesc.h"
 #include "executor/executor.h"
@@ -36,17 +37,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                    /* to file (or a piped program) */
-    COPY_FRONTEND,                /* to frontend */
-    COPY_CALLBACK,                /* to callback function */
-} CopyDest;
-
 /*
  * This struct contains all the state variables used throughout a COPY TO
  * operation.
@@ -69,7 +59,7 @@ typedef struct CopyToStateData
 
     /* low-level state data */
     CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
     StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
 
     int            file_encoding;    /* file or remote side's character encoding */
@@ -401,7 +391,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -448,7 +438,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -482,11 +472,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -507,7 +497,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -515,7 +505,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -902,12 +892,12 @@ BeginCopyTo(ParseState *pstate,
     /* See Multibyte encoding comment above */
     cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index c8b22af22d8..24157e11a73 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -24,9 +24,9 @@
  */
 typedef enum CopySource
 {
-    COPY_FILE,                    /* from file (or a piped program) */
-    COPY_FRONTEND,                /* from frontend */
-    COPY_CALLBACK,                /* from callback function */
+    COPY_SOURCE_FILE,            /* from file (or a piped program) */
+    COPY_SOURCE_FRONTEND,        /* from frontend */
+    COPY_SOURCE_CALLBACK,        /* from callback function */
 } CopySource;
 
 /*
@@ -64,7 +64,7 @@ typedef struct CopyFromStateData
     /* low-level state data */
     CopySource    copy_src;        /* type of copy source */
-    FILE       *copy_file;        /* used if copy_src == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_FRONTEND */
+    FILE       *copy_file;        /* used if copy_src == COPY_SOURCE_FILE */
+    StringInfo    fe_msgbuf;        /* used if copy_src == COPY_SOURCE_FRONTEND */
 
     EolType        eol_type;        /* EOL type of input */
     int            file_encoding;    /* file or remote side's character encoding */
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
new file mode 100644
index 00000000000..42ddb37a8a2
--- /dev/null
+++ b/src/include/commands/copyto_internal.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyto_internal.h
+ *      Internal definitions for COPY TO command.
+ *
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/commands/copyto_internal.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef COPYTO_INTERNAL_H
+#define COPYTO_INTERNAL_H
+
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+    COPY_DEST_FILE,                /* to file (or a piped program) */
+    COPY_DEST_FRONTEND,            /* to frontend */
+    COPY_DEST_CALLBACK,            /* to callback function */
+} CopyDest;
+
+#endif                            /* COPYTO_INTERNAL_H */
-- 
2.49.0
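A note on the rename in patch 1/2. C enumerators share one namespace, so once CopyDest lives in a header, its old enumerators (COPY_FILE, COPY_FRONTEND, COPY_CALLBACK) would collide with the identically named enumerators of CopySource in any file that includes both internal headers. That appears to be the motivation for the COPY_SOURCE_* / COPY_DEST_* prefixes; this is an inference from the diff, not something the hunks above state. A minimal standalone sketch, assuming nothing from PostgreSQL itself (copy_source_name() and copy_dest_name() are hypothetical helpers, not part of the patch):

```c
#include <stddef.h>

/* Mirrors CopySource in copyfrom_internal.h after the rename. */
typedef enum CopySource
{
    COPY_SOURCE_FILE,           /* from file (or a piped program) */
    COPY_SOURCE_FRONTEND,       /* from frontend */
    COPY_SOURCE_CALLBACK,       /* from callback function */
} CopySource;

/* Mirrors CopyDest in copyto_internal.h after the rename. */
typedef enum CopyDest
{
    COPY_DEST_FILE,             /* to file (or a piped program) */
    COPY_DEST_FRONTEND,         /* to frontend */
    COPY_DEST_CALLBACK,         /* to callback function */
} CopyDest;

/* Hypothetical helper: human-readable label for a COPY FROM source. */
static const char *
copy_source_name(CopySource src)
{
    switch (src)
    {
        case COPY_SOURCE_FILE:
            return "file";
        case COPY_SOURCE_FRONTEND:
            return "frontend";
        case COPY_SOURCE_CALLBACK:
            return "callback";
    }
    return NULL;                /* not reached for valid values */
}

/* Hypothetical helper: human-readable label for a COPY TO destination. */
static const char *
copy_dest_name(CopyDest dest)
{
    switch (dest)
    {
        case COPY_DEST_FILE:
            return "file";
        case COPY_DEST_FRONTEND:
            return "frontend";
        case COPY_DEST_CALLBACK:
            return "callback";
    }
    return NULL;                /* not reached for valid values */
}
```

With the old unprefixed names, the two typedefs above could not appear in the same translation unit, because COPY_FILE would be declared twice.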
From c71a8265d3fd81317e22494cdc25de4ffb0378d8 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 18 Jun 2025 11:43:09 +0900
Subject: [PATCH v42 2/2] Add support for registering COPY {FROM,TO} routines
This uses the C API approach, like the custom EXPLAIN option. Parts of
this are based on the custom EXPLAIN option implementation.
This approach provides a C API to register COPY {FROM,TO} routines:
    void RegisterCopyRoutine(const char *name,
                             const CopyFromRoutine *from_routine,
                             const CopyToRoutine *to_routine);
(This is based on RegisterExtensionExplainOption().)
This assigns an ID to each name internally, but the current API set
doesn't expose it to users because it's not needed for now. If it
becomes necessary, we can provide APIs for it like
GetExplainExtensionId() for the custom EXPLAIN option.
Registered COPY {FROM,TO} routines are managed per process. They
aren't shared among multiple processes because that's not needed for
now. We may revisit this when we need it.
---
 src/backend/commands/copy.c                   | 120 +++++++++++++++++-
 src/backend/commands/copyfrom.c               |  27 ++--
 src/backend/commands/copyto.c                 |  80 +++---------
 src/include/commands/copy.h                   |   8 +-
 src/include/commands/copyapi.h                |   5 +
 src/include/commands/copyto_internal.h        |  55 ++++++++
 src/test/modules/Makefile                     |   1 +
 src/test/modules/meson.build                  |   1 +
 src/test/modules/test_copy_format/.gitignore  |   4 +
 src/test/modules/test_copy_format/Makefile    |  21 +++
 .../expected/test_copy_format.out             |  19 +++
 src/test/modules/test_copy_format/meson.build |  29 +++++
 .../test_copy_format/sql/test_copy_format.sql |   8 ++
 .../test_copy_format/test_copy_format.c       |  91 +++++++++++++
 .../test_copy_format/test_copy_format.conf    |   1 +
 15 files changed, 392 insertions(+), 78 deletions(-)
 create mode 100644 src/test/modules/test_copy_format/.gitignore
 create mode 100644 src/test/modules/test_copy_format/Makefile
 create mode 100644 src/test/modules/test_copy_format/expected/test_copy_format.out
 create mode 100644 src/test/modules/test_copy_format/meson.build
 create mode 100644 src/test/modules/test_copy_format/sql/test_copy_format.sql
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.c
 create mode 100644 src/test/modules/test_copy_format/test_copy_format.conf
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 74ae42b19a7..60c00b9698b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -22,7 +22,7 @@
 #include "access/table.h"
 #include "access/xact.h"
 #include "catalog/pg_authid.h"
-#include "commands/copy.h"
+#include "commands/copyapi.h"
 #include "commands/defrem.h"
 #include "executor/executor.h"
 #include "mb/pg_wchar.h"
@@ -39,6 +39,107 @@
 #include "utils/rel.h"
 #include "utils/rls.h"
 
+/*
+ * Manage registered COPY routines per process. They aren't shared among
+ * processes for now; we may revisit that when it's needed.
+ */
+
+typedef struct
+{
+    const char *name;
+    const CopyFromRoutine *from_routine;
+    const CopyToRoutine *to_routine;
+}            CopyRoutineEntry;
+
+static CopyRoutineEntry * CopyRoutineEntries = NULL;
+static int    CopyRoutineEntriesAssigned = 0;
+static int    CopyRoutineEntriesAllocated = 0;
+
+/*
+ * Register new COPY routines for the given name.
+ *
+ * When name is used as a COPY format name, the registered routines are used
+ * to process the COPY request. See CopyFromRoutine and CopyToRoutine for how
+ * to implement a COPY routine.
+ *
+ * If name is already registered, an error is raised.
+ *
+ * name is assumed to be a constant string or allocated in storage that will
+ * never be freed.
+ *
+ * from_routine and to_routine are assumed to be allocated in storage that
+ * will never be freed.
+ */
+void
+RegisterCopyRoutine(const char *name, const CopyFromRoutine *from_routine, const CopyToRoutine *to_routine)
+{
+    CopyRoutineEntry *entry;
+
+    /* Search for an existing routine by this name; if found, raise an error. */
+    for (int i = 0; i < CopyRoutineEntriesAssigned; ++i)
+    {
+        if (strcmp(CopyRoutineEntries[i].name, name) == 0)
+        {
+            ereport(ERROR,
+                    errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                    errmsg("cannot register the same COPY format name: %s", name));
+        }
+    }
+
+    /* If there is no array yet, create one. */
+    if (!CopyRoutineEntries)
+    {
+        CopyRoutineEntriesAllocated = 16;
+        CopyRoutineEntries =
+            MemoryContextAlloc(TopMemoryContext,
+                               sizeof(CopyRoutineEntry) * CopyRoutineEntriesAllocated);
+    }
+
+    /* If there's an array but it's currently full, expand it. */
+    if (CopyRoutineEntriesAssigned >= CopyRoutineEntriesAllocated)
+    {
+        int            i = pg_nextpower2_32(CopyRoutineEntriesAssigned + 1);
+
+        CopyRoutineEntries =
+            repalloc(CopyRoutineEntries, sizeof(CopyRoutineEntry) * i);
+        CopyRoutineEntriesAllocated = i;
+    }
+
+    /* Assign new ID. */
+    entry = &CopyRoutineEntries[CopyRoutineEntriesAssigned++];
+    entry->name = name;
+    entry->from_routine = from_routine;
+    entry->to_routine = to_routine;
+}
+
+/*
+ * Find COPY routines for the given name.
+ *
+ * This returns true if the given name is registered, false otherwise.
+ *
+ * If the given name is registered, registered CopyFromRoutine and
+ * CopyToRoutine are set to output parameters, from_routine and to_routine.
+ * from_routine and to_routine can be NULL if they are not needed.
+ */
+bool
+FindCopyRoutine(const char *name, const CopyFromRoutine **from_routine,
+                const CopyToRoutine **to_routine)
+{
+    for (int i = 0; i < CopyRoutineEntriesAssigned; ++i)
+    {
+        if (strcmp(CopyRoutineEntries[i].name, name) == 0)
+        {
+            if (from_routine)
+                *from_routine = CopyRoutineEntries[i].from_routine;
+            if (to_routine)
+                *to_routine = CopyRoutineEntries[i].to_routine;
+            return true;
+        }
+    }
+
+    return false;
+}
+
 /*
  *     DoCopy executes the SQL COPY statement
  *
@@ -520,17 +621,28 @@ ProcessCopyOptions(ParseState *pstate,
         if (strcmp(defel->defname, "format") == 0)
         {
             char       *fmt = defGetString(defel);
+            bool        is_valid;
 
             if (format_specified)
                 errorConflictingDefElem(defel, pstate);
             format_specified = true;
-            if (strcmp(fmt, "text") == 0)
-                 /* default format */ ;
-            else if (strcmp(fmt, "csv") == 0)
+            if (strcmp(fmt, "csv") == 0)
                 opts_out->csv_mode = true;
             else if (strcmp(fmt, "binary") == 0)
                 opts_out->binary = true;
+
+            if (is_from)
+            {
+                opts_out->from_routine = GetCopyFromRoutine(fmt);
+                is_valid = opts_out->from_routine != NULL;
+            }
             else
+            {
+                opts_out->to_routine = GetCopyToRoutine(fmt);
+                is_valid = opts_out->to_routine != NULL;
+            }
+
+            if (!is_valid)
                 ereport(ERROR,
                         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                          errmsg("COPY format \"%s\" not recognized", fmt),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index b4dad744547..e872815f015 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -151,17 +151,22 @@ static const CopyFromRoutine CopyFromRoutineBinary = {
     .CopyFromEnd = CopyFromBinaryEnd,
 };
 
-/* Return a COPY FROM routine for the given options */
-static const CopyFromRoutine *
-CopyFromGetRoutine(const CopyFormatOptions *opts)
+/* Return a COPY FROM routine for the given name */
+const CopyFromRoutine *
+GetCopyFromRoutine(const char *name)
 {
-    if (opts->csv_mode)
-        return &CopyFromRoutineCSV;
-    else if (opts->binary)
-        return &CopyFromRoutineBinary;
+    const CopyFromRoutine *routine = NULL;
 
-    /* default is text */
-    return &CopyFromRoutineText;
+    if (strcmp(name, "text") == 0)
+        return &CopyFromRoutineText;
+    else if (strcmp(name, "csv") == 0)
+        return &CopyFromRoutineCSV;
+    else if (strcmp(name, "binary") == 0)
+        return &CopyFromRoutineBinary;
+    else if (FindCopyRoutine(name, &routine, NULL))
+        return routine;
+    else
+        return NULL;
 }
 
 /* Implementation of the start callback for text and CSV formats */
@@ -1574,7 +1579,9 @@ BeginCopyFrom(ParseState *pstate,
     ProcessCopyOptions(pstate, &cstate->opts, true /* is_from */ , options);
 
     /* Set the format routine */
-    cstate->routine = CopyFromGetRoutine(&cstate->opts);
+    cstate->routine = cstate->opts.from_routine;
+    if (!cstate->routine)
+        cstate->routine = &CopyFromRoutineText; /* default is text */
 
     /* Process the target relation */
     cstate->rel = rel;
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 99aec9c4c48..a54abbe6853 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -19,12 +19,9 @@
 #include <sys/stat.h>
 
 #include "access/tableam.h"
-#include "commands/copyapi.h"
 #include "commands/copyto_internal.h"
 #include "commands/progress.h"
-#include "executor/execdesc.h"
 #include "executor/executor.h"
-#include "executor/tuptable.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -37,56 +34,6 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyToStateData
-{
-    /* format-specific routines */
-    const CopyToRoutine *routine;
-
-    /* low-level state data */
-    CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
-    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
-
-    int            file_encoding;    /* file or remote side's character encoding */
-    bool        need_transcoding;    /* file encoding diff from server? */
-    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
-    MemoryContext rowcontext;    /* per-row evaluation context */
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
-
 /* DestReceiver for COPY (query) TO */
 typedef struct
 {
@@ -162,17 +109,22 @@ static const CopyToRoutine CopyToRoutineBinary = {
     .CopyToEnd = CopyToBinaryEnd,
 };
 
-/* Return a COPY TO routine for the given options */
-static const CopyToRoutine *
-CopyToGetRoutine(const CopyFormatOptions *opts)
+/* Return a COPY TO routine for the given name */
+const CopyToRoutine *
+GetCopyToRoutine(const char *name)
 {
-    if (opts->csv_mode)
-        return &CopyToRoutineCSV;
-    else if (opts->binary)
-        return &CopyToRoutineBinary;
+    const CopyToRoutine *routine = NULL;
 
-    /* default is text */
-    return &CopyToRoutineText;
+    if (strcmp(name, "text") == 0)
+        return &CopyToRoutineText;
+    else if (strcmp(name, "csv") == 0)
+        return &CopyToRoutineCSV;
+    else if (strcmp(name, "binary") == 0)
+        return &CopyToRoutineBinary;
+    else if (FindCopyRoutine(name, NULL, &routine))
+        return routine;
+    else
+        return NULL;
 }
 
 /* Implementation of the start callback for text and CSV formats */
@@ -693,7 +645,9 @@ BeginCopyTo(ParseState *pstate,
     ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
 
     /* Set format routine */
-    cstate->routine = CopyToGetRoutine(&cstate->opts);
+    cstate->routine = cstate->opts.to_routine;
+    if (!cstate->routine)
+        cstate->routine = &CopyToRoutineText;    /* default is text */
 
     /* Process the source/target relation or query */
     if (rel)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef721..64a9ced0de4 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,10 @@ typedef enum CopyLogVerbosityChoice
     COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
 } CopyLogVerbosityChoice;
 
+/* These are in commands/copyapi.h */
+struct CopyFromRoutine;
+struct CopyToRoutine;
+
 /*
  * A struct to hold COPY options, in a parsed form. All of these are related
  * to formatting, except for 'freeze', which doesn't really belong here, but
@@ -87,9 +91,11 @@ typedef struct CopyFormatOptions
     CopyLogVerbosityChoice log_verbosity;    /* verbosity of logged messages */
     int64        reject_limit;    /* maximum tolerable number of errors */
     List       *convert_select; /* list of column names (can be NIL) */
+    const struct CopyFromRoutine *from_routine; /* routine for FROM */
+    const struct CopyToRoutine *to_routine; /* routine for TO */
 } CopyFormatOptions;
 
-/* These are private in commands/copy[from|to].c */
+/* These are private in commands/copy[from|to]_internal.h */
 typedef struct CopyFromStateData *CopyFromState;
 typedef struct CopyToStateData *CopyToState;
 
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 2a2d2f9876b..e06ab93eaff 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -102,4 +102,9 @@ typedef struct CopyFromRoutine
     void        (*CopyFromEnd) (CopyFromState cstate);
 } CopyFromRoutine;
 
+extern void RegisterCopyRoutine(const char *name, const CopyFromRoutine *from_routine, const CopyToRoutine *to_routine);
+extern bool FindCopyRoutine(const char *name, const CopyFromRoutine **from_routine, const CopyToRoutine **to_routine);
+extern const CopyToRoutine *GetCopyToRoutine(const char *name);
+extern const CopyFromRoutine *GetCopyFromRoutine(const char *name);
+
 #endif                            /* COPYAPI_H */
diff --git a/src/include/commands/copyto_internal.h b/src/include/commands/copyto_internal.h
index 42ddb37a8a2..9dbbbc592b7 100644
--- a/src/include/commands/copyto_internal.h
+++ b/src/include/commands/copyto_internal.h
@@ -14,6 +14,11 @@
 #ifndef COPYTO_INTERNAL_H
 #define COPYTO_INTERNAL_H
 
+#include "commands/copyapi.h"
+#include "executor/execdesc.h"
+#include "executor/tuptable.h"
+#include "nodes/execnodes.h"
+
 /*
  * Represents the different dest cases we need to worry about at
  * the bottom level
@@ -25,4 +30,54 @@ typedef enum CopyDest
     COPY_DEST_CALLBACK,            /* to callback function */
 } CopyDest;
 
+/*
+ * This struct contains all the state variables used throughout a COPY TO
+ * operation.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyToStateData
+{
+    /* format-specific routines */
+    const CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;        /* type of copy source/destination */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
+    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
+
+    int            file_encoding;    /* file or remote side's character encoding */
+    bool        need_transcoding;    /* file encoding diff from server? */
+    bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+    MemoryContext rowcontext;    /* per-row evaluation context */
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
 #endif                            /* COPYTO_INTERNAL_H */
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index aa1d27bbed3..9bf5d58cdae 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -17,6 +17,7 @@ SUBDIRS = \
           test_aio \
           test_bloomfilter \
           test_copy_callbacks \
+          test_copy_format \
           test_custom_rmgrs \
           test_ddl_deparse \
           test_dsa \
diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build
index 9de0057bd1d..5fd06de2737 100644
--- a/src/test/modules/meson.build
+++ b/src/test/modules/meson.build
@@ -16,6 +16,7 @@ subdir('ssl_passphrase_callback')
 subdir('test_aio')
 subdir('test_bloomfilter')
 subdir('test_copy_callbacks')
+subdir('test_copy_format')
 subdir('test_custom_rmgrs')
 subdir('test_ddl_deparse')
 subdir('test_dsa')
diff --git a/src/test/modules/test_copy_format/.gitignore b/src/test/modules/test_copy_format/.gitignore
new file mode 100644
index 00000000000..5dcb3ff9723
--- /dev/null
+++ b/src/test/modules/test_copy_format/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/src/test/modules/test_copy_format/Makefile b/src/test/modules/test_copy_format/Makefile
new file mode 100644
index 00000000000..85dce14ebb3
--- /dev/null
+++ b/src/test/modules/test_copy_format/Makefile
@@ -0,0 +1,21 @@
+# src/test/modules/test_copy_format/Makefile
+
+MODULE_big = test_copy_format
+OBJS = \
+    $(WIN32RES) \
+    test_copy_format.o
+PGFILEDESC = "test_copy_format - test custom COPY FORMAT"
+
+REGRESS = test_copy_format
+REGRESS_OPTS = --temp-config $(top_srcdir)/src/test/modules/test_copy_format/test_copy_format.conf
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_copy_format
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_copy_format/expected/test_copy_format.out b/src/test/modules/test_copy_format/expected/test_copy_format.out
new file mode 100644
index 00000000000..163ff94fa41
--- /dev/null
+++ b/src/test/modules/test_copy_format/expected/test_copy_format.out
@@ -0,0 +1,19 @@
+CREATE TABLE copy_data (a smallint, b integer, c bigint);
+INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format');
+NOTICE:  CopyFromInFunc: attribute: smallint
+NOTICE:  CopyFromInFunc: attribute: integer
+NOTICE:  CopyFromInFunc: attribute: bigint
+NOTICE:  CopyFromStart: the number of attributes: 3
+NOTICE:  CopyFromOneRow
+NOTICE:  CopyFromEnd
+COPY copy_data TO stdout WITH (FORMAT 'test_copy_format');
+NOTICE:  CopyToOutFunc: attribute: smallint
+NOTICE:  CopyToOutFunc: attribute: integer
+NOTICE:  CopyToOutFunc: attribute: bigint
+NOTICE:  CopyToStart: the number of attributes: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToOneRow: the number of valid values: 3
+NOTICE:  CopyToEnd
+DROP TABLE copy_data;
diff --git a/src/test/modules/test_copy_format/meson.build b/src/test/modules/test_copy_format/meson.build
new file mode 100644
index 00000000000..723c51d3f45
--- /dev/null
+++ b/src/test/modules/test_copy_format/meson.build
@@ -0,0 +1,29 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+test_copy_format_sources = files(
+  'test_copy_format.c',
+)
+
+if host_system == 'windows'
+  test_copy_format_sources += rc_lib_gen.process(win32ver_rc, extra_args: [
+    '--NAME', 'test_copy_format',
+    '--FILEDESC', 'test_copy_format - test custom COPY FORMAT',])
+endif
+
+test_copy_format = shared_module('test_copy_format',
+  test_copy_format_sources,
+  kwargs: pg_test_mod_args,
+)
+test_install_libs += test_copy_format
+
+tests += {
+  'name': 'test_copy_format',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'regress': {
+    'sql': [
+      'test_copy_format',
+    ],
+    'regress_args': ['--temp-config', files('test_copy_format.conf')],
+  },
+}
diff --git a/src/test/modules/test_copy_format/sql/test_copy_format.sql b/src/test/modules/test_copy_format/sql/test_copy_format.sql
new file mode 100644
index 00000000000..6d60a493e0e
--- /dev/null
+++ b/src/test/modules/test_copy_format/sql/test_copy_format.sql
@@ -0,0 +1,8 @@
+CREATE TABLE copy_data (a smallint, b integer, c bigint);
+INSERT INTO copy_data VALUES (1, 2, 3), (12, 34, 56), (123, 456, 789);
+
+COPY copy_data FROM stdin WITH (FORMAT 'test_copy_format');
+\.
+COPY copy_data TO stdout WITH (FORMAT 'test_copy_format');
+
+DROP TABLE copy_data;
diff --git a/src/test/modules/test_copy_format/test_copy_format.c b/src/test/modules/test_copy_format/test_copy_format.c
new file mode 100644
index 00000000000..dd63ccf2e0f
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.c
@@ -0,0 +1,91 @@
+/*--------------------------------------------------------------------------
+ *
+ * test_copy_format.c
+ *        Code for testing custom COPY format.
+ *
+ * Portions Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *        src/test/modules/test_copy_format/test_copy_format.c
+ *
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "commands/copyapi.h"
+#include "commands/defrem.h"
+#include "utils/builtins.h"
+
+PG_MODULE_MAGIC;
+
+static void
+TestCopyFromInFunc(CopyFromState cstate, Oid atttypid,
+                   FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: attribute: %s", format_type_be(atttypid))));
+}
+
+static void
+TestCopyFromStart(CopyFromState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyFromStart: the number of attributes: %d", tupDesc->natts)));
+}
+
+static bool
+TestCopyFromOneRow(CopyFromState cstate, ExprContext *econtext, Datum *values, bool *nulls)
+{
+    ereport(NOTICE, (errmsg("CopyFromOneRow")));
+    return false;
+}
+
+static void
+TestCopyFromEnd(CopyFromState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyFromEnd")));
+}
+
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .CopyFromInFunc = TestCopyFromInFunc,
+    .CopyFromStart = TestCopyFromStart,
+    .CopyFromOneRow = TestCopyFromOneRow,
+    .CopyFromEnd = TestCopyFromEnd,
+};
+
+static void
+TestCopyToOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo)
+{
+    ereport(NOTICE, (errmsg("CopyToOutFunc: attribute: %s", format_type_be(atttypid))));
+}
+
+static void
+TestCopyToStart(CopyToState cstate, TupleDesc tupDesc)
+{
+    ereport(NOTICE, (errmsg("CopyToStart: the number of attributes: %d", tupDesc->natts)));
+}
+
+static void
+TestCopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: the number of valid values: %u", slot->tts_nvalid)));
+}
+
+static void
+TestCopyToEnd(CopyToState cstate)
+{
+    ereport(NOTICE, (errmsg("CopyToEnd")));
+}
+
+static const CopyToRoutine CopyToRoutineTestCopyFormat = {
+    .CopyToOutFunc = TestCopyToOutFunc,
+    .CopyToStart = TestCopyToStart,
+    .CopyToOneRow = TestCopyToOneRow,
+    .CopyToEnd = TestCopyToEnd,
+};
+
+void
+_PG_init(void)
+{
+    RegisterCopyRoutine("test_copy_format", &CopyFromRoutineTestCopyFormat,
+                        &CopyToRoutineTestCopyFormat);
+}
diff --git a/src/test/modules/test_copy_format/test_copy_format.conf b/src/test/modules/test_copy_format/test_copy_format.conf
new file mode 100644
index 00000000000..743a02ac92a
--- /dev/null
+++ b/src/test/modules/test_copy_format/test_copy_format.conf
@@ -0,0 +1 @@
+shared_preload_libraries = 'test_copy_format'
-- 
2.49.0
			
		On Wed, Jun 25, 2025 at 4:35 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoC19fV5Ujs-1r24MNU+hwTQUeZMEnaJDjSFwHLMMdFi0Q@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 25 Jun 2025 00:48:46 +0900,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >> >> It's natural to add more related APIs with this
> >> >> approach. The single registration API provides one feature
> >> >> by one operation. If we use the RegisterCopyRoutine() for
> >> >> FROM and TO formats API, it's not natural that we add more
> >> >> related APIs. In this case, some APIs may provide multiple
> >> >> features by one operation and other APIs may provide single
> >> >> feature by one operation. Developers may be confused with
> >> >> the API. For example, developers may think "what does mean
> >> >> NULL here?" or "can we use NULL here?" for
> >> >> "RegisterCopyRoutine("new-format", NewFormatFromRoutine,
> >> >> NULL)".
> >> >
> >> > We can document it in the comment for the registration function.
> >>
> >> I think that API that can be understandable without the
> >> additional note is better API than API that needs some
> >> notes.
> >
> > I don't see much difference in this case.
>
> OK. It seems that we can't agree on which API is better.
>
> I've implemented your idea as the v42 patch set. Can we
> proceed this proposal with this approach? What is the next
> step?
I'll review the patches. In the meantime, could you update the
documentation accordingly?
>
> > No. I think that if extensions are likely to support both
> > CopyToRoutine and CopyFromRoutine in most cases, it would be simpler
> > to register the custom format using a single API. Registering
> > CopyToRoutine and CopyFromRoutine separately seems redundant to me.
>
> I don't think so. In general, extensions are implemented
> step by step. Extension developers will not implement
> CopyToRoutine and CopyFromRoutine at once even if extensions
> implement both of CopyToRoutine and CopyFromRoutine
> eventually.
Hmm, I think if the extension eventually implements both directions,
it would make sense to provide the single API.
>
> > Could you provide some examples? It seems to me that even if we
> > provide the single API for the registration we can provide other APIs
> > differently. For example, if we want to provide an API to register a
> > custom option, we can provide RegisterCopyToOption() and
> > RegisterCopyFromOption().
>
> Yes. We can mix different style APIs. In general, consistent
> style APIs is easier to use than mixed style APIs. If it's
> not an important point in PostgreSQL API design, my point is
> meaningless. (Sorry, I'm not familiar with PostgreSQL API
> design.)
As far as I know, there is no standard for PostgreSQL API design, but
I don't find any weirdness in this design.
>
> > My point is about the consistency of registration behavior. I think
> > that we should raise an error if the custom format name that an
> > extension tries to register already exists. Therefore I'm not sure why
> > installing extension-A+B is okay but installing extension-C+A or
> > extension-C+B is not okay? We can think that's an extension-A's choice
> > not to implement CopyFromRoutine for the 'myformat' format so
> > extension-B should not change it.
>
> I think that it's the users' responsibility. I think that
> it's more convenient that users can mix extension-A+B (A
> provides only TO format and B provides only FROM format)
> than users can't mix them. I think that extension-A doesn't
> want to prohibit FROM format in the case. Extension-A just
> doesn't care about FROM format.
>
> FYI: Both of extension-C+A and extension-C+B are OK when we
> update not raising an error existing format.
I want to keep the basic design that one custom format comes from one
extension, because it's straightforward for both us and users and
makes format IDs easy to maintain. IIUC we more or less agreed on this
design in the previous API design (the TABLESAMPLE-like API).
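The one-format-one-extension rule can be illustrated with a small standalone sketch. It is not the patch's code: FormatEntry, register_copy_routine() and the fixed-size table are hypothetical stand-ins, and the real registration function would presumably report a duplicate name with ereport(ERROR) rather than a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for CopyFromRoutine / CopyToRoutine. */
typedef struct FromRoutine { int dummy; } FromRoutine;
typedef struct ToRoutine { int dummy; } ToRoutine;

typedef struct FormatEntry
{
    const char *name;
    const FromRoutine *from;    /* NULL: COPY FROM is not supported */
    const ToRoutine *to;        /* NULL: COPY TO is not supported */
} FormatEntry;

#define MAX_FORMATS 16

static FormatEntry formats[MAX_FORMATS];
static int  nformats = 0;

/*
 * Register both directions of a custom format in one call.  Returns the
 * format ID (>= 0), or -1 if the name is already taken or the table is full.
 */
static int
register_copy_routine(const char *name, const FromRoutine *from,
                      const ToRoutine *to)
{
    assert(name != NULL);

    for (int i = 0; i < nformats; i++)
    {
        if (strcmp(formats[i].name, name) == 0)
            return -1;          /* a second extension may not claim the name */
    }
    if (nformats >= MAX_FORMATS)
        return -1;

    formats[nformats].name = name;
    formats[nformats].from = from;
    formats[nformats].to = to;
    return nformats++;
}

/* Does the registered format implement COPY FROM? */
static bool
format_supports_from(int id)
{
    assert(id >= 0 && id < nformats);
    return formats[id].from != NULL;
}
```

In this sketch an extension that registers "myformat" with only a TO routine owns the name, so a later attempt by another extension to add a FROM routine under the same name is rejected instead of being merged.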
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		On Mon, Jun 30, 2025 at 3:00 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jun 25, 2025 at 4:35 PM Sutou Kouhei <kou@clear-code.com> wrote:
> >
> > Hi,
> >
> > In <CAD21AoC19fV5Ujs-1r24MNU+hwTQUeZMEnaJDjSFwHLMMdFi0Q@mail.gmail.com>
> >   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 25 Jun 2025 00:48:46 +0900,
> >   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > >> >> It's natural to add more related APIs with this
> > >> >> approach. The single registration API provides one feature
> > >> >> by one operation. If we use the RegisterCopyRoutine() for
> > >> >> FROM and TO formats API, it's not natural that we add more
> > >> >> related APIs. In this case, some APIs may provide multiple
> > >> >> features by one operation and other APIs may provide single
> > >> >> feature by one operation. Developers may be confused with
> > >> >> the API. For example, developers may think "what does mean
> > >> >> NULL here?" or "can we use NULL here?" for
> > >> >> "RegisterCopyRoutine("new-format", NewFormatFromRoutine,
> > >> >> NULL)".
> > >> >
> > >> > We can document it in the comment for the registration function.
> > >>
> > >> I think that API that can be understandable without the
> > >> additional note is better API than API that needs some
> > >> notes.
> > >
> > > I don't see much difference in this case.
> >
> > OK. It seems that we can't agree on which API is better.
> >
> > I've implemented your idea as the v42 patch set. Can we
> > proceed this proposal with this approach? What is the next
> > step?
>
> I'll review the patches.
I've reviewed the 0001 and 0002 patches. The API implemented in the
0002 patch looks good to me, but I'm concerned about the encapsulation
of copy state data. With the v42 patches, we pass the whole
CopyToStateData to the extension code, but most of the fields in
CopyToStateData are internal working state that shouldn't be
exposed to extensions. I think we need to sort out which fields should
be exposed and which should not. That way, it would be safer, we would
be able to avoid exposing copyto_internal.h, and extensions would not
need to include copyfrom_internal.h.
I've implemented a draft patch for that idea. In the 0001 patch, I
moved fields that are related to internal working state from
CopyToStateData to CopyToExecutionData. COPY routine APIs pass a
pointer to CopyToStateData, but extensions can access only the fields
outside CopyToExecutionData. In the 0002 patch, I've implemented
the registration API and some related APIs based on your v42 patch.
I've made similar changes to the COPY FROM code too.
The patch is at a very early PoC stage and we would need to scrutinize
which fields should or should not be exposed. Feedback is very welcome.
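A minimal sketch of the split described in this message, assuming the internal state is reached through a pointer to an incomplete type. All names here (SketchCopyToState, SketchExecutionData, the core_* helpers) are invented for illustration and are not taken from the draft patch.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- What an extension would see (think: a public header) ---- */

/* Incomplete type: extensions can hold the pointer but not look inside. */
typedef struct SketchExecutionData SketchExecutionData;

typedef struct SketchCopyToState
{
    int         natts;          /* stand-in for exposed fields such as attnumlist */
    SketchExecutionData *edata; /* internal working state, opaque to extensions */
} SketchCopyToState;

/* An extension callback can read exposed fields only. */
static int
extension_count_columns(const SketchCopyToState *cstate)
{
    return cstate->natts;
}

/* ---- Core only (think: copyto.c) ---- */

struct SketchExecutionData
{
    unsigned long bytes_processed;
};

static SketchCopyToState *
core_begin_copy_to(int natts)
{
    SketchCopyToState *cstate = malloc(sizeof(*cstate));
    SketchExecutionData *edata = malloc(sizeof(*edata));

    if (cstate == NULL || edata == NULL)
    {
        free(cstate);
        free(edata);
        return NULL;
    }
    edata->bytes_processed = 0;
    cstate->natts = natts;
    cstate->edata = edata;
    return cstate;
}

/* Core code reaches internal fields through the edata pointer. */
static void
core_send_data(SketchCopyToState *cstate, unsigned long nbytes)
{
    cstate->edata->bytes_processed += nbytes;
}

static unsigned long
core_bytes_processed(const SketchCopyToState *cstate)
{
    return cstate->edata->bytes_processed;
}

static void
core_end_copy_to(SketchCopyToState *cstate)
{
    assert(cstate != NULL);
    free(cstate->edata);
    free(cstate);
}
```

Code placed before the full definition of SketchExecutionData, like the extension callback above, cannot dereference edata at all; that is the hiding effect, and the price is one extra pointer hop in the core helpers.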
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		Attachments
Hi,
In <CAD21AoB0Z3gkOGALK3pXYmGTWATVvgDAmn-yXGp2mX64S-YrSw@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 14 Jul 2025 03:28:16 +0900,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I've reviewed the 0001 and 0002 patches. The API implemented in the
> 0002 patch looks good to me, but I'm concerned about the capsulation
> of copy state data. With the v42 patches, we pass the whole
> CopyToStateData to the extension codes, but most of the fields in
> CopyToStateData are internal working state data that shouldn't be
> exposed to extensions. I think we need to sort out which fields are
> exposed or not. That way, it would be safer and we would be able to
> avoid exposing copyto_internal.h and extensions would not need to
> include copyfrom_internal.h.
FYI: We have discussed this before. For example:
https://www.postgresql.org/message-id/flat/CAD21AoD%3DUapH4Wh06G6H5XAzPJ0iJg9YcW8r7E2UEJkZ8QsosA%40mail.gmail.com
> I think we can move CopyToState to copy.h and we don't
> need to have set/get functions for its fields.
https://www.postgresql.org/message-id/flat/CAD21AoBpWFU4k-_bwrTq0AkFSAdwQqhAsSW188STmu9HxLJ0nQ%40mail.gmail.com
> > What does "private" mean here? I thought that it means that
> > "PostgreSQL itself can use it". But it seems that you mean
> > that "PostgreSQL itself and custom format extensions can use
> > it but other extensions can't use it".
> >
> > I'm not familiar with "_internal.h" in PostgreSQL but is
> > "_internal.h" for the latter "private" mean?
>
> My understanding is that we don't strictly prohibit _internal.h from
> being included in out of core files. For example, file_fdw.c includes
> copyfrom_internal.h in order to access some fields of CopyFromState.
In general, I agree that we should export only needed
information.
How about adding accessors instead of splitting
Copy{From,To}State to Copy{From,To}ExecutionData? If we use
the accessors approach, we can export only needed
information step by step without breaking ABI.
The built-in formats can keep using Copy{From,To}State
directly with the accessors approach. We can avoid any
performance regression of the built-in formats. If we split
Copy{From,To}State to Copy{From,To}ExecutionData,
performance may change.
> I've implemented a draft patch for that idea. In the 0001 patch, I
> moved fields that are related to internal working state from
> CopyToStateData to CopyToExectuionData. COPY routine APIs pass a
> pointer of CopyToStateData but extensions can access only fields
> except for CopyToExectuionData. In the 0002 patch, I've implemented
> the registration API and some related APIs based on your v42 patch.
> I've made similar changes to COPY FROM codes too.
>
> The patch is a very PoC phase and we would need to scrutinize the
> fields that should or should not be exposed. Feedback is very welcome.
Based on our sample extensions [1][2], the following fields
may be the minimum. I added "(*)" marks to fields that exist
in Copy{From,To}StateData in your patch. Other fields exist
in Copy{From,To}ExecutionData. We need to export them to
extensions. We can hide fields in Copy{From,To}StateData
that are not listed here.
FROM:
- attnumlist (*)
- bytes_processed
- cur_attname
- escontext
- in_functions (*)
- input_buf
- input_reached_eof
- line_buf
- opts (*)
- raw_buf
- raw_buf_index
- raw_buf_len
- rel (*)
- typioparams (*)
TO:
- attnumlist (*)
- fe_msgbuf
- opts (*)
[1] https://github.com/kou/pg-copy-arrow/
[2] https://github.com/MasahikoSawada/pg_copy_jsonlines/
Thanks,
-- 
kou
Hi,
In <20250714.173803.865595983884510428.kou@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 14 Jul 2025 17:38:03 +0900 (JST),
  Sutou Kouhei <kou@clear-code.com> wrote:
>> I've reviewed the 0001 and 0002 patches. The API implemented in the
>> 0002 patch looks good to me, but I'm concerned about the capsulation
>> of copy state data. With the v42 patches, we pass the whole
>> CopyToStateData to the extension codes, but most of the fields in
>> CopyToStateData are internal working state data that shouldn't be
>> exposed to extensions. I think we need to sort out which fields are
>> exposed or not. That way, it would be safer and we would be able to
>> avoid exposing copyto_internal.h and extensions would not need to
>> include copyfrom_internal.h.
> In general, I agree that we should export only needed
> information.
> 
> How about adding accessors instead of splitting
> Copy{From,To}State to Copy{From,To}ExecutionData? If we use
> the accessors approach, we can export only needed
> information step by step without breaking ABI.
Another idea: We'll add Copy{From,To}State::opaque
eventually. (For example, the v40-0003 patch includes it.)
How about using it to hide fields only for built-in formats?
Thanks,
-- 
kou
			
		Hi,
On 2025-07-14 03:28:16 +0900, Masahiko Sawada wrote:
> I've reviewed the 0001 and 0002 patches. The API implemented in the
> 0002 patch looks good to me, but I'm concerned about the capsulation
> of copy state data. With the v42 patches, we pass the whole
> CopyToStateData to the extension codes, but most of the fields in
> CopyToStateData are internal working state data that shouldn't be
> exposed to extensions. I think we need to sort out which fields are
> exposed or not. That way, it would be safer and we would be able to
> avoid exposing copyto_internal.h and extensions would not need to
> include copyfrom_internal.h.
>
> I've implemented a draft patch for that idea. In the 0001 patch, I
> moved fields that are related to internal working state from
> CopyToStateData to CopyToExectuionData. COPY routine APIs pass a
> pointer of CopyToStateData but extensions can access only fields
> except for CopyToExectuionData. In the 0002 patch, I've implemented
> the registration API and some related APIs based on your v42 patch.
> I've made similar changes to COPY FROM codes too.
I've not followed the development of this patch - but I continue to be
concerned about the performance impact it has as-is and the amount of COPY
performance improvements it forecloses.
This seems to add yet another layer of indirection to a lot of hot functions
like CopyGetData() etc.
Greetings,
Andres Freund
On Tue, Jul 15, 2025 at 5:37 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2025-07-14 03:28:16 +0900, Masahiko Sawada wrote:
> > I've reviewed the 0001 and 0002 patches. The API implemented in the
> > 0002 patch looks good to me, but I'm concerned about the capsulation
> > of copy state data. With the v42 patches, we pass the whole
> > CopyToStateData to the extension codes, but most of the fields in
> > CopyToStateData are internal working state data that shouldn't be
> > exposed to extensions. I think we need to sort out which fields are
> > exposed or not. That way, it would be safer and we would be able to
> > avoid exposing copyto_internal.h and extensions would not need to
> > include copyfrom_internal.h.
> >
> > I've implemented a draft patch for that idea. In the 0001 patch, I
> > moved fields that are related to internal working state from
> > CopyToStateData to CopyToExectuionData. COPY routine APIs pass a
> > pointer of CopyToStateData but extensions can access only fields
> > except for CopyToExectuionData. In the 0002 patch, I've implemented
> > the registration API and some related APIs based on your v42 patch.
> > I've made similar changes to COPY FROM codes too.
>
> I've not followed the development of this patch - but I continue to be
> concerned about the performance impact it has as-is and the amount of COPY
> performance improvements it forecloses.
>
> This seems to add yet another layer of indirection to a lot of hot functions
> like CopyGetData() etc.
>
Most of the refactoring work has been done by commits 7717f6300 and
2e4127b6d, with a slight performance gain. At this stage, we're trying
to introduce the registration API so that extensions can provide their
callbacks to the core. Some functions required for I/O, such as
CopyGetData() and CopySendEndOfRow(), would be exposed, but I'm not
going to add additional layers of indirect function calls.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Tue, Jul 15, 2025 at 12:54 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <20250714.173803.865595983884510428.kou@clear-code.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 14 Jul 2025 17:38:03 +0900 (JST),
>   Sutou Kouhei <kou@clear-code.com> wrote:
>
> >> I've reviewed the 0001 and 0002 patches. The API implemented in the
> >> 0002 patch looks good to me, but I'm concerned about the capsulation
> >> of copy state data. With the v42 patches, we pass the whole
> >> CopyToStateData to the extension codes, but most of the fields in
> >> CopyToStateData are internal working state data that shouldn't be
> >> exposed to extensions. I think we need to sort out which fields are
> >> exposed or not. That way, it would be safer and we would be able to
> >> avoid exposing copyto_internal.h and extensions would not need to
> >> include copyfrom_internal.h.
>
> > In general, I agree that we should export only needed
> > information.
> >
> > How about adding accessors instead of splitting
> > Copy{From,To}State to Copy{From,To}ExecutionData? If we use
> > the accessors approach, we can export only needed
> > information step by step without breaking ABI.
Yeah, while it can export required fields without breaking ABI, I'm
concerned that setter and getter functions could be bloated if we need
to have them for many fields.
>
> Another idea: We'll add Copy{From,To}State::opaque
> eventually. (For example, the v40-0003 patch includes it.)
>
> How about using it to hide fields only for built-in formats?
What is the difference between your idea and splitting CopyToState
into CopyToState and CopyToExecutionData?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		Hi,
In <CAD21AoAQkjU=o0nX4y0jtX0BnsrqA04g2ABqrUwjT88YeEWarA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 17 Jul 2025 13:33:13 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> I've not followed the development of this patch - but I continue to be
>> concerned about the performance impact it has as-is and the amount of COPY
>> performance improvements it forecloses.
>>
>> This seems to add yet another layer of indirection to a lot of hot functions
>> like CopyGetData() etc.
>>
> 
> The most refactoring works have been done by commit 7717f6300 and
> 2e4127b6d with a slight performance gain. At this stage, we're trying
> to introduce the registration API so that extensions can provide their
> callbacks to the core. Some functions required for I/O such as
> CopyGetData() and CopySendEndOfRow() would be exposed but I'm not
> going to add additional indirection function call layers.
I think Andres is talking about any indirection, not only
indirect function calls. In this case, it is the change from
"cstate->XXX" to "cstate->edata->XXX".
I also mentioned this in my e-mail. I'm not sure whether it
has a performance impact, but we should benchmark the
Copy{From,To}ExecutionData approach to confirm whether it
has any.
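To make the two access paths concrete, here is a small sketch with invented struct names (FlatState, SplitState). It only shows the shape of each access; whether the difference is measurable in COPY is exactly what a benchmark would have to settle.

```c
#include <assert.h>
#include <stddef.h>

/* Layout without the split: the counter lives directly in the state. */
typedef struct FlatState
{
    unsigned long bytes_processed;
} FlatState;

/* Layout with the split: the counter moves behind a pointer. */
typedef struct SplitExecData
{
    unsigned long bytes_processed;
} SplitExecData;

typedef struct SplitState
{
    SplitExecData *edata;
} SplitState;

/* cstate->XXX: the field is addressed directly from cstate. */
static void
flat_add(FlatState *cstate, unsigned long n)
{
    cstate->bytes_processed += n;
}

/* cstate->edata->XXX: the edata pointer is loaded first, then the field. */
static void
split_add(SplitState *cstate, unsigned long n)
{
    assert(cstate->edata != NULL);
    cstate->edata->bytes_processed += n;
}
```

Both helpers compute the same result; the second one touches two separate memory locations to get there.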
Thanks,
-- 
kou
			
		Hi,
In <CAD21AoAZL2RzPM4RLOJKm_73z5LXq2_VOVF+S+T0tnbjHdWTFA@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 17 Jul 2025 13:44:11 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> > How about adding accessors instead of splitting
>> > Copy{From,To}State to Copy{From,To}ExecutionData? If we use
>> > the accessors approach, we can export only needed
>> > information step by step without breaking ABI.
> 
> Yeah, while it can export required fields without breaking ABI, I'm
> concerned that setter and getter functions could be bloated if we need
> to have them for many fields.
In general, I choose this approach in my projects even when
I need to define many accessors, because I can hide
implementation details from users and change them without
breaking API/ABI.
But PostgreSQL isn't my project. Is there any guideline for
PostgreSQL API(/ABI?) design that we can refer to for this
case?
FYI: We need to export at least the following fields:
https://www.postgresql.org/message-id/flat/20250714.173803.865595983884510428.kou%40clear-code.com#78fdbccf89742f856aa2cf95eaf42032
> FROM:
> 
> - attnumlist (*)
> - bytes_processed
> - cur_attname
> - escontext
> - in_functions (*)
> - input_buf
> - input_reached_eof
> - line_buf
> - opts (*)
> - raw_buf
> - raw_buf_index
> - raw_buf_len
> - rel (*)
> - typioparams (*)
> 
> TO:
> 
> - attnumlist (*)
> - fe_msgbuf
> - opts (*)
Here are pros/cons of the Copy{From,To}ExecutionData
approach, right?
Pros:
1. We can hide internal data from extensions
Cons:
1. Built-in format routines need to refer to fields via
   Copy{From,To}ExecutionData.
   * This MAY have a performance impact. If there is no
     performance impact, this is not a con.
2. API/ABI compatibility will be broken when we change
   exported fields.
   * I'm not sure whether this is a con in the PostgreSQL
     design.
Here are pros/cons of the accessors approach:
Pros:
1. We can hide internal data from extensions
2. We can export new fields and change field names
   without breaking API/ABI compatibility
3. We don't need to change built-in format routines.
   So we can assume that there is no performance impact.
Cons:
1. We may need to define many accessors
   * I'm not sure whether this is a con in the PostgreSQL
     design.
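The accessors approach can be sketched as follows. The type and function names (SketchFromState, SketchFromStateGet*/Set*) are hypothetical and only mirror the naming style of the CopyToStateGet*() readers mentioned at the start of the thread; nothing here is an existing PostgreSQL API.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Exported to extensions: an incomplete type plus accessors ---- */
typedef struct SketchFromState SketchFromState;

int SketchFromStateGetNumAttrs(const SketchFromState *cstate);
const char *SketchFromStateGetCurAttname(const SketchFromState *cstate);
void SketchFromStateSetCurAttname(SketchFromState *cstate, const char *name);

/* ---- Core only: the full definition, free to change later ---- */
struct SketchFromState
{
    int         natts;
    const char *cur_attname;
    unsigned long bytes_processed;  /* no accessor, so not exported */
};

int
SketchFromStateGetNumAttrs(const SketchFromState *cstate)
{
    assert(cstate != NULL);
    return cstate->natts;
}

const char *
SketchFromStateGetCurAttname(const SketchFromState *cstate)
{
    assert(cstate != NULL);
    return cstate->cur_attname;
}

void
SketchFromStateSetCurAttname(SketchFromState *cstate, const char *name)
{
    assert(cstate != NULL);
    cstate->cur_attname = name;
}
```

Built-in code that sees the full struct keeps using cstate->natts directly, while an extension compiled against the exported part depends only on the function signatures, so fields can be renamed or reordered without breaking it.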
>> Another idea: We'll add Copy{From,To}State::opaque
>> eventually. (For example, the v40-0003 patch includes it.)
>>
>> How about using it to hide fields only for built-in formats?
> 
> What is the difference between your idea and splitting CopyToState
> into CopyToState and CopyToExecutionData?
1. We don't need to manage 2 similar sets of data for built-in
   formats and extensions.
   * Built-in formats use CopyToExecutionData and extensions
     use opaque.
2. We can introduce registration API now.
   * We can work on this topic AFTER we introduce
     registration API.
   * e.g.: Add registration API -> Add opaque -> Use opaque
     for internal fields (we will benchmark this
     implementation at this time)
Thanks,
-- 
kou
			
		On Fri, Jul 18, 2025 at 3:05 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoAZL2RzPM4RLOJKm_73z5LXq2_VOVF+S+T0tnbjHdWTFA@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 17 Jul 2025 13:44:11 -0700,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >> > How about adding accessors instead of splitting
> >> > Copy{From,To}State to Copy{From,To}ExecutionData? If we use
> >> > the accessors approach, we can export only needed
> >> > information step by step without breaking ABI.
> >
> > Yeah, while it can export required fields without breaking ABI, I'm
> > concerned that setter and getter functions could be bloated if we need
> > to have them for many fields.
>
> In general, I choose this approach in my projects even when
> I need to define many accessors. Because I can hide
> implementation details from users. I can change
> implementation details without breaking API/ABI.
>
> But PostgreSQL isn't my project. Is there any guideline for
> PostgreSQL API(/ABI?) design that we can refer for this
> case?
As far as I know there is no official guideline for PostgreSQL API and
ABI design, but I've never seen structs having more than 10 getters and
setters in the PostgreSQL source code.
>
> FYI: We need to export at least the following fields:
>
>
https://www.postgresql.org/message-id/flat/20250714.173803.865595983884510428.kou%40clear-code.com#78fdbccf89742f856aa2cf95eaf42032
>
> > FROM:
> >
> > - attnumlist (*)
> > - bytes_processed
> > - cur_attname
> > - escontext
> > - in_functions (*)
> > - input_buf
> > - input_reached_eof
> > - line_buf
> > - opts (*)
> > - raw_buf
> > - raw_buf_index
> > - raw_buf_len
> > - rel (*)
> > - typioparams (*)
> >
> > TO:
> >
> > - attnumlist (*)
> > - fe_msgbuf
> > - opts (*)
I think we can come up with the minimum list of fields that we need to
expose. For instance, fields used for buffered reads in COPY FROM,
such as input_buf and the raw_buf related fields, don't necessarily need
to be exposed, as an extension can implement buffering in its own way.
We can start by supporting simple copy format extensions, such as ones
that read and parse binary data from the data source and fill the
'values' and 'nulls' arrays as output. Considering these facts, it might
be sufficient for copy format extensions if they could access 'rel',
'attnumlist', and 'opts' in both COPY FROM and COPY TO (and the
CopyFromErrorCallback related fields for COPY FROM).
Apart from this, we might want to reorganize the CopyFromStateData and
CopyToStateData fields, since they mix general-purpose fields for COPY
operations (e.g., num_defaults, whereClause, and range_table) with
built-in format specific fields (e.g., line_buf and input_buf). The text
and CSV formats use some fields for parsing fields with buffered reads,
so one idea is to move the related fields to another struct so that both
the built-in formats (text and CSV) and external extensions that want to
use buffered reads for text parsing can share this functionality.
> Here are pros/cons of the Copy{From,To}ExecutionData
> approach, right?
>
> Pros:
> 1. We can hide internal data from extensions
>
> Cons:
> 1. Built-in format routines need to refer fields via
>    Copy{From,To}ExecutionData.
>    * This MAY has performance impact. If there is no
>      performance impact, this is not a cons.
> 2. API/ABI compatibility will be broken when we change
>    exported fields.
>    * I'm not sure whether this is a cons in the PostgreSQL
>      design.
>
> Here are pros/cons of the accessors approach:
>
> Pros:
> 1. We can hide internal data from extensions
> 2. We can export new fields change field names
>    without breaking API/ABI compatibility
> 3. We don't need to change built-in format routines.
>    So we can assume that there is no performance impact.
>
> Cons:
> 1. We may need to define many accessors
>    * I'm not sure whether this is a cons in the PostgreSQL
>      design.
I agree with the summary.
> >> Another idea: We'll add Copy{From,To}State::opaque
> >> eventually. (For example, the v40-0003 patch includes it.)
> >>
> >> How about using it to hide fields only for built-in formats?
> >
> > What is the difference between your idea and splitting CopyToState
> > into CopyToState and CopyToExecutionData?
>
> 1. We don't need to manage 2 similar data for built-in
>    formats and extensions.
>    * Build-in formats use CopyToExecutionData and extensions
>      use opaque.
> 2. We can introduce registration API now.
>    * We can work on this topic AFTER we introduce
>      registration API.
>    * e.g.: Add registration API -> Add opaque -> Use opaque
>      for internal fields (we will benchmark this
>      implementation at this time)
What if we find performance overhead in the built-in formats after
introducing opaque data? I personally would like to avoid merging the
registration API (i.e., supporting custom copy formats) while we are
still unsure about the overall design and the potential performance
impact of the follow-up patches.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
		On Mon, Jul 28, 2025 at 12:33 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Jul 18, 2025 at 3:05 AM Sutou Kouhei <kou@clear-code.com> wrote:
> >
> > Hi,
> >
> > In <CAD21AoAZL2RzPM4RLOJKm_73z5LXq2_VOVF+S+T0tnbjHdWTFA@mail.gmail.com>
> >   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 17 Jul 2025 13:44:11 -0700,
> >   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > >> > How about adding accessors instead of splitting
> > >> > Copy{From,To}State to Copy{From,To}ExecutionData? If we use
> > >> > the accessors approach, we can export only needed
> > >> > information step by step without breaking ABI.
> > >
> > > Yeah, while it can export required fields without breaking ABI, I'm
> > > concerned that setter and getter functions could be bloated if we need
> > > to have them for many fields.
> >
> > In general, I choose this approach in my projects even when
> > I need to define many accessors. Because I can hide
> > implementation details from users. I can change
> > implementation details without breaking API/ABI.
> >
> > But PostgreSQL isn't my project. Is there any guideline for
> > PostgreSQL API(/ABI?) design that we can refer for this
> > case?
>
> As far as I know there is no official guideline for PostgreSQL API and
> ABI design, but I've never seen structs having more than 10 getter and
> setter in PostgreSQL source code.
>
> >
> > FYI: We need to export at least the following fields:
> >
> >
https://www.postgresql.org/message-id/flat/20250714.173803.865595983884510428.kou%40clear-code.com#78fdbccf89742f856aa2cf95eaf42032
> >
> > > FROM:
> > >
> > > - attnumlist (*)
> > > - bytes_processed
> > > - cur_attname
> > > - escontext
> > > - in_functions (*)
> > > - input_buf
> > > - input_reached_eof
> > > - line_buf
> > > - opts (*)
> > > - raw_buf
> > > - raw_buf_index
> > > - raw_buf_len
> > > - rel (*)
> > > - typioparams (*)
> > >
> > > TO:
> > >
> > > - attnumlist (*)
> > > - fe_msgbuf
> > > - opts (*)
>
> I think we can think of the minimum list of fields that we need to
> expose. For instance, fields used for buffered reads for COPY FROM
> such as input_buf and raw_buf related fields don't necessarily need to
> be exposed as extension can implement it in its own way. We can start
> with the supporting simple copy format extensions like that read and
> parse the binary data from the data source and fill 'values' and
> 'nulls' arrays as output. Considering these facts, it might be
> sufficient for copy format extensions if they could access 'rel',
> 'attnumlist', and 'opts' in both COPY FROM and COPY TO (and
> CopyFromErrorCallback related fields for COPY FROM).
>
> Apart from this, we might want to reorganize CopyFromStateData fields
> and CopyToStateData fields since they have mixed fields of general
> purpose fields for COPY operations (e.g., num_defaults, whereClause,
> and range_table) and built-in format specific fields (e.g., line_buf
> and input_buf). Text and CSV formats are using some fields for parsing
> fields with buffered reads so one idea is that we move related fields
> to another struct so that both built-in formats (text and CSV) and
> external extensions that want to use the buffered reads for text
> parsing can use this functionality.
So it might be worth refactoring the code in terms of:
1. hiding internal data from format callbacks
2. separating format-specific fields from the main state data.
I categorized the fields in CopyFromStateData. I think there are
roughly three different kinds of fields mixed there:
1. fields used only by the core (not by format callbacks)
- filename
- is_program
- whereClause
- cur_relname
- copycontext
- defmap
- num_defaults
- volatile_defexprs
- range_table
- rteperminfos
- qualexpr
- transition_capture
2. fields used by both the core and format callbacks
- rel
- attnumlist
- cur_lineno
- cur_attname
- cur_attval
- relname_only
- num_errors
- opts
- in_functions
- typioparams
- escontext
- defexprs
- Input-related fields
    - copy_src
    - copy_file
    - fe_msgbuf
    - data_source_cb
    - bytes_processed
3. built-in format specific fields (mostly for text and csv)
- eol_type
- defaults
- Encoding related fields
    - file_encoding
    - need_transcoding
    - conversion_proc
- convert_select_flags
- raw data pointers
    - max_fields
    - raw_fields
- attribute_buf
- line_buf related fields
    - line_buf
    - line_buf_valid
- input_buf related fields
    - input_buf
    - input_buf_index
    - input_buf_len
    - input_reached_eof
    - input_reached_error
- raw_buf related fields
    - raw_buf
    - raw_buf_index
    - raw_buf_len
    - raw_reached_eof
The fields in 1 are mostly static, while the fields in 2 and 3
are likely to be accessed in hot functions during COPY FROM. Would it
be a good idea to restructure these fields so that we hide the
fields in 1 from callback functions and keep the fields in 3 in a
separate format-specific struct that is accessed via an opaque
pointer? Could the latter change cause performance overhead,
though?
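To make the question concrete, here is a minimal sketch of the opaque-pointer
layout, using toy stand-in structs rather than the real CopyFromStateData.
All names (DemoTextState, DemoCopyFromState, demo_unread_bytes) are
hypothetical, only a few fields per category are shown, and the core-only
fields of category 1 are left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Category 3 (toy subset): fields only the built-in text/CSV code needs. */
typedef struct DemoTextState
{
    int     input_buf_index;    /* next unread byte */
    int     input_buf_len;      /* total valid bytes */
    bool    input_reached_eof;
} DemoTextState;

/* Category 2 (toy subset): fields shared by the core and format callbacks. */
typedef struct DemoCopyFromState
{
    unsigned long cur_lineno;
    unsigned long num_errors;
    void   *opaque;             /* format-specific state, e.g. DemoTextState */
} DemoCopyFromState;

/* Before the split this would read cstate->input_buf_len directly (x->y). */
int
demo_unread_bytes(const DemoCopyFromState *cstate)
{
    const DemoTextState *text = cstate->opaque;     /* extra load: x->z */

    return text->input_buf_len - text->input_buf_index;    /* then z->y */
}
```

The extra pointer load in demo_unread_bytes() is the kind of indirection
whose cost would need to be measured in the hot functions.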
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
Hi,
In <CAD21AoBa0Wm3C2H12jaqkvLidP2zEhsC+gf=3w7XiA4LQnvx0g@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 28 Jul 2025 22:19:36 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> The fields in 1 are mostly static fields, and the fields in 2 and 3
> are likely to be accessed in hot functions during COPY FROM. Would it
> be a good idea to restructure these fields so that we can hide the
> fields in 1 from callback functions and having the fields in 3 in a
> separate format-specific struct that can be accessed via an opaque
> pointer? But could the latter change potentially cause performance
> overheads?
Yes. It changes the memory layout (1 contiguous memory chunk ->
2 separate memory chunks) and introduces indirect member
accesses (x->y -> x->z->y). These may not have any performance
impact, but we need to measure it if we want to use this
approach.
BTW, how about the following approach?
copyapi.h:
typedef struct CopyToStateData
{
    /* public members */
    /* ... */
} CopyToStateData;
copyto.c:
typedef struct CopyToStateInternalData
{
    CopyToStateData parent;
    /* private members */
    /* ... */
} CopyToStateInternalData;
We export only CopyToStateData, which has the public members.
We don't export CopyToStateInternalData, which also has the
private members used only by built-in code.
CopyToStateInternalData has CopyToStateData as its first
member, so a pointer to CopyToStateInternalData can be used
as a pointer to CopyToStateData.
We use CopyToStateData, not CopyToStateInternalData, in the
public API. We cast CopyToStateData to CopyToStateInternalData
when we need to use private members:
static void
CopySendData(CopyToState cstate, const void *databuf, int datasize)
{
    CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
    appendBinaryStringInfo(cstate_internal->fe_msgbuf, databuf, datasize);
}
It's still direct member access.
With this approach, we can keep the same memory layout (1
contiguous memory chunk) and direct member access. I think
that this approach has no performance impact.
See the attached patch for a PoC of this approach.
Drawback: This approach always allocates
CopyToStateInternalData, not CopyToStateData. So extensions
also pay for memory that they don't need. But this will
prevent performance regressions in the built-in formats. Is it
acceptable?
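For readers who want to try the pattern outside the tree, a standalone
sketch follows. It is not the attached patch: the Demo* names are
hypothetical, calloc()/free() stand in for palloc0()/pfree(), and a fixed
array stands in for fe_msgbuf. It relies only on the C rule that a pointer
to a struct and a pointer to its first member are interchangeable.

```c
#include <stdlib.h>
#include <string.h>

/* Public part: what a header such as copyapi.h would export. */
typedef struct DemoStateData
{
    int     file_encoding;
    int     natts;
} DemoStateData;
typedef DemoStateData *DemoState;

/* Private part: visible only inside the implementation file. */
typedef struct DemoStateInternalData
{
    DemoStateData parent;       /* must stay the first member */
    size_t  bytes_processed;
    char    msgbuf[64];
    size_t  msglen;
} DemoStateInternalData;
typedef DemoStateInternalData *DemoStateInternal;

/* Always allocate the internal struct, but hand out the public type. */
DemoState
demo_begin(int natts)
{
    DemoStateInternal internal = calloc(1, sizeof(DemoStateInternalData));

    if (internal == NULL)
        return NULL;
    internal->parent.natts = natts;
    return &internal->parent;
}

/* Cast back to reach private members; this is still one direct access. */
void
demo_send_data(DemoState state, const void *data, size_t len)
{
    DemoStateInternal internal = (DemoStateInternal) state;
    size_t  room = sizeof(internal->msgbuf) - internal->msglen;

    if (len > room)
        len = room;             /* toy buffer: silently truncate */
    memcpy(internal->msgbuf + internal->msglen, data, len);
    internal->msglen += len;
    internal->bytes_processed += len;
}

size_t
demo_bytes_processed(DemoState state)
{
    return ((DemoStateInternal) state)->bytes_processed;
}

/* The public pointer has the same address as the allocation. */
void
demo_end(DemoState state)
{
    free(state);
}
```

Callers only ever see DemoState, so the private members can change without
touching the exported header.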
Thanks,
-- 
kou
From 4f1ee841677774cdc36091066020674d300714db Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Thu, 14 Aug 2025 15:21:50 +0900
Subject: [PATCH] Split CopyToStateData to CopyToStateData and
 CopyToStateInternalData
---
 src/backend/commands/copyto.c  | 133 +++++++++++++++++----------------
 src/include/commands/copyapi.h |  21 ++++++
 2 files changed, 91 insertions(+), 63 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 67b94b91cae..8c58cb18a4d 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -21,7 +21,6 @@
 #include "access/tableam.h"
 #include "commands/copyapi.h"
 #include "commands/progress.h"
-#include "executor/execdesc.h"
 #include "executor/executor.h"
 #include "executor/tuptable.h"
 #include "libpq/libpq.h"
@@ -62,40 +61,27 @@ typedef enum CopyDest
  * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
  * when we have to do it the hard way.
  */
-typedef struct CopyToStateData
+typedef struct CopyToStateInternalData
 {
-    /* format-specific routines */
-    const CopyToRoutine *routine;
+    struct CopyToStateData parent;
 
     /* low-level state data */
     CopyDest    copy_dest;        /* type of copy source/destination */
     FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
     StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
 
-    int            file_encoding;    /* file or remote side's character encoding */
     bool        need_transcoding;    /* file encoding diff from server? */
     bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
 
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
     /*
      * Working state
      */
     MemoryContext copycontext;    /* per-copy execution context */
 
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
     MemoryContext rowcontext;    /* per-row evaluation context */
     uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
+}            CopyToStateInternalData;
+typedef struct CopyToStateInternalData *CopyToStateInternal;
 
 /* DestReceiver for COPY (query) TO */
 typedef struct
@@ -189,11 +175,13 @@ CopyToGetRoutine(const CopyFormatOptions *opts)
 static void
 CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 {
+    CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
+
     /*
      * For non-binary copy, we need to convert null_print to file encoding,
      * because it will be sent directly with CopySendString.
      */
-    if (cstate->need_transcoding)
+    if (cstate_internal->need_transcoding)
         cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
                                                           cstate->opts.null_print_len,
                                                           cstate->file_encoding);
@@ -390,6 +378,7 @@ CopyToBinaryEnd(CopyToState cstate)
 static void
 SendCopyBegin(CopyToState cstate)
 {
+    CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
     StringInfoData buf;
     int            natts = list_length(cstate->attnumlist);
     int16        format = (cstate->opts.binary ? 1 : 0);
@@ -401,14 +390,14 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate_internal->copy_dest = COPY_FRONTEND;
 }
 
 static void
 SendCopyEnd(CopyToState cstate)
 {
     /* Shouldn't have any unsent data */
-    Assert(cstate->fe_msgbuf->len == 0);
+    Assert(((CopyToStateInternal) cstate)->fe_msgbuf->len == 0);
     /* Send Copy Done message */
     pq_putemptymessage(PqMsg_CopyDone);
 }
@@ -426,32 +415,39 @@ SendCopyEnd(CopyToState cstate)
 static void
 CopySendData(CopyToState cstate, const void *databuf, int datasize)
 {
-    appendBinaryStringInfo(cstate->fe_msgbuf, databuf, datasize);
+    CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
+
+    appendBinaryStringInfo(cstate_internal->fe_msgbuf, databuf, datasize);
 }
 
 static void
 CopySendString(CopyToState cstate, const char *str)
 {
-    appendBinaryStringInfo(cstate->fe_msgbuf, str, strlen(str));
+    CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
+
+    appendBinaryStringInfo(cstate_internal->fe_msgbuf, str, strlen(str));
 }
 
 static void
 CopySendChar(CopyToState cstate, char c)
 {
-    appendStringInfoCharMacro(cstate->fe_msgbuf, c);
+    CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
+
+    appendStringInfoCharMacro(cstate_internal->fe_msgbuf, c);
 }
 
 static void
 CopySendEndOfRow(CopyToState cstate)
 {
-    StringInfo    fe_msgbuf = cstate->fe_msgbuf;
+    CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
+    StringInfo    fe_msgbuf = cstate_internal->fe_msgbuf;
 
-    switch (cstate->copy_dest)
+    switch (cstate_internal->copy_dest)
     {
         case COPY_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
-                       cstate->copy_file) != 1 ||
-                ferror(cstate->copy_file))
+                       cstate_internal->copy_file) != 1 ||
+                ferror(cstate_internal->copy_file))
             {
                 if (cstate->is_program)
                 {
@@ -492,8 +488,8 @@ CopySendEndOfRow(CopyToState cstate)
     }
 
     /* Update the progress */
-    cstate->bytes_processed += fe_msgbuf->len;
-    pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, cstate->bytes_processed);
+    cstate_internal->bytes_processed += fe_msgbuf->len;
+    pgstat_progress_update_param(PROGRESS_COPY_BYTES_PROCESSED, cstate_internal->bytes_processed);
 
     resetStringInfo(fe_msgbuf);
 }
@@ -505,7 +501,9 @@ CopySendEndOfRow(CopyToState cstate)
 static inline void
 CopySendTextLikeEndOfRow(CopyToState cstate)
 {
-    switch (cstate->copy_dest)
+    CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
+
+    switch (cstate_internal->copy_dest)
     {
         case COPY_FILE:
             /* Default line termination depends on platform */
@@ -561,11 +559,12 @@ CopySendInt16(CopyToState cstate, int16 val)
 static void
 ClosePipeToProgram(CopyToState cstate)
 {
+    CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
     int            pclose_rc;
 
     Assert(cstate->is_program);
 
-    pclose_rc = ClosePipeStream(cstate->copy_file);
+    pclose_rc = ClosePipeStream(cstate_internal->copy_file);
     if (pclose_rc == -1)
         ereport(ERROR,
                 (errcode_for_file_access(),
@@ -586,13 +585,15 @@ ClosePipeToProgram(CopyToState cstate)
 static void
 EndCopy(CopyToState cstate)
 {
+    CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
+
     if (cstate->is_program)
     {
         ClosePipeToProgram(cstate);
     }
     else
     {
-        if (cstate->filename != NULL && FreeFile(cstate->copy_file))
+        if (cstate->filename != NULL && FreeFile(cstate_internal->copy_file))
             ereport(ERROR,
                     (errcode_for_file_access(),
                      errmsg("could not close file \"%s\": %m",
@@ -601,7 +602,7 @@ EndCopy(CopyToState cstate)
 
     pgstat_progress_end_command();
 
-    MemoryContextDelete(cstate->copycontext);
+    MemoryContextDelete(cstate_internal->copycontext);
     pfree(cstate);
 }
 
@@ -631,6 +632,7 @@ BeginCopyTo(ParseState *pstate,
             List *options)
 {
     CopyToState cstate;
+    CopyToStateInternal cstate_internal;
     bool        pipe = (filename == NULL && data_dest_cb == NULL);
     TupleDesc    tupDesc;
     int            num_phys_attrs;
@@ -687,17 +689,18 @@ BeginCopyTo(ParseState *pstate,
 
 
     /* Allocate workspace and zero all fields */
-    cstate = (CopyToStateData *) palloc0(sizeof(CopyToStateData));
+    cstate_internal = (CopyToStateInternal) palloc0(sizeof(CopyToStateInternalData));
+    cstate = (CopyToState) cstate_internal;
 
     /*
      * We allocate everything used by a cstate in a new memory context. This
      * avoids memory leaks during repeated use of COPY in a query.
      */
-    cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
-                                                "COPY",
-                                                ALLOCSET_DEFAULT_SIZES);
+    cstate_internal->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+                                                         "COPY",
+                                                         ALLOCSET_DEFAULT_SIZES);
 
-    oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+    oldcontext = MemoryContextSwitchTo(cstate_internal->copycontext);
 
     /* Extract options from the statement node tree */
     ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
@@ -895,19 +898,19 @@ BeginCopyTo(ParseState *pstate,
      */
     if (cstate->file_encoding == GetDatabaseEncoding() ||
         cstate->file_encoding == PG_SQL_ASCII)
-        cstate->need_transcoding = false;
+        cstate_internal->need_transcoding = false;
     else
-        cstate->need_transcoding = true;
+        cstate_internal->need_transcoding = true;
 
     /* See Multibyte encoding comment above */
-    cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
+    cstate_internal->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate_internal->copy_dest = COPY_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate_internal->copy_dest = COPY_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
@@ -916,7 +919,7 @@ BeginCopyTo(ParseState *pstate,
 
         Assert(!is_program);    /* the grammar does not allow this */
         if (whereToSendOutput != DestRemote)
-            cstate->copy_file = stdout;
+            cstate_internal->copy_file = stdout;
     }
     else
     {
@@ -926,8 +929,8 @@ BeginCopyTo(ParseState *pstate,
         if (is_program)
         {
             progress_vals[1] = PROGRESS_COPY_TYPE_PROGRAM;
-            cstate->copy_file = OpenPipeStream(cstate->filename, PG_BINARY_W);
-            if (cstate->copy_file == NULL)
+            cstate_internal->copy_file = OpenPipeStream(cstate->filename, PG_BINARY_W);
+            if (cstate_internal->copy_file == NULL)
                 ereport(ERROR,
                         (errcode_for_file_access(),
                          errmsg("could not execute command \"%s\": %m",
@@ -952,14 +955,14 @@ BeginCopyTo(ParseState *pstate,
             oumask = umask(S_IWGRP | S_IWOTH);
             PG_TRY();
             {
-                cstate->copy_file = AllocateFile(cstate->filename, PG_BINARY_W);
+                cstate_internal->copy_file = AllocateFile(cstate->filename, PG_BINARY_W);
             }
             PG_FINALLY();
             {
                 umask(oumask);
             }
             PG_END_TRY();
-            if (cstate->copy_file == NULL)
+            if (cstate_internal->copy_file == NULL)
             {
                 /* copy errno because ereport subfunctions might change it */
                 int            save_errno = errno;
@@ -973,7 +976,7 @@ BeginCopyTo(ParseState *pstate,
                                  "You may want a client-side facility such as psql's \\copy.") : 0));
             }
 
-            if (fstat(fileno(cstate->copy_file), &st))
+            if (fstat(fileno(cstate_internal->copy_file), &st))
                 ereport(ERROR,
                         (errcode_for_file_access(),
                          errmsg("could not stat file \"%s\": %m",
@@ -991,7 +994,7 @@ BeginCopyTo(ParseState *pstate,
                                   cstate->rel ? RelationGetRelid(cstate->rel) : InvalidOid);
     pgstat_progress_update_multi_param(2, progress_cols, progress_vals);
 
-    cstate->bytes_processed = 0;
+    cstate_internal->bytes_processed = 0;
 
     MemoryContextSwitchTo(oldcontext);
 
@@ -1025,6 +1028,7 @@ EndCopyTo(CopyToState cstate)
 uint64
 DoCopyTo(CopyToState cstate)
 {
+    CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
     bool        pipe = (cstate->filename == NULL && cstate->data_dest_cb == NULL);
     bool        fe_copy = (pipe && whereToSendOutput == DestRemote);
     TupleDesc    tupDesc;
@@ -1043,7 +1047,7 @@ DoCopyTo(CopyToState cstate)
     cstate->opts.null_print_client = cstate->opts.null_print;    /* default */
 
     /* We use fe_msgbuf as a per-row buffer regardless of copy_dest */
-    cstate->fe_msgbuf = makeStringInfo();
+    cstate_internal->fe_msgbuf = makeStringInfo();
 
     /* Get info about the columns we need to process. */
     cstate->out_functions = (FmgrInfo *) palloc(num_phys_attrs * sizeof(FmgrInfo));
@@ -1062,9 +1066,9 @@ DoCopyTo(CopyToState cstate)
      * datatype output routines, and should be faster than retail pfree's
      * anyway.  (We don't need a whole econtext as CopyFrom does.)
      */
-    cstate->rowcontext = AllocSetContextCreate(CurrentMemoryContext,
-                                               "COPY TO",
-                                               ALLOCSET_DEFAULT_SIZES);
+    cstate_internal->rowcontext = AllocSetContextCreate(CurrentMemoryContext,
+                                                        "COPY TO",
+                                                        ALLOCSET_DEFAULT_SIZES);
 
     cstate->routine->CopyToStart(cstate, tupDesc);
 
@@ -1107,7 +1111,7 @@ DoCopyTo(CopyToState cstate)
 
     cstate->routine->CopyToEnd(cstate);
 
-    MemoryContextDelete(cstate->rowcontext);
+    MemoryContextDelete(cstate_internal->rowcontext);
 
     if (fe_copy)
         SendCopyEnd(cstate);
@@ -1121,10 +1125,11 @@ DoCopyTo(CopyToState cstate)
 static inline void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
+    CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
     MemoryContext oldcontext;
 
-    MemoryContextReset(cstate->rowcontext);
-    oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
+    MemoryContextReset(cstate_internal->rowcontext);
+    oldcontext = MemoryContextSwitchTo(cstate_internal->rowcontext);
 
     /* Make sure the tuple is fully deconstructed */
     slot_getallattrs(slot);
@@ -1146,12 +1151,13 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 static void
 CopyAttributeOutText(CopyToState cstate, const char *string)
 {
+    CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
     const char *ptr;
     const char *start;
     char        c;
     char        delimc = cstate->opts.delim[0];
 
-    if (cstate->need_transcoding)
+    if (cstate_internal->need_transcoding)
         ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
     else
         ptr = string;
@@ -1170,7 +1176,7 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
      * it's worth making two copies of it to get the IS_HIGHBIT_SET() test out
      * of the normal safe-encoding path.
      */
-    if (cstate->encoding_embeds_ascii)
+    if (cstate_internal->encoding_embeds_ascii)
     {
         start = ptr;
         while ((c = *ptr) != '\0')
@@ -1300,6 +1306,7 @@ static void
 CopyAttributeOutCSV(CopyToState cstate, const char *string,
                     bool use_quote)
 {
+    CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
     const char *ptr;
     const char *start;
     char        c;
@@ -1312,7 +1319,7 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
     if (!use_quote && strcmp(string, cstate->opts.null_print) == 0)
         use_quote = true;
 
-    if (cstate->need_transcoding)
+    if (cstate_internal->need_transcoding)
         ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
     else
         ptr = string;
@@ -1342,7 +1349,7 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
                     use_quote = true;
                     break;
                 }
-                if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii)
+                if (IS_HIGHBIT_SET(c) && cstate_internal->encoding_embeds_ascii)
                     tptr += pg_encoding_mblen(cstate->file_encoding, tptr);
                 else
                     tptr++;
@@ -1366,7 +1373,7 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
                 CopySendChar(cstate, escapec);
                 start = ptr;    /* we include char in next run */
             }
-            if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii)
+            if (IS_HIGHBIT_SET(c) && cstate_internal->encoding_embeds_ascii)
                 ptr += pg_encoding_mblen(cstate->file_encoding, ptr);
             else
                 ptr++;
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 2a2d2f9876b..b818e13cd1b 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -15,6 +15,7 @@
 #define COPYAPI_H
 
 #include "commands/copy.h"
+#include "executor/execdesc.h"
 
 /*
  * API structure for a COPY TO format implementation. Note this must be
@@ -54,6 +55,26 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+typedef struct CopyToStateData
+{
+    /* format-specific routines */
+    const CopyToRoutine *routine;
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    int            file_encoding;    /* file or remote side's character encoding */
+
+    CopyFormatOptions opts;
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+} CopyToStateData;
+
 /*
  * API structure for a COPY FROM format implementation. Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
-- 
2.50.0
			
On Wed, Aug 13, 2025 at 11:37 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoBa0Wm3C2H12jaqkvLidP2zEhsC+gf=3w7XiA4LQnvx0g@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 28 Jul 2025 22:19:36 -0700,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > The fields in 1 are mostly static fields, and the fields in 2 and 3
> > are likely to be accessed in hot functions during COPY FROM. Would it
> > be a good idea to restructure these fields so that we can hide the
> > fields in 1 from callback functions and having the fields in 3 in a
> > separate format-specific struct that can be accessed via an opaque
> > pointer? But could the latter change potentially cause performance
> > overheads?
>
> Yes. It changes memory layout (1 continuous memory chunk ->
> 2 separated memory chunks) and introduces indirect member
> accesses (x->y -> x->z->y).
I think the fields accessed by the hot functions are limited. If we
assemble the fields required by hot functions into one struct and pass it
to them, can we deal with the latter? For example, we assemble the
fields in 3 I mentioned above (i.e., built-in format specific fields)
into, say, CopyFromStateBuiltin and pass it to CopyReadLine() function
and the following functions, instead of CopyFromState. Since there are
some places where we need to access CopyFromState (e.g.,
CopyGetData()), CopyFromStateBuiltin needs to have a pointer to
CopyFromState as well.
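A rough sketch of what I mean, with toy structs only. The Demo* names,
the tiny buffer size, and the line-counting loop are all hypothetical
stand-ins; demo_load_input() plays the role of the CopyGetData() path and
demo_count_lines() the role of a hot function like CopyReadLine().

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for CopyFromState: general fields and the data source. */
typedef struct DemoCopyFromState
{
    const char *src;
    size_t  src_len;
    size_t  src_pos;
    unsigned long cur_lineno;
} DemoCopyFromState;

/* Stand-in for CopyFromStateBuiltin: hot fields plus a back pointer. */
typedef struct DemoCopyFromStateBuiltin
{
    DemoCopyFromState *cstate;  /* needed only on the colder paths */
    char    input_buf[16];
    int     input_buf_index;
    int     input_buf_len;
    bool    input_reached_eof;
} DemoCopyFromStateBuiltin;

/* Colder path: refill the buffer, going through the back pointer. */
static void
demo_load_input(DemoCopyFromStateBuiltin *b)
{
    DemoCopyFromState *cstate = b->cstate;
    size_t  avail = cstate->src_len - cstate->src_pos;
    size_t  n = avail < sizeof(b->input_buf) ? avail : sizeof(b->input_buf);

    for (size_t i = 0; i < n; i++)
        b->input_buf[i] = cstate->src[cstate->src_pos + i];
    cstate->src_pos += n;
    b->input_buf_index = 0;
    b->input_buf_len = (int) n;
    if (n == 0)
        b->input_reached_eof = true;
}

/* Hot path: the inner loop touches only fields of the builtin struct. */
unsigned long
demo_count_lines(DemoCopyFromStateBuiltin *b)
{
    unsigned long lines = 0;

    for (;;)
    {
        if (b->input_buf_index >= b->input_buf_len)
        {
            demo_load_input(b);
            if (b->input_reached_eof)
                break;
        }
        if (b->input_buf[b->input_buf_index++] == '\n')
            lines++;
    }
    b->cstate->cur_lineno = lines;  /* one indirect write, outside the loop */
    return lines;
}
```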
>  They may not have performance
> impact but we need to measure it if we want to use this
> approach.
Agreed.
> BTW, how about the following approach?
>
> copyapi.h:
>
> typedef struct CopyToStateData
> {
>         /* public members */
>         /* ... */
> } CopyToStateData;
>
> copyto.c:
>
> typedef struct CopyToStateInternalData
> {
>         CopyToStateData parent;
>
>         /* private members */
>         /* ... */
> } CopyToStateInternalData;
>
> We export CopyToStateData only with public members. We don't
> export CopyToStateInternalData that has members only for
> built-in formats.
>
> CopyToStateInternalData has the same memory layout as
> CopyToStateData. So we can use CopyToStateInternalData as
> CopyToStateData.
>
> We use CopyToStateData not CopyToStateInternalData in public
> API. We cast CopyToStateData to CopyToStateInternalData when
> we need to use private members:
>
> static void
> CopySendData(CopyToState cstate, const void *databuf, int datasize)
> {
>         CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
>         appendBinaryStringInfo(cstate_internal->fe_msgbuf, databuf, datasize);
> }
>
> It's still direct member access.
>
>
> With this approach, we can keep the same memory layout (1
> continuous memory chunk) and direct member access. I think
> that this approach doesn't have performance impact.
>
> See the attached patch for PoC of this approach.
>
> Drawback: This approach always allocates
> CopyToStateInternalData not CopyToStateData. So we need to
> allocate needless memories for extensions. But this will
> prevent performance regression of built-in formats. Is it
> acceptable?
While this approach could make sense to avoid potential performance
overhead for the built-in formats, I find it somewhat odd that
extensions cannot allocate memory for their working state with
CopyToStateData as the base type.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
			
Hi,
In <CAD21AoCCjKA77xkUxx59qJ8an_G_58Mry_EtCEcFgd=g9N2xew@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 8 Sep 2025 14:08:16 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> > The fields in 1 are mostly static fields, and the fields in 2 and 3
>> > are likely to be accessed in hot functions during COPY FROM. Would it
>> > be a good idea to restructure these fields so that we can hide the
>> > fields in 1 from callback functions and having the fields in 3 in a
>> > separate format-specific struct that can be accessed via an opaque
>> > pointer? But could the latter change potentially cause performance
>> > overheads?
>>
>> Yes. It changes memory layout (1 continuous memory chunk ->
>> 2 separated memory chunks) and introduces indirect member
>> accesses (x->y -> x->z->y).
> 
> I think fields accessed by the hot functions are limited. If we
> assemble fields required by hot functions into one struct and pass it
> to them can we deal with the latter? For example, we assemble the
> fields in 3 I mentioned above (i.e., built-in format specific fields)
> into say CopyFromStateBuiltin and pass it to CopyReadLine() function
> and the following functions, instead of CopyFromState. Since there are
> some places where we need to access to CopyFromState (e.g.,
> CopyGetData()), CopyFromStateBuiltin needs to have a pointer to
> CopyFromState as well.
It can change the indirect member accesses (built-in format
specific members become direct accesses, but other members in
CopyFromState become indirect accesses), but it still leaves 2
separate memory chunks.
If this approach has a performance impact and it's caused by
indirect member accesses to built-in format specific
members, your suggestion will work. If the performance impact is
caused by something else, your suggestion may not work.
Anyway, we need to measure performance to proceed with this
approach. If we can confirm that this approach doesn't have
any performance impact, we can use your original idea.
Do you have any idea how to measure the performance of this
approach?
We did it when we introduced Copy{To,From}Routine. But it was
difficult to evaluate the results:
* I don't have machines that give stable benchmark results
  * We may not be able to use them for the final decision
* Most results showed a performance improvement, but
  one result was mysterious [1]
[1] https://www.postgresql.org/message-id/flat/CAEG8a3LUBcvjwqgt6AijJmg67YN_b_NZ4Kzoxc_dH4rpAq0pKg%40mail.gmail.com
>> Drawback: This approach always allocates
>> CopyToStateInternalData not CopyToStateData. So we need to
>> allocate needless memories for extensions. But this will
>> prevent performance regression of built-in formats. Is it
>> acceptable?
> 
> While this approach could make sense to avoid potential performance
> overheads for built-in format, I find it's somewhat odd that
> extensions cannot allocate memory for its working state having
> CopyToStateData as the base type.
Is it important? We'll provide an opaque member for
extensions. Extensions should use the opaque member instead
of extending Copy{From,To}StateData.
I don't object to your approach, but we need a good way to
measure performance. If we use the CopyToStateInternalData
approach, we can skip the measurement for now and revisit
your approach later without breaking compatibility. How about
using the CopyToStateInternalData approach if we can't find a
good way to measure performance?
Thanks,
-- 
kou
			
On Mon, Sep 8, 2025 at 7:50 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoCCjKA77xkUxx59qJ8an_G_58Mry_EtCEcFgd=g9N2xew@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 8 Sep 2025 14:08:16 -0700,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >> > The fields in 1 are mostly static fields, and the fields in 2 and 3
> >> > are likely to be accessed in hot functions during COPY FROM. Would it
> >> > be a good idea to restructure these fields so that we can hide the
> >> > fields in 1 from callback functions and having the fields in 3 in a
> >> > separate format-specific struct that can be accessed via an opaque
> >> > pointer? But could the latter change potentially cause performance
> >> > overheads?
> >>
> >> Yes. It changes memory layout (1 continuous memory chunk ->
> >> 2 separated memory chunks) and introduces indirect member
> >> accesses (x->y -> x->z->y).
> >
> > I think fields accessed by the hot functions are limited. If we
> > assemble fields required by hot functions into one struct and pass it
> > to them can we deal with the latter? For example, we assemble the
> > fields in 3 I mentioned above (i.e., built-in format specific fields)
> > into say CopyFromStateBuiltin and pass it to CopyReadLine() function
> > and the following functions, instead of CopyFromState. Since there are
> > some places where we need to access to CopyFromState (e.g.,
> > CopyGetData()), CopyFromStateBuiltin needs to have a pointer to
> > CopyFromState as well.
>
> It can change indirect member accesses (built-in format
> specific members can be direct access but other members in
> CopyFromState are indirect access) but it doesn't change 2
> separated memory chunks.
Right. IIUC the latter point is related to cache locality. But I'm not
sure how much it affects performance, since even today we
don't lay out the fields of CopyFromState with cache locality
carefully in mind. Splitting a single memory chunk into multiple
chunks could even have a positive effect on it.
>
> If this approach has performance impact and it's caused by
> indirect member accesses for built-in format specific
> members, your suggestion will work. If performance impact is
> caused by another reason, your suggestion may not work.
>
> Anyway, we need to measure performance to proceed with this
> approach. If we can confirm that this approach doesn't have
> any performance impact, we can use the original your idea.
>
> Do you have any idea how to measure performance of this
> approach?
I think we can start by measuring the entire COPY execution time
in several scenarios, as we did previously. For example, reading a
huge file with a single column, a huge file with many columns, etc.
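Input files for such scenarios could be produced with a small generator
like the sketch below. It is only an illustration: the function names are
hypothetical, and the layout (CSV rows of integer columns that repeat the
row number) is an assumption, not what we used previously.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Format one CSV row with ncols integer columns into buf.
 * Returns the row length in bytes, or 0 if buf is too small.
 */
size_t
demo_format_row(char *buf, size_t buflen, int ncols, long rowno)
{
    size_t  used = 0;

    for (int col = 0; col < ncols; col++)
    {
        int     n = snprintf(buf + used, buflen - used, "%s%ld",
                             col > 0 ? "," : "", rowno);

        if (n < 0 || (size_t) n >= buflen - used)
            return 0;           /* output error or truncated */
        used += (size_t) n;
    }
    if (used + 2 > buflen)
        return 0;               /* no room for newline and terminator */
    buf[used++] = '\n';
    buf[used] = '\0';
    return used;
}

/*
 * Write nrows rows to out. Returns the total bytes written,
 * or 0 on error (or when nrows is 0).
 */
size_t
demo_write_rows(FILE *out, long nrows, int ncols)
{
    char    row[4096];
    size_t  total = 0;

    for (long i = 0; i < nrows; i++)
    {
        size_t  len = demo_format_row(row, sizeof(row), ncols, i);

        if (len == 0 || fwrite(row, 1, len, out) != len)
            return 0;
        total += len;
    }
    return total;
}
```

Calling demo_write_rows() once with ncols = 1 and once with a large ncols
gives the two shapes mentioned above; the files can then be loaded with
COPY ... FROM ... WITH (FORMAT csv) while timing the whole command.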
>
> We did it when we introduce Copy{To,From}Routine. But it was
> difficult to evaluate the results:
>
> * I don't have machines for stable benchmark results
>   * We may not be able to use them for the final decision
> * Most results showed performance improvement but
>   there was a result showed mysterious result[1]
Perhaps measuring cache misses would help us see how the changes
affect performance?
>
> [1] https://www.postgresql.org/message-id/flat/CAEG8a3LUBcvjwqgt6AijJmg67YN_b_NZ4Kzoxc_dH4rpAq0pKg%40mail.gmail.com
>
> >> Drawback: This approach always allocates
> >> CopyToStateInternalData not CopyToStateData. So we need to
> >> allocate needless memories for extensions. But this will
> >> prevent performance regression of built-in formats. Is it
> >> acceptable?
> >
> > While this approach could make sense to avoid potential performance
> > overheads for built-in format, I find it's somewhat odd that
> > extensions cannot allocate memory for its working state having
> > CopyToStateData as the base type.
>
> Is it important? We'll provide a opaque member for
> extensions. Extensions should use the opaque member instead
> of extending Copy{From,To}StateData.
I think yes, because it could be a blocker for future improvements
that might require adding a large field to CopyFrom/ToStateData.
> I don't object your approach but we need a good way to
> measure performance. If we use this approach, we can omit it
> for now and we can revisit your approach later without
> breaking compatibility. How about using this approach if we
> can't find a good way to measure performance?
I think it would be better to hear more opinions about this idea and
then make a decision, rather than basing our decision on whether or
not we can measure its performance, so we can be more confident in the
idea we have chosen. While this idea has the above downside, it could
make sense because we always allocate the entire CopyFrom/ToStateData
even today in spite of some fields being not used at all in binary
format and it requires less implementation costs to hide the
for-core-only fields. On the other hand, another possible idea is that
we have three different structs for categories 1 (core-only), 2 (core
and extensions), and 3 (extension-only), and expose only 2 that has a
void pointer to 3. The core can allocate the memory for 1 that embeds
2 at the beginning of the fields. While this design looks cleaner and
we can minimize overheads due to indirect references, it would require
more implementation costs. Which method we choose, I think we need
performance measurements in several scenarios to check if performance
regressions don't happen unexpectedly.
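The three-category layout described above can be sketched in standalone C as follows. This is only an illustration under stated assumptions: every Demo* name is hypothetical and none of it is PostgreSQL code. The core-only struct embeds the shared struct as its first member, so the core can get from the pointer it hands to callbacks back to its own fields with a plain cast, and the extension-only data hangs off a void pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Category 3 (hypothetical): owned by the extension; the core never looks inside. */
typedef struct DemoExtState
{
	int			rows_buffered;
} DemoExtState;

/* Category 2 (hypothetical): visible to both the core and format callbacks. */
typedef struct DemoPublicState
{
	int			natts;			/* example shared field */
	void	   *opaque;			/* points at category 3 data, or NULL */
} DemoPublicState;

/* Category 1 (hypothetical): core-only; embeds category 2 as the first member. */
typedef struct DemoCoreState
{
	DemoPublicState pub;		/* must stay first */
	const char *filename;		/* example core-only field */
} DemoCoreState;

/* Core side: one allocation covers categories 1 and 2. */
static DemoPublicState *
demo_begin(const char *filename, int natts)
{
	DemoCoreState *core = (DemoCoreState *) calloc(1, sizeof(DemoCoreState));

	if (core == NULL)
		return NULL;
	core->filename = filename;
	core->pub.natts = natts;
	return &core->pub;			/* callbacks only ever see this pointer */
}

/* Core side: recover category 1 from the public pointer with a cast. */
static DemoCoreState *
demo_core_from_public(DemoPublicState *pub)
{
	assert(offsetof(DemoCoreState, pub) == 0);
	return (DemoCoreState *) pub;
}

/* Core side: release the extension data (if any) and the single core chunk. */
static void
demo_end(DemoPublicState *pub)
{
	free(pub->opaque);
	free(demo_core_from_public(pub));
}
```

In this sketch the extension reaches its own data through one pointer dereference (opaque), while the core reaches its private fields with no extra indirection. That is the trade-off the paragraph above weighs against implementation cost.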
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Hi,
In <CAD21AoCidyfKcpf9-f2Np8kWgkM09c4TjnS1h1hcO_-CCbjeqw@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 9 Sep 2025 13:15:43 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> I don't object your approach but we need a good way to
>> measure performance. If we use this approach, we can omit it
>> for now and we can revisit your approach later without
>> breaking compatibility. How about using this approach if we
>> can't find a good way to measure performance?
>
> I think it would be better to hear more opinions about this idea and
> then make a decision, rather than basing our decision on whether or
> not we can measure its performance, so we can be more confident in the
> idea we have chosen. While this idea has the above downside, it could
> make sense because we always allocate the entire CopyFrom/ToStateData
> even today in spite of some fields being not used at all in binary
> format and it requires less implementation costs to hide the
> for-core-only fields. On the other hand, another possible idea is that
> we have three different structs for categories 1 (core-only), 2 (core
> and extensions), and 3 (extension-only), and expose only 2 that has a
> void pointer to 3. The core can allocate the memory for 1 that embeds
> 2 at the beginning of the fields. While this design looks cleaner and
> we can minimize overheads due to indirect references, it would require
> more implementation costs. Which method we choose, I think we need
> performance measurements in several scenarios to check if performance
> regressions don't happen unexpectedly.
OK. So the next step is collecting more opinions, right?
Could you add key people in this area to Cc to hear their
opinions? I'm not familiar with key people in the PostgreSQL
community...
Thanks,
-- 
kou

On Tue, Sep 9, 2025 at 7:41 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoCidyfKcpf9-f2Np8kWgkM09c4TjnS1h1hcO_-CCbjeqw@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 9 Sep 2025 13:15:43 -0700,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >> I don't object your approach but we need a good way to
> >> measure performance. If we use this approach, we can omit it
> >> for now and we can revisit your approach later without
> >> breaking compatibility. How about using this approach if we
> >> can't find a good way to measure performance?
> >
> > I think it would be better to hear more opinions about this idea and
> > then make a decision, rather than basing our decision on whether or
> > not we can measure its performance, so we can be more confident in the
> > idea we have chosen. While this idea has the above downside, it could
> > make sense because we always allocate the entire CopyFrom/ToStateData
> > even today in spite of some fields being not used at all in binary
> > format and it requires less implementation costs to hide the
> > for-core-only fields. On the other hand, another possible idea is that
> > we have three different structs for categories 1 (core-only), 2 (core
> > and extensions), and 3 (extension-only), and expose only 2 that has a
> > void pointer to 3. The core can allocate the memory for 1 that embeds
> > 2 at the beginning of the fields. While this design looks cleaner and
> > we can minimize overheads due to indirect references, it would require
> > more implementation costs. Which method we choose, I think we need
> > performance measurements in several scenarios to check if performance
> > regressions don't happen unexpectedly.
>
> OK. So the next step is collecting more opinions, right?
>
> Could you add key people in this area to Cc to hear their
> opinions? I'm not familiar with key people in the PostgreSQL
> community...
How about another idea like we move format-specific data to another
struct that embeds CopyFrom/ToStateData at the first field and have
CopyFrom/ToStart callback return memory with the size of that
struct? It resolves the concerns about adding an extra indirection
layer and extensions don't need to allocate memory for unnecessary
fields (used only for built-in formats). While extensions can access
the internal fields I think we can live with that given that there are
some similar precedents such as table AM's scan descriptions.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hi,
In <CAD21AoBb3t7EcsjYT4w68p9OfMNwWTYsbSVaSRY6DRhi7sNRFg@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 10 Sep 2025 00:36:38 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> How about another idea like we move format-specific data to another
> struct that embeds CopyFrom/ToStateData at the first field and have
> CopyFrom/ToStart callback return memory with the size of that
> struct? It resolves the concerns about adding an extra indirection
> layer and extensions don't need to allocate memory for unnecessary
> fields (used only for built-in formats). While extensions can access
> the internal fields I think we can live with that given that there are
> some similar precedents such as table AM's scan descriptions.
The other idea looks like the following, right?
struct CopyToStateBuiltInData
{
  struct CopyToStateData parent;
  /* Members for built-in formats */
  ...;
};
typedef CopyToState (*CopyToStart) (void);
CopyToState
BeginCopyTo(..., CopyToStart copy_to_start)
{
  ...;
  /* Allocate workspace and zero all fields */
  cstate = copy_to_start();
  ...;
}
This idea will almost work. But we can't know which
CopyToStart should be used before we parse "FORMAT" option
of COPY.
If we can iterate options twice in BeginCopy{To,From}(), we
can know it. For example:
BeginCopyTo(...)
{
  ...;
  CopyToStart copy_to_start = NULL;
  foreach(option, options)
  {
    DefElem  *defel = lfirst_node(DefElem, option);
    if (strcmp(defel->defname, "format") == 0)
    {
       char *fmt = defGetString(defel);
       if (strcmp(fmt, "text") == 0 ||
           strcmp(fmt, "csv") == 0 ||
           strcmp(fmt, "binary") == 0) {
         /* Use the builtin cstate */
       } else {
         copy_to_start = /* Detect CopyToStart for custom format */;
       }
    }
  }
  if (copy_to_start)
    cstate = copy_to_start();
  else 
    cstate = (CopyToStateData *) palloc0(sizeof(CopyToStateBuiltInData));
  ...;
}
(It may be better that we add
Copy{To,From}Routine::Copy{To,From}Allocate() instead of
CopyToStart callback.)
I think that this is acceptable because the extra pass over
the options should be cheap. It should not have a negative
performance impact.
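The two-pass idea above can be sketched as a small standalone C program. This is a sketch under assumptions, not the proposed PostgreSQL code: all Demo* names, the "jsonl" custom format and the hard-coded lookup are hypothetical. The first pass looks only at the "format" option to pick an allocate callback. Each callback returns a struct that embeds the common state as its first member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Common state (hypothetical stand-in for CopyToStateData). */
typedef struct DemoState
{
	const char *format;
} DemoState;

/* Built-in state: common state first, then built-in-only members. */
typedef struct DemoBuiltinState
{
	DemoState	parent;			/* must stay first */
	int			file_encoding;
	int			need_transcoding;
} DemoBuiltinState;

/* State of a hypothetical custom format provided by an extension. */
typedef struct DemoJsonlState
{
	DemoState	parent;			/* must stay first */
	size_t		depth;
} DemoJsonlState;

typedef struct DemoOption
{
	const char *name;
	const char *value;
} DemoOption;

typedef DemoState *(*DemoAllocate) (void);

static DemoState *
demo_builtin_allocate(void)
{
	return (DemoState *) calloc(1, sizeof(DemoBuiltinState));
}

static DemoState *
demo_jsonl_allocate(void)
{
	return (DemoState *) calloc(1, sizeof(DemoJsonlState));
}

/* Hypothetical lookup; a real implementation would consult a registry. */
static DemoAllocate
demo_lookup_allocate(const char *fmt)
{
	if (strcmp(fmt, "text") == 0 ||
		strcmp(fmt, "csv") == 0 ||
		strcmp(fmt, "binary") == 0)
		return demo_builtin_allocate;
	if (strcmp(fmt, "jsonl") == 0)
		return demo_jsonl_allocate;
	return NULL;
}

static DemoState *
demo_begin(const DemoOption *options, size_t noptions)
{
	const char *format = "text";	/* default when FORMAT is absent */
	DemoAllocate allocate = demo_builtin_allocate;
	DemoState  *state;
	size_t		i;

	/* First pass: only find "format" so we know what to allocate. */
	for (i = 0; i < noptions; i++)
	{
		if (strcmp(options[i].name, "format") == 0)
		{
			format = options[i].value;
			allocate = demo_lookup_allocate(format);
		}
	}
	if (allocate == NULL)
		return NULL;			/* unknown format */

	state = allocate();
	if (state == NULL)
		return NULL;
	state->format = format;
	/* A second pass (not shown) would process the remaining options. */
	return state;
}
```

The first pass is a linear scan over a handful of options with string comparisons, which is why it is expected to be cheap compared with the COPY itself.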
Thanks,
-- 
kou

On Wed, Sep 10, 2025 at 10:46 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoBb3t7EcsjYT4w68p9OfMNwWTYsbSVaSRY6DRhi7sNRFg@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 10 Sep 2025 00:36:38 -0700,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > How about another idea like we move format-specific data to another
> > struct that embeds CopyFrom/ToStateData at the first field and have
> > CopyFrom/ToStart callback return memory with the size of that
> > struct? It resolves the concerns about adding an extra indirection
> > layer and extensions don't need to allocate memory for unnecessary
> > fields (used only for built-in formats). While extensions can access
> > the internal fields I think we can live with that given that there are
> > some similar precedents such as table AM's scan descriptions.
>
> The other idea looks like the following, right?
>
> struct CopyToStateBuiltInData
> {
>   struct CopyToStateData parent;
>
>   /* Members for built-in formats */
>   ...;
> };
>
> typedef CopyToState (*CopyToStart) (void);
>
> CopyToState
> BeginCopyTo(..., CopyToStart copy_to_start)
> {
>   ...;
>
>   /* Allocate workspace and zero all fields */
>   cstate = copy_to_start();
>   ...;
> }
Right.
> This idea will almost work. But we can't know which
> CopyToStart should be used before we parse "FORMAT" option
> of COPY.
>
> If we can iterate options twice in BeginCopy{To,From}(), we
> can know it. For example:
>
> BeginCopyTo(...)
> {
>   ...;
>
>   CopyToStart copy_to_start = NULL;
>   foreach(option, options)
>   {
>     DefElem  *defel = lfirst_node(DefElem, option);
>
>     if (strcmp(defel->defname, "format") == 0)
>     {
>        char *fmt = defGetString(defel);
>        if (strcmp(fmt, "text") == 0 ||
>            strcmp(fmt, "csv") == 0 ||
>            strcmp(fmt, "binary") == 0) {
>          /* Use the builtin cstate */
>        } else {
>          copy_to_start = /* Detect CopyToStart for custom format */;
>        }
>     }
>   }
>   if (copy_to_start)
>     cstate = copy_to_start();
>   else
>     cstate = (CopyToStateData *) palloc0(sizeof(CopyToStateBuiltInData));
>   ...;
> }
>
> (It may be better that we add
> Copy{To,From}Routine::Copy{To,From}Allocate() instead of
> CopyToStart callback.)
I think we can use a local variable of CopyFormatOptions and memcpy it
to the opts of the returned cstate.
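A minimal standalone sketch of this suggestion follows. It assumes hypothetical Demo* types and helpers rather than the real CopyFormatOptions or ProcessCopyOptions(). The options are parsed into a local variable first, the state size is chosen from them, and only then are the options copied into the freshly allocated state, so no second pass over the option list is needed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CopyFormatOptions. */
typedef struct DemoOptions
{
	char		format[16];
	char		delim;
} DemoOptions;

/* Hypothetical common state that carries the options. */
typedef struct DemoState
{
	DemoOptions opts;
} DemoState;

/* Hypothetical built-in state embedding the common state first. */
typedef struct DemoBuiltinState
{
	DemoState	parent;			/* must stay first */
	int			need_transcoding;
} DemoBuiltinState;

/* Fill a caller-provided options struct; no state exists yet. */
static void
demo_parse_options(DemoOptions *opts, const char *format, char delim)
{
	memset(opts, 0, sizeof(*opts));
	strncpy(opts->format, format, sizeof(opts->format) - 1);
	opts->delim = delim;
}

/* Pick the state size from the already-parsed options. */
static size_t
demo_state_size(const DemoOptions *opts)
{
	if (strcmp(opts->format, "text") == 0 ||
		strcmp(opts->format, "csv") == 0 ||
		strcmp(opts->format, "binary") == 0)
		return sizeof(DemoBuiltinState);
	return sizeof(DemoState);	/* assumption: custom format with no extras */
}

static DemoState *
demo_begin(const char *format, char delim)
{
	DemoOptions opts;			/* local variable, parsed before allocation */
	DemoState  *state;

	demo_parse_options(&opts, format, delim);

	state = (DemoState *) calloc(1, demo_state_size(&opts));
	if (state == NULL)
		return NULL;

	/* Copy the parsed options into the returned state. */
	memcpy(&state->opts, &opts, sizeof(opts));
	return state;
}
```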
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Hi,
In <CAD21AoCfqD=f2ELqPxg62+_QADhHi_kJXCDMhAerBtvxudd-xQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 11 Sep 2025 13:41:26 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I think we can use a local variable of CopyFormatOptions and memcpy it
> to the opts of the returned cstate.
It'll work too. Can we proceed this proposal with this
approach? Should we collect more opinions before we proceed?
If so, Could you add key people in this area to Cc to hear
their opinions?
Thanks,
-- 
kou

On Thu, Sep 11, 2025 at 5:07 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoCfqD=f2ELqPxg62+_QADhHi_kJXCDMhAerBtvxudd-xQ@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 11 Sep 2025 13:41:26 -0700,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > I think we can use a local variable of CopyFormatOptions and memcpy it
> > to the opts of the returned cstate.
>
> It'll work too. Can we proceed this proposal with this
> approach? Should we collect more opinions before we proceed?
> If so, Could you add key people in this area to Cc to hear
> their opinions?
Since we don't have a single decision-maker, we should proceed through
consensus-building and careful evaluation of each approach. I see that
several senior hackers are already included in this thread, which is
excellent. Since you and I, who have been involved in these
discussions, agreed with this approach, I believe we can proceed with
this direction. If anyone proposes alternative solutions that we find
more compelling, we might have to change the approach.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Hi,
In <CAD21AoADXWgdizS0mV5w8wdfftDRsm8sUtNW=CzYYS1OhjFD2A@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 15 Sep 2025 10:00:18 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> > I think we can use a local variable of CopyFormatOptions and memcpy it
>> > to the opts of the returned cstate.
>>
>> It'll work too. Can we proceed this proposal with this
>> approach? Should we collect more opinions before we proceed?
>> If so, Could you add key people in this area to Cc to hear
>> their opinions?
>
> Since we don't have a single decision-maker, we should proceed through
> consensus-building and careful evaluation of each approach. I see that
> several senior hackers are already included in this thread, which is
> excellent. Since you and I, who have been involved in these
> discussions, agreed with this approach, I believe we can proceed with
> this direction. If anyone proposes alternative solutions that we find
> more compelling, we might have to change the approach.
OK. There is no objection for now. How about the attached
patch? The patch uses the approach only for
CopyToStateData. If this looks good, I can do it for
CopyFromStateData too.
This patch splits CopyToStateData to
* CopyToStateData
* CopyToStateInternalData
* CopyToStateBuiltinData
structs. This is based on the category described in
https://www.postgresql.org/message-id/flat/CAD21AoBa0Wm3C2H12jaqkvLidP2zEhsC%2Bgf%3D3w7XiA4LQnvx0g%40mail.gmail.com#85cb988b0bec243d1e8dce699e02e009 :
> 1. fields used only the core (not by format callback)
> 2. fields used by both the core and format callbacks
> 3. built-in format specific fields (mostly for text and csv)
CopyToStateInternalData is for 1.
CopyToStateData is for 2.
CopyToStateBuiltinData is for 3.
This patch adds CopyToRoutine::CopyToGetStateSize() that
returns size of state struct for the routine. For example,
Built-in formats use sizeof(CopyToStateBuiltinData) for it.
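As an illustration of how a size callback such as CopyToGetStateSize() can drive a single allocation, here is a minimal standalone C sketch. It makes several assumptions: all Demo* names are hypothetical, the core-only part is placed as a header in front of the routine-sized state, and the header size is rounded up to a multiple of a maximally aligned type so that the state behind it stays suitably aligned. That rounding is an assumption of this sketch, not something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Core-only header (hypothetical stand-in for the internal state). */
typedef struct DemoInternal
{
	const char *filename;
	int			is_program;
} DemoInternal;

/* State shared by the core and format callbacks (hypothetical). */
typedef struct DemoState
{
	int			copy_dest;
} DemoState;

/* Built-in format state: shared state first, then built-in members. */
typedef struct DemoBuiltinState
{
	DemoState	parent;			/* must stay first */
	int			file_encoding;
} DemoBuiltinState;

/* A type whose size is a multiple of the strictest common alignments. */
typedef union DemoMaxAlign
{
	long long	ll;
	long double ld;
	void	   *p;
} DemoMaxAlign;

/* Header size rounded up so the state that follows is aligned (assumption). */
#define DEMO_HEADER_SIZE \
	(((sizeof(DemoInternal) + sizeof(DemoMaxAlign) - 1) / \
	  sizeof(DemoMaxAlign)) * sizeof(DemoMaxAlign))

#define DemoInternalGetState(internal) \
	((DemoState *) ((char *) (internal) + DEMO_HEADER_SIZE))
#define DemoStateGetInternal(state) \
	((DemoInternal *) ((char *) (state) - DEMO_HEADER_SIZE))

/* Size callback for the built-in formats. */
static size_t
demo_builtin_state_size(void)
{
	return sizeof(DemoBuiltinState);
}

/* Allocate header and state as one zeroed chunk; return the state part. */
static DemoState *
demo_begin(size_t (*get_state_size) (void), const char *filename)
{
	DemoInternal *internal;

	internal = (DemoInternal *) calloc(1, DEMO_HEADER_SIZE + get_state_size());
	if (internal == NULL)
		return NULL;
	internal->filename = filename;
	return DemoInternalGetState(internal);
}

/* Free through the header, since that is where the chunk starts. */
static void
demo_end(DemoState *state)
{
	free(DemoStateGetInternal(state));
}
```

Format callbacks only receive the DemoState pointer. Built-in callbacks cast it to DemoBuiltinState, and only the core steps back to the header.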
BeginCopyTo() allocates sizeof(CopyToStateInternalData) +
CopyToGetStateSize() size continuous memory and uses the
front part as CopyToStateInternalData and the back part as
CopyToStateData/CopyToStateBuiltinData.
Thanks,
-- 
kou
From 20539ad10512ef45785c4bd70b93f94eec1125ba Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Fri, 3 Oct 2025 15:23:01 +0900
Subject: [PATCH] Split CopyToStateData to CopyToState{,Internal,Builtin}Data

---
 src/backend/commands/copyto.c  | 282 ++++++++++++++++++---------------
 src/include/commands/copyapi.h |  53 +++++++
 2 files changed, 211 insertions(+), 124 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 67b94b91cae..30298c0df0c 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -37,19 +37,36 @@
 #include "utils/snapmgr.h"
 
 /*
- * Represents the different dest cases we need to worry about at
- * the bottom level
+ * This struct contains the state variables used internally. All COPY TO
+ * routines including built-in format routines should not use this.
  */
-typedef enum CopyDest
+typedef struct CopyToStateInternalData
 {
-	COPY_FILE,					/* to file (or a piped program) */
-	COPY_FRONTEND,				/* to frontend */
-	COPY_CALLBACK,				/* to callback function */
-} CopyDest;
+	/* format-specific routines */
+	const CopyToRoutine *routine;
+
+	/* parameters from the COPY command */
+	QueryDesc  *queryDesc;		/* executable query to copy from */
+	char	   *filename;		/* filename, or NULL for STDOUT */
+	bool		is_program;		/* is 'filename' a program to popen? */
+
+	Node	   *whereClause;	/* WHERE condition (or NULL) */
+
+	/*
+	 * Working state
+	 */
+	MemoryContext copycontext;	/* per-copy execution context */
+	MemoryContext rowcontext;	/* per-row evaluation context */
+} CopyToStateInternalData;
+typedef struct CopyToStateInternalData *CopyToStateInternal;
+
+#define CopyToStateInternalGetState(cstate_internal) \
+	((CopyToState) (((char *) cstate_internal) + sizeof(CopyToStateInternalData)))
+#define CopyToStateGetInternal(cstate) \
+	((CopyToStateInternal) (((char *) cstate) - sizeof(CopyToStateInternalData)))
 
 /*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
+ * This struct contains the state variables used by built-in format routines.
  *
  * Multi-byte encodings: all supported client-side encodings encode multi-byte
  * characters by having the first byte's high bit set. Subsequent bytes of the
@@ -62,40 +79,16 @@ typedef enum CopyDest
  * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
  * when we have to do it the hard way.
  */
-typedef struct CopyToStateData
+typedef struct CopyToStateBuiltinData
 {
-	/* format-specific routines */
-	const CopyToRoutine *routine;
+	CopyToStateData parent;
 
 	/* low-level state data */
-	CopyDest	copy_dest;		/* type of copy source/destination */
-	FILE	   *copy_file;		/* used if copy_dest == COPY_FILE */
-	StringInfo	fe_msgbuf;		/* used for all dests during COPY TO */
-
 	int			file_encoding;	/* file or remote side's character encoding */
 	bool		need_transcoding;	/* file encoding diff from server? */
 	bool		encoding_embeds_ascii;	/* ASCII can be non-first byte? */
-
-	/* parameters from the COPY command */
-	Relation	rel;			/* relation to copy to */
-	QueryDesc  *queryDesc;		/* executable query to copy from */
-	List	   *attnumlist;		/* integer list of attnums to copy */
-	char	   *filename;		/* filename, or NULL for STDOUT */
-	bool		is_program;		/* is 'filename' a program to popen? */
-	copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-	CopyFormatOptions opts;
-	Node	   *whereClause;	/* WHERE condition (or NULL) */
-
-	/*
-	 * Working state
-	 */
-	MemoryContext copycontext;	/* per-copy execution context */
-
-	FmgrInfo   *out_functions;	/* lookup info for output functions */
-	MemoryContext rowcontext;	/* per-row evaluation context */
-	uint64		bytes_processed;	/* number of bytes processed so far */
-} CopyToStateData;
+} CopyToStateBuiltinData;
+typedef struct CopyToStateBuiltinData *CopyToStateBuiltin;
 
 /* DestReceiver for COPY (query) TO */
 typedef struct
@@ -118,6 +111,7 @@ static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
 								bool use_quote);
 
 /* built-in format-specific routines */
+static size_t CopyToBuiltinGetStateSize(void);
 static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
 static void CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
 static void CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -150,6 +144,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
 
 /* text format */
 static const CopyToRoutine CopyToRoutineText = {
+	.CopyToGetStateSize = CopyToBuiltinGetStateSize,
 	.CopyToStart = CopyToTextLikeStart,
 	.CopyToOutFunc = CopyToTextLikeOutFunc,
 	.CopyToOneRow = CopyToTextOneRow,
@@ -158,6 +153,7 @@ static const CopyToRoutine CopyToRoutineText = {
 
 /* CSV format */
 static const CopyToRoutine CopyToRoutineCSV = {
+	.CopyToGetStateSize = CopyToBuiltinGetStateSize,
 	.CopyToStart = CopyToTextLikeStart,
 	.CopyToOutFunc = CopyToTextLikeOutFunc,
 	.CopyToOneRow = CopyToCSVOneRow,
@@ -166,6 +162,7 @@ static const CopyToRoutine CopyToRoutineCSV = {
 
 /* binary format */
 static const CopyToRoutine CopyToRoutineBinary = {
+	.CopyToGetStateSize = CopyToBuiltinGetStateSize,
 	.CopyToStart = CopyToBinaryStart,
 	.CopyToOutFunc = CopyToBinaryOutFunc,
 	.CopyToOneRow = CopyToBinaryOneRow,
@@ -185,18 +182,27 @@ CopyToGetRoutine(const CopyFormatOptions *opts)
 	return &CopyToRoutineText;
 }
 
+/* Implementation of the allocate callback for all built-in formats */
+static size_t
+CopyToBuiltinGetStateSize(void)
+{
+	return sizeof(CopyToStateBuiltinData);
+}
+
 /* Implementation of the start callback for text and CSV formats */
 static void
 CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 {
+	CopyToStateBuiltin cstate_builtin = (CopyToStateBuiltin) cstate;
+
 	/*
 	 * For non-binary copy, we need to convert null_print to file encoding,
 	 * because it will be sent directly with CopySendString.
 	 */
-	if (cstate->need_transcoding)
+	if (cstate_builtin->need_transcoding)
 		cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
 														  cstate->opts.null_print_len,
-														  cstate->file_encoding);
+														  cstate_builtin->file_encoding);
 
 	/* if a header has been requested send the line */
 	if (cstate->opts.header_line == COPY_HEADER_TRUE)
@@ -401,7 +407,7 @@ SendCopyBegin(CopyToState cstate)
 	for (i = 0; i < natts; i++)
 		pq_sendint16(&buf, format); /* per-column formats */
 	pq_endmessage(&buf);
-	cstate->copy_dest = COPY_FRONTEND;
+	cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -444,16 +450,17 @@ CopySendChar(CopyToState cstate, char c)
 static void
 CopySendEndOfRow(CopyToState cstate)
 {
+	CopyToStateInternal cstate_internal = CopyToStateGetInternal(cstate);
 	StringInfo	fe_msgbuf = cstate->fe_msgbuf;
 
 	switch (cstate->copy_dest)
 	{
-		case COPY_FILE:
+		case COPY_DEST_FILE:
 			if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
 					   cstate->copy_file) != 1 ||
 				ferror(cstate->copy_file))
 			{
-				if (cstate->is_program)
+				if (cstate_internal->is_program)
 				{
 					if (errno == EPIPE)
 					{
@@ -482,11 +489,11 @@ CopySendEndOfRow(CopyToState cstate)
 							 errmsg("could not write to COPY file: %m")));
 			}
 			break;
-		case COPY_FRONTEND:
+		case COPY_DEST_FRONTEND:
 			/* Dump the accumulated row as one CopyData message */
 			(void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
 			break;
-		case COPY_CALLBACK:
+		case COPY_DEST_CALLBACK:
 			cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
 			break;
 	}
@@ -507,7 +514,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
 {
 	switch (cstate->copy_dest)
 	{
-		case COPY_FILE:
+		case COPY_DEST_FILE:
 			/* Default line termination depends on platform */
 #ifndef WIN32
 			CopySendChar(cstate, '\n');
@@ -515,7 +522,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
 			CopySendString(cstate, "\r\n");
 #endif
 			break;
-		case COPY_FRONTEND:
+		case COPY_DEST_FRONTEND:
 			/* The FE/BE protocol uses \n as newline for all platforms */
 			CopySendChar(cstate, '\n');
 			break;
@@ -561,9 +568,10 @@ CopySendInt16(CopyToState cstate, int16 val)
 static void
 ClosePipeToProgram(CopyToState cstate)
 {
+	CopyToStateInternal cstate_internal = CopyToStateGetInternal(cstate);
 	int			pclose_rc;
 
-	Assert(cstate->is_program);
+	Assert(cstate_internal->is_program);
 
 	pclose_rc = ClosePipeStream(cstate->copy_file);
 	if (pclose_rc == -1)
@@ -575,7 +583,7 @@ ClosePipeToProgram(CopyToState cstate)
 		ereport(ERROR,
 				(errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
 				 errmsg("program \"%s\" failed",
-						cstate->filename),
+						cstate_internal->filename),
 				 errdetail_internal("%s", wait_result_to_str(pclose_rc))));
 	}
 }
@@ -586,23 +594,25 @@ ClosePipeToProgram(CopyToState cstate)
 static void
 EndCopy(CopyToState cstate)
 {
-	if (cstate->is_program)
+	CopyToStateInternal cstate_internal = CopyToStateGetInternal(cstate);
+
+	if (cstate_internal->is_program)
 	{
 		ClosePipeToProgram(cstate);
 	}
 	else
 	{
-		if (cstate->filename != NULL && FreeFile(cstate->copy_file))
+		if (cstate_internal->filename != NULL && FreeFile(cstate->copy_file))
 			ereport(ERROR,
 					(errcode_for_file_access(),
 					 errmsg("could not close file \"%s\": %m",
-							cstate->filename)));
+							cstate_internal->filename)));
 	}
 
 	pgstat_progress_end_command();
 
-	MemoryContextDelete(cstate->copycontext);
-	pfree(cstate);
+	MemoryContextDelete(cstate_internal->copycontext);
+	pfree(CopyToStateGetInternal(cstate));
 }
 
 /*
@@ -630,11 +640,16 @@ BeginCopyTo(ParseState *pstate,
 			List *attnamelist,
 			List *options)
 {
+	CopyToStateInternal cstate_internal;
 	CopyToState cstate;
+	CopyToStateBuiltin cstate_builtin;
 	bool		pipe = (filename == NULL && data_dest_cb == NULL);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
+	MemoryContext copycontext;
 	MemoryContext oldcontext;
+	CopyFormatOptions opts = {0};
+	const CopyToRoutine *routine;
 	const int	progress_cols[] = {
 		PROGRESS_COPY_COMMAND,
 		PROGRESS_COPY_TYPE
@@ -686,24 +701,34 @@ BeginCopyTo(ParseState *pstate,
 	}
 
 
-	/* Allocate workspace and zero all fields */
-	cstate = (CopyToStateData *) palloc0(sizeof(CopyToStateData));
-
 	/*
 	 * We allocate everything used by a cstate in a new memory context. This
 	 * avoids memory leaks during repeated use of COPY in a query.
 	 */
-	cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
-												"COPY",
-												ALLOCSET_DEFAULT_SIZES);
+	copycontext = AllocSetContextCreate(CurrentMemoryContext,
+										"COPY",
+										ALLOCSET_DEFAULT_SIZES);
 
-	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+	oldcontext = MemoryContextSwitchTo(copycontext);
 
 	/* Extract options from the statement node tree */
-	ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
+	ProcessCopyOptions(pstate, &opts, false /* is_from */ , options);
 
-	/* Set format routine */
-	cstate->routine = CopyToGetRoutine(&cstate->opts);
+	/* Get format routine */
+	routine = CopyToGetRoutine(&opts);
+
+	/* Allocate workspace and set known values */
+	MemoryContextSwitchTo(oldcontext);
+	cstate_internal = (CopyToStateInternal) palloc0(sizeof(CopyToStateInternal) + routine->CopyToGetStateSize());
+	MemoryContextSwitchTo(copycontext);
+	cstate = CopyToStateInternalGetState(cstate_internal);
+	if (routine == &CopyToRoutineText || routine == &CopyToRoutineCSV || routine == &CopyToRoutineBinary)
+		cstate_builtin = (CopyToStateBuiltin) cstate;
+	else
+		cstate_builtin = NULL;
+	cstate_internal->copycontext = copycontext;
+	cstate->opts = opts;
+	cstate_internal->routine = routine;
 
 	/* Process the source/target relation or query */
 	if (rel)
@@ -835,19 +860,19 @@ BeginCopyTo(ParseState *pstate,
 		((DR_copy *) dest)->cstate = cstate;
 
 		/* Create a QueryDesc requesting no output */
-		cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
-											GetActiveSnapshot(),
-											InvalidSnapshot,
-											dest, NULL, NULL, 0);
+		cstate_internal->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+													 GetActiveSnapshot(),
+													 InvalidSnapshot,
+													 dest, NULL, NULL, 0);
 
 		/*
 		 * Call ExecutorStart to prepare the plan for execution.
 		 *
 		 * ExecutorStart computes a result tupdesc for us
 		 */
-		ExecutorStart(cstate->queryDesc, 0);
+		ExecutorStart(cstate_internal->queryDesc, 0);
 
-		tupDesc = cstate->queryDesc->tupDesc;
+		tupDesc = cstate_internal->queryDesc->tupDesc;
 	}
 
 	/* Generate or convert list of attributes to process */
@@ -883,31 +908,34 @@ BeginCopyTo(ParseState *pstate,
 		}
 	}
 
-	/* Use client encoding when ENCODING option is not specified. */
-	if (cstate->opts.file_encoding < 0)
-		cstate->file_encoding = pg_get_client_encoding();
-	else
-		cstate->file_encoding = cstate->opts.file_encoding;
+	if (cstate_builtin)
+	{
+		/* Use client encoding when ENCODING option is not specified. */
+		if (cstate->opts.file_encoding < 0)
+			cstate_builtin->file_encoding = pg_get_client_encoding();
+		else
+			cstate_builtin->file_encoding = cstate->opts.file_encoding;
 
-	/*
-	 * Set up encoding conversion info if the file and server encodings differ
-	 * (see also pg_server_to_any).
-	 */
-	if (cstate->file_encoding == GetDatabaseEncoding() ||
-		cstate->file_encoding == PG_SQL_ASCII)
-		cstate->need_transcoding = false;
-	else
-		cstate->need_transcoding = true;
+		/*
+		 * Set up encoding conversion info if the file and server encodings
+		 * differ (see also pg_server_to_any).
+		 */
+		if (cstate_builtin->file_encoding == GetDatabaseEncoding() ||
+			cstate_builtin->file_encoding == PG_SQL_ASCII)
+			cstate_builtin->need_transcoding = false;
+		else
+			cstate_builtin->need_transcoding = true;
 
-	/* See Multibyte encoding comment above */
-	cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
+		/* See Multibyte encoding comment above */
+		cstate_builtin->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate_builtin->file_encoding);
+	}
 
-	cstate->copy_dest = COPY_FILE;	/* default */
+	cstate->copy_dest = COPY_DEST_FILE; /* default */
 
 	if (data_dest_cb)
 	{
 		progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-		cstate->copy_dest = COPY_CALLBACK;
+		cstate->copy_dest = COPY_DEST_CALLBACK;
 		cstate->data_dest_cb = data_dest_cb;
 	}
 	else if (pipe)
@@ -920,18 +948,18 @@ BeginCopyTo(ParseState *pstate,
 	}
 	else
 	{
-		cstate->filename = pstrdup(filename);
-		cstate->is_program = is_program;
+		cstate_internal->filename = pstrdup(filename);
+		cstate_internal->is_program = is_program;
 
 		if (is_program)
 		{
 			progress_vals[1] = PROGRESS_COPY_TYPE_PROGRAM;
-			cstate->copy_file = OpenPipeStream(cstate->filename, PG_BINARY_W);
+			cstate->copy_file = OpenPipeStream(cstate_internal->filename, PG_BINARY_W);
 			if (cstate->copy_file == NULL)
 				ereport(ERROR,
 						(errcode_for_file_access(),
 						 errmsg("could not execute command \"%s\": %m",
-								cstate->filename)));
+								cstate_internal->filename)));
 		}
 		else
 		{
@@ -952,7 +980,7 @@ BeginCopyTo(ParseState *pstate,
 			oumask = umask(S_IWGRP | S_IWOTH);
 			PG_TRY();
 			{
-				cstate->copy_file = AllocateFile(cstate->filename, PG_BINARY_W);
+				cstate->copy_file = AllocateFile(cstate_internal->filename, PG_BINARY_W);
 			}
 			PG_FINALLY();
 			{
@@ -967,7 +995,7 @@ BeginCopyTo(ParseState *pstate,
 				ereport(ERROR,
 						(errcode_for_file_access(),
 						 errmsg("could not open file \"%s\" for writing: %m",
-								cstate->filename),
+								cstate_internal->filename),
 						 (save_errno == ENOENT || save_errno == EACCES) ?
 						 errhint("COPY TO instructs the PostgreSQL server process to write a file. "
 								 "You may want a client-side facility such as psql's \\copy.") : 0));
@@ -977,12 +1005,12 @@ BeginCopyTo(ParseState *pstate,
 				ereport(ERROR,
 						(errcode_for_file_access(),
 						 errmsg("could not stat file \"%s\": %m",
-								cstate->filename)));
+								cstate_internal->filename)));
 
 			if (S_ISDIR(st.st_mode))
 				ereport(ERROR,
 						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-						 errmsg("\"%s\" is a directory", cstate->filename)));
+						 errmsg("\"%s\" is a directory", cstate_internal->filename)));
 		}
 	}
 
@@ -1004,12 +1032,14 @@ BeginCopyTo(ParseState *pstate,
 void
 EndCopyTo(CopyToState cstate)
 {
-	if (cstate->queryDesc != NULL)
+	CopyToStateInternal cstate_internal = CopyToStateGetInternal(cstate);
+
+	if (cstate_internal->queryDesc != NULL)
 	{
 		/* Close down the query and free resources. */
-		ExecutorFinish(cstate->queryDesc);
-		ExecutorEnd(cstate->queryDesc);
-		FreeQueryDesc(cstate->queryDesc);
+		ExecutorFinish(cstate_internal->queryDesc);
+		ExecutorEnd(cstate_internal->queryDesc);
+		FreeQueryDesc(cstate_internal->queryDesc);
 		PopActiveSnapshot();
 	}
 
@@ -1025,7 +1055,8 @@ EndCopyTo(CopyToState cstate)
 uint64
 DoCopyTo(CopyToState cstate)
 {
-	bool		pipe = (cstate->filename == NULL && cstate->data_dest_cb == NULL);
+	CopyToStateInternal cstate_internal = CopyToStateGetInternal(cstate);
+	bool		pipe = (cstate_internal->filename == NULL && cstate->data_dest_cb == NULL);
 	bool		fe_copy = (pipe && whereToSendOutput == DestRemote);
 	TupleDesc	tupDesc;
 	int			num_phys_attrs;
@@ -1038,7 +1069,7 @@ DoCopyTo(CopyToState cstate)
 	if (cstate->rel)
 		tupDesc = RelationGetDescr(cstate->rel);
 	else
-		tupDesc = cstate->queryDesc->tupDesc;
+		tupDesc = cstate_internal->queryDesc->tupDesc;
 	num_phys_attrs = tupDesc->natts;
 	cstate->opts.null_print_client = cstate->opts.null_print;	/* default */
 
@@ -1052,8 +1083,8 @@ DoCopyTo(CopyToState cstate)
 		int			attnum = lfirst_int(cur);
 		Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
 
-		cstate->routine->CopyToOutFunc(cstate, attr->atttypid,
-									   &cstate->out_functions[attnum - 1]);
+		cstate_internal->routine->CopyToOutFunc(cstate, attr->atttypid,
+												&cstate->out_functions[attnum - 1]);
 	}
 
 	/*
@@ -1062,11 +1093,11 @@ DoCopyTo(CopyToState cstate)
 	 * datatype output routines, and should be faster than retail pfree's
 	 * anyway. (We don't need a whole econtext as CopyFrom does.)
 	 */
-	cstate->rowcontext = AllocSetContextCreate(CurrentMemoryContext,
-											   "COPY TO",
-											   ALLOCSET_DEFAULT_SIZES);
+	cstate_internal->rowcontext = AllocSetContextCreate(CurrentMemoryContext,
+														"COPY TO",
+														ALLOCSET_DEFAULT_SIZES);
 
-	cstate->routine->CopyToStart(cstate, tupDesc);
+	cstate_internal->routine->CopyToStart(cstate, tupDesc);
 
 	if (cstate->rel)
 	{
@@ -1101,13 +1132,13 @@ DoCopyTo(CopyToState cstate)
 	else
 	{
 		/* run the plan --- the dest receiver will send tuples */
-		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0);
-		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
+		ExecutorRun(cstate_internal->queryDesc, ForwardScanDirection, 0);
+		processed = ((DR_copy *) cstate_internal->queryDesc->dest)->processed;
 	}
 
-	cstate->routine->CopyToEnd(cstate);
+	cstate_internal->routine->CopyToEnd(cstate);
 
-	MemoryContextDelete(cstate->rowcontext);
+	MemoryContextDelete(cstate_internal->rowcontext);
 
 	if (fe_copy)
 		SendCopyEnd(cstate);
@@ -1121,15 +1152,16 @@ DoCopyTo(CopyToState cstate)
 static inline void
 CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 {
+	CopyToStateInternal cstate_internal = CopyToStateGetInternal(cstate);
 	MemoryContext oldcontext;
 
-	MemoryContextReset(cstate->rowcontext);
-	oldcontext = MemoryContextSwitchTo(cstate->rowcontext);
+	MemoryContextReset(cstate_internal->rowcontext);
+	oldcontext = MemoryContextSwitchTo(cstate_internal->rowcontext);
 
 	/* Make sure the tuple is fully deconstructed */
 	slot_getallattrs(slot);
 
-	cstate->routine->CopyToOneRow(cstate, slot);
+	cstate_internal->routine->CopyToOneRow(cstate, slot);
 
 	MemoryContextSwitchTo(oldcontext);
 }
@@ -1146,13 +1178,14 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 static void
 CopyAttributeOutText(CopyToState cstate, const char *string)
 {
+	CopyToStateBuiltin cstate_builtin = (CopyToStateBuiltin) cstate;
 	const char *ptr;
 	const char *start;
 	char		c;
 	char		delimc = cstate->opts.delim[0];
 
-	if (cstate->need_transcoding)
-		ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+	if (cstate_builtin->need_transcoding)
+		ptr = pg_server_to_any(string, strlen(string), cstate_builtin->file_encoding);
 	else
 		ptr = string;
 
@@ -1170,7 +1203,7 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 	 * it's worth making two copies of it to get the IS_HIGHBIT_SET() test out
 	 * of the normal safe-encoding path.
 	 */
-	if (cstate->encoding_embeds_ascii)
+	if (cstate_builtin->encoding_embeds_ascii)
 	{
 		start = ptr;
 		while ((c = *ptr) != '\0')
@@ -1225,7 +1258,7 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
 				start = ptr++;	/* we include char in next run */
 			}
 			else if (IS_HIGHBIT_SET(c))
-				ptr += pg_encoding_mblen(cstate->file_encoding, ptr);
+				ptr += pg_encoding_mblen(cstate_builtin->file_encoding, ptr);
 			else
 				ptr++;
 		}
@@ -1300,6 +1333,7 @@ static void
 CopyAttributeOutCSV(CopyToState cstate, const char *string,
 					bool use_quote)
 {
+	CopyToStateBuiltin cstate_builtin = (CopyToStateBuiltin) cstate;
 	const char *ptr;
 	const char *start;
 	char		c;
@@ -1312,8 +1346,8 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 	if (!use_quote && strcmp(string, cstate->opts.null_print) == 0)
 		use_quote = true;
 
-	if (cstate->need_transcoding)
-		ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+	if (cstate_builtin->need_transcoding)
+		ptr = pg_server_to_any(string, strlen(string), cstate_builtin->file_encoding);
 	else
 		ptr = string;
 
@@ -1342,8 +1376,8 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
 					use_quote = true;
 					break;
 				}
-				if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii)
-					tptr +=
pg_encoding_mblen(cstate->file_encoding, tptr); + if (IS_HIGHBIT_SET(c) && cstate_builtin->encoding_embeds_ascii) + tptr += pg_encoding_mblen(cstate_builtin->file_encoding, tptr); else tptr++; } @@ -1366,8 +1400,8 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string, CopySendChar(cstate, escapec); start = ptr; /* we include char in next run */ } - if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii) - ptr += pg_encoding_mblen(cstate->file_encoding, ptr); + if (IS_HIGHBIT_SET(c) && cstate_builtin->encoding_embeds_ascii) + ptr += pg_encoding_mblen(cstate_builtin->file_encoding, ptr); else ptr++; } diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 2a2d2f9876b..7c536c74a18 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -16,12 +16,65 @@ #include "commands/copy.h" +/* + * Represents the different dest cases we need to worry about at + * the bottom level + */ +typedef enum CopyDest +{ + COPY_DEST_FILE, /* to file (or a piped program) */ + COPY_DEST_FRONTEND, /* to frontend */ + COPY_DEST_CALLBACK, /* to callback function */ +} CopyDest; + +/* + * This struct contains the state variables used by PostgreSQL, built-in + * format routines and custom format routines. + */ +typedef struct CopyToStateData +{ + /* parameters from the COPY command */ + Relation rel; /* relation to copy to */ + List *attnumlist; /* integer list of attnums to copy */ + copy_data_dest_cb data_dest_cb; /* function for writing data */ + CopyFormatOptions opts; + FmgrInfo *out_functions; /* lookup info for output functions */ + + /* low-level state data */ + CopyDest copy_dest; /* type of copy source/destination */ + FILE *copy_file; /* used if copy_dest == COPY_FILE */ + StringInfo fe_msgbuf; /* used for all dests during COPY TO */ + + /* + * Working state + */ + uint64 bytes_processed; /* number of bytes processed so far */ +} CopyToStateData; + /* * API structure for a COPY TO format implementation. 
Note this must be * allocated in a server-lifetime manner, typically as a static const struct. */ typedef struct CopyToRoutine { + /* + * Return state size for this routine. + * + * If this routine uses CopyToStateData as-is, `return + * sizeof(CopyToStateData)` can be used. + * + * If this routine needs additional data than CopyToStateData, a new + * struct based on CopyToStateData can be used something like: + * + * typedef struct MyCopyToStateDate { + * struct CopyToStateData parent; + * int define_additional_members_here; + * } MyCopyToStateData; + * + * In the case, this callback returns `sizeof(MyCopyToStateData)`. + */ + size_t (*CopyToGetStateSize) (void); + /* * Set output function information. This callback is called once at the * beginning of COPY TO. -- 2.51.0
On Fri, Oct 3, 2025 at 12:06 AM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoADXWgdizS0mV5w8wdfftDRsm8sUtNW=CzYYS1OhjFD2A@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 15 Sep 2025 10:00:18 -0700,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >> > I think we can use a local variable of CopyFormatOptions and memcpy it
> >> > to the opts of the returned cstate.
> >>
> >> It'll work too. Can we proceed this proposal with this
> >> approach? Should we collect more opinions before we proceed?
> >> If so, Could you add key people in this area to Cc to hear
> >> their opinions?
> >
> > Since we don't have a single decision-maker, we should proceed through
> > consensus-building and careful evaluation of each approach. I see that
> > several senior hackers are already included in this thread, which is
> > excellent. Since you and I, who have been involved in these
> > discussions, agreed with this approach, I believe we can proceed with
> > this direction. If anyone proposes alternative solutions that we find
> > more compelling, we might have to change the approach.
>
> OK. There is no objection for now.
>
> How about the attached patch? The patch uses the approach
> only for CopyToStateData. If this looks good, I can do it
> for CopyFromStateData too.
>
> This patch splits CopyToStateData to
>
> * CopyToStateData
> * CopyToStateInternalData
> * CopyToStateBuiltinData
>
> structs.
>
> This is based on the category described in
> https://www.postgresql.org/message-id/flat/CAD21AoBa0Wm3C2H12jaqkvLidP2zEhsC%2Bgf%3D3w7XiA4LQnvx0g%40mail.gmail.com#85cb988b0bec243d1e8dce699e02e009
> :
>
> > 1. fields used only the core (not by format callback)
> > 2. fields used by both the core and format callbacks
> > 3. built-in format specific fields (mostly for text and csv)
>
> CopyToStateInternalData is for 1.
> CopyToStateData is for 2.
> CopyToStateBuiltinData is for 3.
>
>
> This patch adds CopyToRoutine::CopyToGetStateSize() that
> returns size of state struct for the routine. For example,
> Built-in formats use sizeof(CopyToStateBuiltinData) for it.
>
> BeginCopyTo() allocates sizeof(CopyToStateInternalData) +
> CopyToGetStateSize() size contiguous memory and uses the
> front part as CopyToStateInternalData and the back part as
> CopyToStateData/CopyToStateBuiltinData.

Thank you for drafting the idea!

The patch refactors the CopyToStateData so that we can both hide
internal-use-only fields from extensions and extension can use its own
state data, while not adding extra indirection layers. TBH I'm really
not sure we must fully hide internal fields from extensions. Other
extendable components seem not to strictly hide internal information
from extensions. I'd suggest starting with only the latter point. That
is, we merge fields in CopyToStateInternalData to CopyToStateData.
What do you think?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
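
For readers following along, the layout described in the quoted mail (core-only data in the front part of one allocation, format-visible state in the back part) can be sketched in plain C. This is only an illustration under assumptions: the body of CopyToStateGetInternal() is not shown in this part of the thread, so the pointer arithmetic below is a guess at the idea, and InternalData, PublicState, alloc_state(), state_get_internal() and free_state() are hypothetical stand-ins that use calloc() instead of palloc0().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Core-only fields (hypothetical stand-in for CopyToStateInternalData). */
typedef struct InternalData
{
    const char *filename;
    int         is_program;
} InternalData;

/* Fields visible to format callbacks (stand-in for CopyToStateData). */
typedef struct PublicState
{
    unsigned long bytes_processed;
} PublicState;

/* Round the header size up so the public part stays suitably aligned. */
#define ALIGN_UP(len) \
    (((len) + _Alignof(max_align_t) - 1) & ~(_Alignof(max_align_t) - 1))
#define INTERNAL_SIZE ALIGN_UP(sizeof(InternalData))

/*
 * Allocate header + public state as one zeroed block and hand out a
 * pointer to the back part only.
 */
static PublicState *
alloc_state(size_t public_size)
{
    char *block = calloc(1, INTERNAL_SIZE + public_size);

    if (block == NULL)
        return NULL;
    return (PublicState *) (block + INTERNAL_SIZE);
}

/* Step back from the public pointer to reach the hidden front part. */
static InternalData *
state_get_internal(PublicState *state)
{
    return (InternalData *) ((char *) state - INTERNAL_SIZE);
}

/* The block must be freed through the address calloc() returned. */
static void
free_state(PublicState *state)
{
    free(state_get_internal(state));
}
```

The cost of this layout is that every core function touching a hidden field first has to recover the front pointer, which is the extra cstate_internal plumbing visible throughout the quoted patch.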
Hi,
In <CAD21AoBkA=g=PN17r_iieru+vLyLtGZ8WvohgANa2vzsMfMogQ@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 13 Oct 2025 14:40:31 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> The patch refactors the CopyToStateData so that we can both hide
> internal-use-only fields from extensions and extension can use its own
> state data, while not adding extra indirection layers. TBH I'm really
> not sure we must fully hide internal fields from extensions. Other
> extendable components seem not to strictly hide internal information
> from extensions. I'd suggest starting with only the latter point. That
> is, we merge fields in CopyToStateInternalData to CopyToStateData.
> What do you think?
OK. Let's follow the existing style. How about the attached
patch? It merges CopyToStateInternalData into CopyToStateData.
Thanks,
-- 
kou
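
As a reading aid for the patch that follows, here is a minimal standalone sketch of the pattern it relies on: the shared state struct is embedded as the first member of a format-specific struct, and a size callback tells the core how much to allocate. All names (BaseState, Routine, MyState, begin_copy) are hypothetical stand-ins rather than the PostgreSQL API, and calloc() stands in for palloc0().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct BaseState BaseState;

/* Stand-in for CopyToRoutine: a size callback plus one per-row callback. */
typedef struct Routine
{
    size_t      (*get_state_size) (void);
    void        (*one_row) (BaseState *state);
} Routine;

/* Stand-in for CopyToStateData: fields shared by core and all formats. */
struct BaseState
{
    const Routine *routine;
    unsigned long rows_processed;
};

/* A format that needs extra fields embeds the base struct first. */
typedef struct MyState
{
    BaseState   parent;         /* must be first so the cast below is valid */
    int         extra_counter;
} MyState;

static size_t
my_get_state_size(void)
{
    return sizeof(MyState);
}

static void
my_one_row(BaseState *state)
{
    /* Downcast: a pointer to a struct also points to its first member. */
    MyState    *my = (MyState *) state;

    state->rows_processed++;
    my->extra_counter += 2;
}

static const Routine my_routine = {
    .get_state_size = my_get_state_size,
    .one_row = my_one_row,
};

/* Core side: allocate whatever size the routine asks for, zero-filled. */
static BaseState *
begin_copy(const Routine *routine)
{
    BaseState  *state = calloc(1, routine->get_state_size());

    if (state == NULL)
        return NULL;
    state->routine = routine;
    return state;
}
```

The core only ever sees BaseState, so no extra indirection is needed to reach shared fields, while the format callback reaches its own fields with a single cast.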
From 325f56d4b4372f7b90b88c9c9068d253fcc9f39a Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Tue, 14 Oct 2025 11:08:23 +0900
Subject: [PATCH] Split CopyToStateData to CopyToState{,Builtin}Data
---
 src/backend/commands/copyto.c  | 170 ++++++++++++++++-----------------
 src/include/commands/copyapi.h |  66 +++++++++++++
 2 files changed, 148 insertions(+), 88 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index e5781155cdf..176d98f866b 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -21,7 +21,6 @@
 #include "access/tableam.h"
 #include "commands/copyapi.h"
 #include "commands/progress.h"
-#include "executor/execdesc.h"
 #include "executor/executor.h"
 #include "executor/tuptable.h"
 #include "libpq/libpq.h"
@@ -37,19 +36,7 @@
 #include "utils/snapmgr.h"
 
 /*
- * Represents the different dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
-    COPY_FILE,                    /* to file (or a piped program) */
-    COPY_FRONTEND,                /* to frontend */
-    COPY_CALLBACK,                /* to callback function */
-} CopyDest;
-
-/*
- * This struct contains all the state variables used throughout a COPY TO
- * operation.
+ * This struct contains the state variables used by built-in format routines.
  *
  * Multi-byte encodings: all supported client-side encodings encode multi-byte
  * characters by having the first byte's high bit set. Subsequent bytes of the
@@ -62,40 +49,16 @@ typedef enum CopyDest
  * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
  * when we have to do it the hard way.
  */
-typedef struct CopyToStateData
+typedef struct CopyToStateBuiltinData
 {
-    /* format-specific routines */
-    const CopyToRoutine *routine;
+    CopyToStateData parent;
 
     /* low-level state data */
-    CopyDest    copy_dest;        /* type of copy source/destination */
-    FILE       *copy_file;        /* used if copy_dest == COPY_FILE */
-    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
-
     int            file_encoding;    /* file or remote side's character encoding */
     bool        need_transcoding;    /* file encoding diff from server? */
     bool        encoding_embeds_ascii;    /* ASCII can be non-first byte? */
-
-    /* parameters from the COPY command */
-    Relation    rel;            /* relation to copy to */
-    QueryDesc  *queryDesc;        /* executable query to copy from */
-    List       *attnumlist;        /* integer list of attnums to copy */
-    char       *filename;        /* filename, or NULL for STDOUT */
-    bool        is_program;        /* is 'filename' a program to popen? */
-    copy_data_dest_cb data_dest_cb; /* function for writing data */
-
-    CopyFormatOptions opts;
-    Node       *whereClause;    /* WHERE condition (or NULL) */
-
-    /*
-     * Working state
-     */
-    MemoryContext copycontext;    /* per-copy execution context */
-
-    FmgrInfo   *out_functions;    /* lookup info for output functions */
-    MemoryContext rowcontext;    /* per-row evaluation context */
-    uint64        bytes_processed;    /* number of bytes processed so far */
-} CopyToStateData;
+} CopyToStateBuiltinData;
+typedef struct CopyToStateBuiltinData *CopyToStateBuiltin;
 
 /* DestReceiver for COPY (query) TO */
 typedef struct
@@ -118,6 +81,7 @@ static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
                                 bool use_quote);
 
 /* built-in format-specific routines */
+static size_t CopyToBuiltinGetStateSize(void);
 static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
 static void CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
 static void CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot);
@@ -150,6 +114,7 @@ static void CopySendInt16(CopyToState cstate, int16 val);
 
 /* text format */
 static const CopyToRoutine CopyToRoutineText = {
+    .CopyToGetStateSize = CopyToBuiltinGetStateSize,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToTextOneRow,
@@ -158,6 +123,7 @@ static const CopyToRoutine CopyToRoutineText = {
 
 /* CSV format */
 static const CopyToRoutine CopyToRoutineCSV = {
+    .CopyToGetStateSize = CopyToBuiltinGetStateSize,
     .CopyToStart = CopyToTextLikeStart,
     .CopyToOutFunc = CopyToTextLikeOutFunc,
     .CopyToOneRow = CopyToCSVOneRow,
@@ -166,6 +132,7 @@ static const CopyToRoutine CopyToRoutineCSV = {
 
 /* binary format */
 static const CopyToRoutine CopyToRoutineBinary = {
+    .CopyToGetStateSize = CopyToBuiltinGetStateSize,
     .CopyToStart = CopyToBinaryStart,
     .CopyToOutFunc = CopyToBinaryOutFunc,
     .CopyToOneRow = CopyToBinaryOneRow,
@@ -185,18 +152,27 @@ CopyToGetRoutine(const CopyFormatOptions *opts)
     return &CopyToRoutineText;
 }
 
+/* Implementation of the state size callback for all built-in formats */
+static size_t
+CopyToBuiltinGetStateSize(void)
+{
+    return sizeof(CopyToStateBuiltinData);
+}
+
 /* Implementation of the start callback for text and CSV formats */
 static void
 CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc)
 {
+    CopyToStateBuiltin cstate_builtin = (CopyToStateBuiltin) cstate;
+
     /*
      * For non-binary copy, we need to convert null_print to file encoding,
      * because it will be sent directly with CopySendString.
      */
-    if (cstate->need_transcoding)
+    if (cstate_builtin->need_transcoding)
         cstate->opts.null_print_client = pg_server_to_any(cstate->opts.null_print,
                                                           cstate->opts.null_print_len,
-                                                          cstate->file_encoding);
+                                                          cstate_builtin->file_encoding);
 
     /* if a header has been requested send the line */
     if (cstate->opts.header_line == COPY_HEADER_TRUE)
@@ -401,7 +377,7 @@ SendCopyBegin(CopyToState cstate)
     for (i = 0; i < natts; i++)
         pq_sendint16(&buf, format); /* per-column formats */
     pq_endmessage(&buf);
-    cstate->copy_dest = COPY_FRONTEND;
+    cstate->copy_dest = COPY_DEST_FRONTEND;
 }
 
 static void
@@ -448,7 +424,7 @@ CopySendEndOfRow(CopyToState cstate)
 
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
                        cstate->copy_file) != 1 ||
                 ferror(cstate->copy_file))
@@ -482,11 +458,11 @@ CopySendEndOfRow(CopyToState cstate)
                              errmsg("could not write to COPY file: %m")));
             }
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* Dump the accumulated row as one CopyData message */
             (void) pq_putmessage(PqMsg_CopyData, fe_msgbuf->data, fe_msgbuf->len);
             break;
-        case COPY_CALLBACK:
+        case COPY_DEST_CALLBACK:
             cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);
             break;
     }
@@ -507,7 +483,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
 {
     switch (cstate->copy_dest)
     {
-        case COPY_FILE:
+        case COPY_DEST_FILE:
             /* Default line termination depends on platform */
 #ifndef WIN32
             CopySendChar(cstate, '\n');
@@ -515,7 +491,7 @@ CopySendTextLikeEndOfRow(CopyToState cstate)
             CopySendString(cstate, "\r\n");
 #endif
             break;
-        case COPY_FRONTEND:
+        case COPY_DEST_FRONTEND:
             /* The FE/BE protocol uses \n as newline for all platforms */
             CopySendChar(cstate, '\n');
             break;
@@ -631,10 +607,14 @@ BeginCopyTo(ParseState *pstate,
             List *options)
 {
     CopyToState cstate;
+    CopyToStateBuiltin cstate_builtin;
     bool        pipe = (filename == NULL && data_dest_cb == NULL);
     TupleDesc    tupDesc;
     int            num_phys_attrs;
+    MemoryContext copycontext;
     MemoryContext oldcontext;
+    CopyFormatOptions opts = {0};
+    const CopyToRoutine *routine;
     const int    progress_cols[] = {
         PROGRESS_COPY_COMMAND,
         PROGRESS_COPY_TYPE
@@ -686,24 +666,33 @@ BeginCopyTo(ParseState *pstate,
     }
 
 
-    /* Allocate workspace and zero all fields */
-    cstate = (CopyToStateData *) palloc0(sizeof(CopyToStateData));
-
     /*
      * We allocate everything used by a cstate in a new memory context. This
      * avoids memory leaks during repeated use of COPY in a query.
      */
-    cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
-                                                "COPY",
-                                                ALLOCSET_DEFAULT_SIZES);
+    copycontext = AllocSetContextCreate(CurrentMemoryContext,
+                                        "COPY",
+                                        ALLOCSET_DEFAULT_SIZES);
 
-    oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+    oldcontext = MemoryContextSwitchTo(copycontext);
 
     /* Extract options from the statement node tree */
-    ProcessCopyOptions(pstate, &cstate->opts, false /* is_from */ , options);
+    ProcessCopyOptions(pstate, &opts, false /* is_from */ , options);
 
-    /* Set format routine */
-    cstate->routine = CopyToGetRoutine(&cstate->opts);
+    /* Get format routine */
+    routine = CopyToGetRoutine(&opts);
+
+    /* Allocate workspace and set known values */
+    MemoryContextSwitchTo(oldcontext);
+    cstate = (CopyToState) palloc0(routine->CopyToGetStateSize());
+    MemoryContextSwitchTo(copycontext);
+    if (routine == &CopyToRoutineText || routine == &CopyToRoutineCSV || routine == &CopyToRoutineBinary)
+        cstate_builtin = (CopyToStateBuiltin) cstate;
+    else
+        cstate_builtin = NULL;
+    cstate->copycontext = copycontext;
+    cstate->opts = opts;
+    cstate->routine = routine;
 
     /* Process the source/target relation or query */
     if (rel)
@@ -883,31 +872,34 @@ BeginCopyTo(ParseState *pstate,
         }
     }
 
-    /* Use client encoding when ENCODING option is not specified. */
-    if (cstate->opts.file_encoding < 0)
-        cstate->file_encoding = pg_get_client_encoding();
-    else
-        cstate->file_encoding = cstate->opts.file_encoding;
+    if (cstate_builtin)
+    {
+        /* Use client encoding when ENCODING option is not specified. */
+        if (cstate->opts.file_encoding < 0)
+            cstate_builtin->file_encoding = pg_get_client_encoding();
+        else
+            cstate_builtin->file_encoding = cstate->opts.file_encoding;
 
-    /*
-     * Set up encoding conversion info if the file and server encodings differ
-     * (see also pg_server_to_any).
-     */
-    if (cstate->file_encoding == GetDatabaseEncoding() ||
-        cstate->file_encoding == PG_SQL_ASCII)
-        cstate->need_transcoding = false;
-    else
-        cstate->need_transcoding = true;
+        /*
+         * Set up encoding conversion info if the file and server encodings
+         * differ (see also pg_server_to_any).
+         */
+        if (cstate_builtin->file_encoding == GetDatabaseEncoding() ||
+            cstate_builtin->file_encoding == PG_SQL_ASCII)
+            cstate_builtin->need_transcoding = false;
+        else
+            cstate_builtin->need_transcoding = true;
 
-    /* See Multibyte encoding comment above */
-    cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
+        /* See Multibyte encoding comment above */
+        cstate_builtin->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate_builtin->file_encoding);
+    }
 
-    cstate->copy_dest = COPY_FILE;    /* default */
+    cstate->copy_dest = COPY_DEST_FILE; /* default */
 
     if (data_dest_cb)
     {
         progress_vals[1] = PROGRESS_COPY_TYPE_CALLBACK;
-        cstate->copy_dest = COPY_CALLBACK;
+        cstate->copy_dest = COPY_DEST_CALLBACK;
         cstate->data_dest_cb = data_dest_cb;
     }
     else if (pipe)
@@ -1146,13 +1138,14 @@ CopyOneRowTo(CopyToState cstate, TupleTableSlot *slot)
 static void
 CopyAttributeOutText(CopyToState cstate, const char *string)
 {
+    CopyToStateBuiltin cstate_builtin = (CopyToStateBuiltin) cstate;
     const char *ptr;
     const char *start;
     char        c;
     char        delimc = cstate->opts.delim[0];
 
-    if (cstate->need_transcoding)
-        ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+    if (cstate_builtin->need_transcoding)
+        ptr = pg_server_to_any(string, strlen(string), cstate_builtin->file_encoding);
     else
         ptr = string;
 
@@ -1170,7 +1163,7 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
      * it's worth making two copies of it to get the IS_HIGHBIT_SET() test out
      * of the normal safe-encoding path.
      */
-    if (cstate->encoding_embeds_ascii)
+    if (cstate_builtin->encoding_embeds_ascii)
     {
         start = ptr;
         while ((c = *ptr) != '\0')
@@ -1225,7 +1218,7 @@ CopyAttributeOutText(CopyToState cstate, const char *string)
                 start = ptr++;    /* we include char in next run */
             }
             else if (IS_HIGHBIT_SET(c))
-                ptr += pg_encoding_mblen(cstate->file_encoding, ptr);
+                ptr += pg_encoding_mblen(cstate_builtin->file_encoding, ptr);
             else
                 ptr++;
         }
@@ -1300,6 +1293,7 @@ static void
 CopyAttributeOutCSV(CopyToState cstate, const char *string,
                     bool use_quote)
 {
+    CopyToStateBuiltin cstate_builtin = (CopyToStateBuiltin) cstate;
     const char *ptr;
     const char *start;
     char        c;
@@ -1312,8 +1306,8 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
     if (!use_quote && strcmp(string, cstate->opts.null_print) == 0)
         use_quote = true;
 
-    if (cstate->need_transcoding)
-        ptr = pg_server_to_any(string, strlen(string), cstate->file_encoding);
+    if (cstate_builtin->need_transcoding)
+        ptr = pg_server_to_any(string, strlen(string), cstate_builtin->file_encoding);
     else
         ptr = string;
 
@@ -1342,8 +1336,8 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
                     use_quote = true;
                     break;
                 }
-                if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii)
-                    tptr += pg_encoding_mblen(cstate->file_encoding, tptr);
+                if (IS_HIGHBIT_SET(c) && cstate_builtin->encoding_embeds_ascii)
+                    tptr += pg_encoding_mblen(cstate_builtin->file_encoding, tptr);
                 else
                     tptr++;
             }
@@ -1366,8 +1360,8 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
                 CopySendChar(cstate, escapec);
                 start = ptr;    /* we include char in next run */
             }
-            if (IS_HIGHBIT_SET(c) && cstate->encoding_embeds_ascii)
-                ptr += pg_encoding_mblen(cstate->file_encoding, ptr);
+            if (IS_HIGHBIT_SET(c) && cstate_builtin->encoding_embeds_ascii)
+                ptr += pg_encoding_mblen(cstate_builtin->file_encoding, ptr);
             else
                 ptr++;
         }
diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h
index 2a2d2f9876b..aece73f4ca2 100644
--- a/src/include/commands/copyapi.h
+++ b/src/include/commands/copyapi.h
@@ -15,6 +15,7 @@
 #define COPYAPI_H
 
 #include "commands/copy.h"
+#include "executor/execdesc.h"
 
 /*
  * API structure for a COPY TO format implementation. Note this must be
@@ -22,6 +23,25 @@
  */
 typedef struct CopyToRoutine
 {
+    /* ---
+     * Return state size for this routine.
+     *
+     * If this routine uses CopyToStateData as-is, `return
+     * sizeof(CopyToStateData)` can be used.
+     *
+     * If this routine needs more data than CopyToStateData provides, a new
+     * struct that embeds CopyToStateData as its first member can be used:
+     *
+     * typedef struct MyCopyToStateData {
+     *     struct CopyToStateData parent;
+     *     int define_additional_members_here;
+     * } MyCopyToStateData;
+     *
+     * In that case, this callback returns `sizeof(MyCopyToStateData)`.
+     * ---
+     */
+    size_t        (*CopyToGetStateSize) (void);
+
     /*
      * Set output function information. This callback is called once at the
      * beginning of COPY TO.
@@ -54,6 +74,52 @@ typedef struct CopyToRoutine
     void        (*CopyToEnd) (CopyToState cstate);
 } CopyToRoutine;
 
+/*
+ * Represents the different dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+    COPY_DEST_FILE,                /* to file (or a piped program) */
+    COPY_DEST_FRONTEND,            /* to frontend */
+    COPY_DEST_CALLBACK,            /* to callback function */
+} CopyDest;
+
+/*
+ * This struct contains the state variables used by PostgreSQL, built-in
+ * format routines and custom format routines.
+ */
+typedef struct CopyToStateData
+{
+    /* format-specific routines */
+    const CopyToRoutine *routine;
+
+    /* low-level state data */
+    CopyDest    copy_dest;        /* type of copy source/destination */
+    FILE       *copy_file;        /* used if copy_dest == COPY_DEST_FILE */
+    StringInfo    fe_msgbuf;        /* used for all dests during COPY TO */
+
+    /* parameters from the COPY command */
+    Relation    rel;            /* relation to copy to */
+    QueryDesc  *queryDesc;        /* executable query to copy from */
+    List       *attnumlist;        /* integer list of attnums to copy */
+    char       *filename;        /* filename, or NULL for STDOUT */
+    bool        is_program;        /* is 'filename' a program to popen? */
+    copy_data_dest_cb data_dest_cb; /* function for writing data */
+
+    CopyFormatOptions opts;
+    Node       *whereClause;    /* WHERE condition (or NULL) */
+
+    /*
+     * Working state
+     */
+    MemoryContext copycontext;    /* per-copy execution context */
+
+    FmgrInfo   *out_functions;    /* lookup info for output functions */
+    MemoryContext rowcontext;    /* per-row evaluation context */
+    uint64        bytes_processed;    /* number of bytes processed so far */
+} CopyToStateData;
+
 /*
  * API structure for a COPY FROM format implementation. Note this must be
  * allocated in a server-lifetime manner, typically as a static const struct.
-- 
2.51.0
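
The reordering in the BeginCopyTo() hunk above exists because the state size depends on the routine, and the routine depends on the parsed options, so the options can no longer be parsed directly into the state. The following standalone sketch shows that ordering under simplified assumptions; Options, State, CsvState, parse_options(), get_routine() and begin() are hypothetical names, not PostgreSQL functions, and the option parsing is reduced to a single format string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct Options
{
    int         csv_mode;
    char        delim;
} Options;

typedef struct Routine
{
    size_t      (*get_state_size) (void);
} Routine;

typedef struct State
{
    const Routine *routine;
    Options     opts;
} State;

/* A format with extra per-copy fields embeds State as its first member. */
typedef struct CsvState
{
    State       parent;
    int         quote_count;
} CsvState;

static size_t plain_size(void) { return sizeof(State); }
static size_t csv_size(void) { return sizeof(CsvState); }

static const Routine plain_routine = {plain_size};
static const Routine csv_routine = {csv_size};

static void
parse_options(Options *opts, const char *format)
{
    opts->csv_mode = (strcmp(format, "csv") == 0);
    opts->delim = opts->csv_mode ? ',' : '\t';
}

static const Routine *
get_routine(const Options *opts)
{
    return opts->csv_mode ? &csv_routine : &plain_routine;
}

static State *
begin(const char *format)
{
    Options     opts = {0};     /* 1. parse into a local variable */
    const Routine *routine;
    State      *state;

    parse_options(&opts, format);
    routine = get_routine(&opts);   /* 2. options select the routine */
    state = calloc(1, routine->get_state_size());   /* 3. routine sizes state */
    if (state == NULL)
        return NULL;
    state->opts = opts;         /* 4. copy the parsed options in */
    state->routine = routine;
    return state;
}
```

Plain struct assignment is used in step 4; it has the same effect as the memcpy suggested earlier in the thread.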