Re: Make COPY format extendable: Extract COPY TO format implementations

Поиск
Список
Период
Сортировка
От Sutou Kouhei
Тема Re: Make COPY format extendable: Extract COPY TO format implementations
Дата
Msg-id 20240110.152023.1920937326588672387.kou@clear-code.com
обсуждение исходный текст
Ответ на Re: Make COPY format extendable: Extract COPY TO format implementations  (Junwang Zhao <zhjwpku@gmail.com>)
Ответы Re: Make COPY format extendable: Extract COPY TO format implementations  (Junwang Zhao <zhjwpku@gmail.com>)
Список pgsql-hackers
Hi,

In <CAEG8a3+jG_NKOUmcxDyEX2xSggBXReZ4H=e3RFsUtedY88A03w@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Dec 2023 10:58:05 +0800,
  Junwang Zhao <zhjwpku@gmail.com> wrote:

>> 1. Add an opaque space for custom COPY TO handler
>>    * Add CopyToState{Get,Set}Opaque()
>>    https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
>>
>> 2. Export CopyToState::attnumlist
>>    * Add CopyToStateGetAttNumList()
>>    https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
>>
>> 3. Export CopySend*()
>>    * Rename CopySend*() to CopyToStateSend*() and export them
>>    * Exception: CopySendEndOfRow() to CopyToStateFlush() because
>>      it just flushes the internal buffer now.
>>    https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e
>>
> I guess the purpose of these helpers is to avoid expose CopyToState to
> copy.h,

Yes.

>         but I
> think expose CopyToState to user might make life easier, users might want to use
> the memory contexts of the structure (though I agree not all the
> fields are necessary
> for extension handers).

OK. I don't object it as I said in another e-mail:

https://www.postgresql.org/message-id/flat/20240110.120644.1876591646729327180.kou%40clear-code.com#d923173e9625c20319750155083cbd72

>> 2. Need an opaque space like IndexScanDesc::opaque does
>>
>>    * A custom COPY TO handler needs to keep its data
> 
> I once thought users might want to parse their own options, maybe this
> is a use case for this opaque space.

Good catch! I forgot to suggest a callback for custom format
options. How about the following API?

----
...
typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);

...
typedef bool (*CopyFromProcessOption_function) (CopyFromState cstate, DefElem *defel);

typedef struct CopyToFormatRoutine
{
    ...
    CopyToProcessOption_function process_option_fn;
} CopyToFormatRoutine;

typedef struct CopyFromFormatRoutine
{
    ...
    CopyFromProcessOption_function process_option_fn;
} CopyFromFormatRoutine;
----

----
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e7597894bf..1aa8b62551 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -416,6 +416,7 @@ void
 ProcessCopyOptions(ParseState *pstate,
                    CopyFormatOptions *opts_out,
                    bool is_from,
+                   void *cstate, /* CopyToState* for !is_from, CopyFromState* for is_from */
                    List *options)
 {
     bool        format_specified = false;
@@ -593,11 +594,19 @@ ProcessCopyOptions(ParseState *pstate,
                          parser_errposition(pstate, defel->location)));
         }
         else
-            ereport(ERROR,
-                    (errcode(ERRCODE_SYNTAX_ERROR),
-                     errmsg("option \"%s\" not recognized",
-                            defel->defname),
-                     parser_errposition(pstate, defel->location)));
+        {
+            bool processed;
+            if (is_from)
+                processed = opts_out->from_ops->process_option_fn(cstate, defel);
+            else
+                processed = opts_out->to_ops->process_option_fn(cstate, defel);
+            if (!processed)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("option \"%s\" not recognized",
+                                defel->defname),
+                         parser_errposition(pstate, defel->location)));
+        }
     }
 
     /*
----

> For the name, I thought private_data might be a better candidate than
> opaque, but I do not insist.

I don't have a strong opinion for this. Here are the number
of headers that use "private_data" and "opaque":

$ grep -r private_data --files-with-matches src/include | wc -l
6
$ grep -r opaque --files-with-matches src/include | wc -l
38

It seems that we use "opaque" than "private_data" in general.

but it seems that we use
"opaque" than "private_data" in our code.

> Do you use the arrow library to control the memory?

Yes.

>                                                     Is there a way that
> we can let the arrow use postgres' memory context?

Yes. Apache Arrow C++ provides a memory pool feature and we
can implement PostgreSQL's memory context based memory pool
for this. (But this is a custom COPY TO/FROM handler's
implementation details.)

>                                                    I'm not sure this
> is necessary, just raise the question for discussion.

Could you clarify what should we discuss? We should require
that COPY TO/FROM handlers should use PostgreSQL's memory
context for all internal memory allocations?

> +PG_FUNCTION_INFO_V1(copy_testfmt_handler);
> +Datum
> +copy_testfmt_handler(PG_FUNCTION_ARGS)
> +{
> + bool is_from = PG_GETARG_BOOL(0);
> + CopyFormatRoutine *cp = makeNode(CopyFormatRoutine);;
> +
> 
> extra semicolon.

I noticed it too :-)
But I ignored it because the current implementation is only
for discussion. We know that it may be dirty.


Thanks,
-- 
kou



В списке pgsql-hackers по дате отправления:

Предыдущее
От: John Naylor
Дата:
Сообщение: Re: Add BF member koel-like indentation checks to SanityCheck CI
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Add BF member koel-like indentation checks to SanityCheck CI