Re: Make COPY format extendable: Extract COPY TO format implementations

Поиск
Список
Период
Сортировка
От Junwang Zhao
Тема Re: Make COPY format extendable: Extract COPY TO format implementations
Дата
Msg-id CAEG8a3+jG_NKOUmcxDyEX2xSggBXReZ4H=e3RFsUtedY88A03w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
Ответы Re: Make COPY format extendable: Extract COPY TO format implementations
Список pgsql-hackers
On Thu, Dec 21, 2023 at 5:35 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoCunywHird3GaPzWe6s9JG1wzxj3Cr6vGN36DDheGjOjA@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 11 Dec 2023 23:31:29 +0900,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > I've sketched the above idea including a test module in
> > src/test/module/test_copy_format, based on v2 patch. It's not splitted
> > and is dirty so just for discussion.
>
> I implemented a sample COPY TO handler for Apache Arrow that
> supports only integer and text.
>
> I needed to extend the patch:
>
> 1. Add an opaque space for custom COPY TO handler
>    * Add CopyToState{Get,Set}Opaque()
>    https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
>
> 2. Export CopyToState::attnumlist
>    * Add CopyToStateGetAttNumList()
>    https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
>
> 3. Export CopySend*()
>    * Rename CopySend*() to CopyToStateSend*() and export them
>    * Exception: CopySendEndOfRow() to CopyToStateFlush() because
>      it just flushes the internal buffer now.
>    https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e
>
I guess the purpose of these helpers is to avoid expose CopyToState to
copy.h, but I
think expose CopyToState to user might make life easier, users might want to use
the memory contexts of the structure (though I agree not all the
fields are necessary
for extension handers).

> The attached patch is based on the Sawada-san's patch and
> includes the above changes. Note that this patch is also
> dirty so just for discussion.
>
> My suggestions from this experience:
>
> 1. Split COPY handler to COPY TO handler and COPY FROM handler
>
>    * CopyFormatRoutine is a bit tricky. An extension needs
>      to create a CopyFormatRoutine node and
>      a CopyToFormatRoutine node.
>
>    * If we just require "copy_to_${FORMAT}(internal)"
>      function and "copy_from_${FORMAT}(internal)" function,
>      we can remove the tricky approach. And it also avoid
>      name collisions with other handler such as tablesample
>      handler.
>      See also:
>
https://www.postgresql.org/message-id/flat/20231214.184414.2179134502876898942.kou%40clear-code.com#af71f364d0a9f5c144e45b447e5c16c9
>
> 2. Need an opaque space like IndexScanDesc::opaque does
>
>    * A custom COPY TO handler needs to keep its data

I once thought users might want to parse their own options, maybe this
is a use case for this opaque space.

For the name, I thought private_data might be a better candidate than
opaque, but I do not insist.
>
> 3. Export CopySend*()
>
>    * If we like minimum API, we just need to export
>      CopySendData() and CopySendEndOfRow(). But
>      CopySend{String,Char,Int32,Int16}() will be convenient
>      custom COPY TO handlers. (A custom COPY TO handler for
>      Apache Arrow doesn't need them.)

Do you use the arrow library to control the memory? Is there a way that
we can let the arrow use postgres' memory context? I'm not sure this
is necessary, just raise the question for discussion.
>
> Questions:
>
> 1. What value should be used for "format" in
>    PgMsg_CopyOutResponse message?
>
>
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/commands/copyto.c;h=c66a047c4a79cc614784610f385f1cd0935350f3;hb=9ca6e7b9411e36488ef539a2c1f6846ac92a7072#l144
>
>    It's 1 for binary format and 0 for text/csv format.
>
>    Should we make it customizable by custom COPY TO handler?
>    If so, what value should be used for this?
>
> 2. Do we need more tries for design discussion for the first
>    implementation? If we need, what should we try?
>
>
> Thanks,
> --
> kou

+PG_FUNCTION_INFO_V1(copy_testfmt_handler);
+Datum
+copy_testfmt_handler(PG_FUNCTION_ARGS)
+{
+ bool is_from = PG_GETARG_BOOL(0);
+ CopyFormatRoutine *cp = makeNode(CopyFormatRoutine);;
+

extra semicolon.

--
Regards
Junwang Zhao



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Track in pg_replication_slots the reason why slots conflict?
Следующее
От: Alexander Lakhin
Дата:
Сообщение: Re: trying again to get incremental backup