Re: Make COPY format extendable: Extract COPY TO format implementations

Поиск
Список
Период
Сортировка
От Junwang Zhao
Тема Re: Make COPY format extendable: Extract COPY TO format implementations
Дата
Msg-id CAEG8a3J02NzGBxG1rP9C4u7qRLOqUjSOdy3q5_5v__fydS3XcA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
Ответы Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
Список pgsql-hackers
Hi,

On Wed, Jan 10, 2024 at 2:20 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAEG8a3+jG_NKOUmcxDyEX2xSggBXReZ4H=e3RFsUtedY88A03w@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Dec 2023 10:58:05 +0800,
>   Junwang Zhao <zhjwpku@gmail.com> wrote:
>
> >> 1. Add an opaque space for custom COPY TO handler
> >>    * Add CopyToState{Get,Set}Opaque()
> >>    https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944
> >>
> >> 2. Export CopyToState::attnumlist
> >>    * Add CopyToStateGetAttNumList()
> >>    https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688
> >>
> >> 3. Export CopySend*()
> >>    * Rename CopySend*() to CopyToStateSend*() and export them
> >>    * Exception: CopySendEndOfRow() to CopyToStateFlush() because
> >>      it just flushes the internal buffer now.
> >>    https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e
> >>
> > I guess the purpose of these helpers is to avoid expose CopyToState to
> > copy.h,
>
> Yes.
>
> >         but I
> > think expose CopyToState to user might make life easier, users might want to use
> > the memory contexts of the structure (though I agree not all the
> > fields are necessary
> > for extension handers).
>
> OK. I don't object it as I said in another e-mail:
>
https://www.postgresql.org/message-id/flat/20240110.120644.1876591646729327180.kou%40clear-code.com#d923173e9625c20319750155083cbd72
>
> >> 2. Need an opaque space like IndexScanDesc::opaque does
> >>
> >>    * A custom COPY TO handler needs to keep its data
> >
> > I once thought users might want to parse their own options, maybe this
> > is a use case for this opaque space.
>
> Good catch! I forgot to suggest a callback for custom format
> options. How about the following API?
>
> ----
> ...
> typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel);
>
> ...
> typedef bool (*CopyFromProcessOption_function) (CopyFromState cstate, DefElem *defel);
>
> typedef struct CopyToFormatRoutine
> {
>         ...
>         CopyToProcessOption_function process_option_fn;
> } CopyToFormatRoutine;
>
> typedef struct CopyFromFormatRoutine
> {
>         ...
>         CopyFromProcessOption_function process_option_fn;
> } CopyFromFormatRoutine;
> ----
>
> ----
> diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
> index e7597894bf..1aa8b62551 100644
> --- a/src/backend/commands/copy.c
> +++ b/src/backend/commands/copy.c
> @@ -416,6 +416,7 @@ void
>  ProcessCopyOptions(ParseState *pstate,
>                                    CopyFormatOptions *opts_out,
>                                    bool is_from,
> +                                  void *cstate, /* CopyToState* for !is_from, CopyFromState* for is_from */
>                                    List *options)
>  {
>         bool            format_specified = false;
> @@ -593,11 +594,19 @@ ProcessCopyOptions(ParseState *pstate,
>                                                  parser_errposition(pstate, defel->location)));
>                 }
>                 else
> -                       ereport(ERROR,
> -                                       (errcode(ERRCODE_SYNTAX_ERROR),
> -                                        errmsg("option \"%s\" not recognized",
> -                                                       defel->defname),
> -                                        parser_errposition(pstate, defel->location)));
> +               {
> +                       bool processed;
> +                       if (is_from)
> +                               processed = opts_out->from_ops->process_option_fn(cstate, defel);
> +                       else
> +                               processed = opts_out->to_ops->process_option_fn(cstate, defel);
> +                       if (!processed)
> +                               ereport(ERROR,
> +                                               (errcode(ERRCODE_SYNTAX_ERROR),
> +                                                errmsg("option \"%s\" not recognized",
> +                                                               defel->defname),
> +                                                parser_errposition(pstate, defel->location)));
> +               }
>         }
>
>         /*
> ----

Looks good.

>
> > For the name, I thought private_data might be a better candidate than
> > opaque, but I do not insist.
>
> I don't have a strong opinion for this. Here are the number
> of headers that use "private_data" and "opaque":
>
> $ grep -r private_data --files-with-matches src/include | wc -l
> 6
> $ grep -r opaque --files-with-matches src/include | wc -l
> 38
>
> It seems that we use "opaque" than "private_data" in general.
>
> but it seems that we use
> "opaque" than "private_data" in our code.
>
> > Do you use the arrow library to control the memory?
>
> Yes.
>
> >                                                     Is there a way that
> > we can let the arrow use postgres' memory context?
>
> Yes. Apache Arrow C++ provides a memory pool feature and we
> can implement PostgreSQL's memory context based memory pool
> for this. (But this is a custom COPY TO/FROM handler's
> implementation details.)
>
> >                                                    I'm not sure this
> > is necessary, just raise the question for discussion.
>
> Could you clarify what should we discuss? We should require
> that COPY TO/FROM handlers should use PostgreSQL's memory
> context for all internal memory allocations?

Yes, handlers should use PostgreSQL's memory context, and I think
creating other memory context under CopyToStateData.copycontext
should be suggested for handler creators, so I proposed exporting
CopyToStateData to public header.
>
> > +PG_FUNCTION_INFO_V1(copy_testfmt_handler);
> > +Datum
> > +copy_testfmt_handler(PG_FUNCTION_ARGS)
> > +{
> > + bool is_from = PG_GETARG_BOOL(0);
> > + CopyFormatRoutine *cp = makeNode(CopyFormatRoutine);;
> > +
> >
> > extra semicolon.
>
> I noticed it too :-)
> But I ignored it because the current implementation is only
> for discussion. We know that it may be dirty.
>
>
> Thanks,
> --
> kou



--
Regards
Junwang Zhao



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bertrand Drouvot
Дата:
Сообщение: Re: Synchronizing slots from primary to standby
Следующее
От: Bertrand Drouvot
Дата:
Сообщение: Re: Synchronizing slots from primary to standby