Re: Make COPY format extendable: Extract COPY TO format implementations
От | Junwang Zhao |
---|---|
Тема | Re: Make COPY format extendable: Extract COPY TO format implementations |
Дата | |
Msg-id | CAEG8a3J02NzGBxG1rP9C4u7qRLOqUjSOdy3q5_5v__fydS3XcA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Make COPY format extendable: Extract COPY TO format implementations (Sutou Kouhei <kou@clear-code.com>) |
Ответы |
Re: Make COPY format extendable: Extract COPY TO format implementations
(Sutou Kouhei <kou@clear-code.com>)
|
Список | pgsql-hackers |
Hi, On Wed, Jan 10, 2024 at 2:20 PM Sutou Kouhei <kou@clear-code.com> wrote: > > Hi, > > In <CAEG8a3+jG_NKOUmcxDyEX2xSggBXReZ4H=e3RFsUtedY88A03w@mail.gmail.com> > "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Dec 2023 10:58:05 +0800, > Junwang Zhao <zhjwpku@gmail.com> wrote: > > >> 1. Add an opaque space for custom COPY TO handler > >> * Add CopyToState{Get,Set}Opaque() > >> https://github.com/kou/postgres/commit/5a610b6a066243f971e029432db67152cfe5e944 > >> > >> 2. Export CopyToState::attnumlist > >> * Add CopyToStateGetAttNumList() > >> https://github.com/kou/postgres/commit/15fcba8b4e95afa86edb3f677a7bdb1acb1e7688 > >> > >> 3. Export CopySend*() > >> * Rename CopySend*() to CopyToStateSend*() and export them > >> * Exception: CopySendEndOfRow() to CopyToStateFlush() because > >> it just flushes the internal buffer now. > >> https://github.com/kou/postgres/commit/289a5640135bde6733a1b8e2c412221ad522901e > >> > > I guess the purpose of these helpers is to avoid expose CopyToState to > > copy.h, > > Yes. > > > but I > > think expose CopyToState to user might make life easier, users might want to use > > the memory contexts of the structure (though I agree not all the > > fields are necessary > > for extension handers). > > OK. I don't object it as I said in another e-mail: > https://www.postgresql.org/message-id/flat/20240110.120644.1876591646729327180.kou%40clear-code.com#d923173e9625c20319750155083cbd72 > > >> 2. Need an opaque space like IndexScanDesc::opaque does > >> > >> * A custom COPY TO handler needs to keep its data > > > > I once thought users might want to parse their own options, maybe this > > is a use case for this opaque space. > > Good catch! I forgot to suggest a callback for custom format > options. How about the following API? > > ---- > ... > typedef bool (*CopyToProcessOption_function) (CopyToState cstate, DefElem *defel); > > ... > typedef bool (*CopyFromProcessOption_function) (CopyFromState cstate, DefElem *defel); > > typedef struct CopyToFormatRoutine > { > ... > CopyToProcessOption_function process_option_fn; > } CopyToFormatRoutine; > > typedef struct CopyFromFormatRoutine > { > ... > CopyFromProcessOption_function process_option_fn; > } CopyFromFormatRoutine; > ---- > > ---- > diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c > index e7597894bf..1aa8b62551 100644 > --- a/src/backend/commands/copy.c > +++ b/src/backend/commands/copy.c > @@ -416,6 +416,7 @@ void > ProcessCopyOptions(ParseState *pstate, > CopyFormatOptions *opts_out, > bool is_from, > + void *cstate, /* CopyToState* for !is_from, CopyFromState* for is_from */ > List *options) > { > bool format_specified = false; > @@ -593,11 +594,19 @@ ProcessCopyOptions(ParseState *pstate, > parser_errposition(pstate, defel->location))); > } > else > - ereport(ERROR, > - (errcode(ERRCODE_SYNTAX_ERROR), > - errmsg("option \"%s\" not recognized", > - defel->defname), > - parser_errposition(pstate, defel->location))); > + { > + bool processed; > + if (is_from) > + processed = opts_out->from_ops->process_option_fn(cstate, defel); > + else > + processed = opts_out->to_ops->process_option_fn(cstate, defel); > + if (!processed) > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("option \"%s\" not recognized", > + defel->defname), > + parser_errposition(pstate, defel->location))); > + } > } > > /* > ---- Looks good. > > > For the name, I thought private_data might be a better candidate than > > opaque, but I do not insist. > > I don't have a strong opinion for this. Here are the number > of headers that use "private_data" and "opaque": > > $ grep -r private_data --files-with-matches src/include | wc -l > 6 > $ grep -r opaque --files-with-matches src/include | wc -l > 38 > > It seems that we use "opaque" than "private_data" in general. > > but it seems that we use > "opaque" than "private_data" in our code. > > > Do you use the arrow library to control the memory? > > Yes. > > > Is there a way that > > we can let the arrow use postgres' memory context? > > Yes. Apache Arrow C++ provides a memory pool feature and we > can implement PostgreSQL's memory context based memory pool > for this. (But this is a custom COPY TO/FROM handler's > implementation details.) > > > I'm not sure this > > is necessary, just raise the question for discussion. > > Could you clarify what should we discuss? We should require > that COPY TO/FROM handlers should use PostgreSQL's memory > context for all internal memory allocations? Yes, handlers should use PostgreSQL's memory context, and I think creating other memory context under CopyToStateData.copycontext should be suggested for handler creators, so I proposed exporting CopyToStateData to public header. > > > +PG_FUNCTION_INFO_V1(copy_testfmt_handler); > > +Datum > > +copy_testfmt_handler(PG_FUNCTION_ARGS) > > +{ > > + bool is_from = PG_GETARG_BOOL(0); > > + CopyFormatRoutine *cp = makeNode(CopyFormatRoutine);; > > + > > > > extra semicolon. > > I noticed it too :-) > But I ignored it because the current implementation is only > for discussion. We know that it may be dirty. > > > Thanks, > -- > kou -- Regards Junwang Zhao
В списке pgsql-hackers по дате отправления: