Re: COPY formatting

From Tom Lane
Subject Re: COPY formatting
Date
Msg-id 9056.1079622996@sss.pgh.pa.us
In reply to Re: COPY formatting  (Karel Zak <zakkr@zf.jcu.cz>)
Replies Re: COPY formatting  (Karel Zak <zakkr@zf.jcu.cz>)
List pgsql-hackers
Karel Zak <zakkr@zf.jcu.cz> writes:
>> On Wed, Mar 17, 2004 at 11:02:38AM -0500, Tom Lane wrote:
>>> Karel Zak <zakkr@zf.jcu.cz> writes:
>>>> This seems like it could only reasonably be implemented as a C function.
>> 
>> Why? I said it's pseudo code. It should use standard fmgr API like
>> every other PostgreSQL function or is it problem and I overlook
>> something? It must to support arbitrary programming language and not
>> C only.

Sure, but the question is whether the *stuff it has to do* can
reasonably be coded in anything but C.  Why are you passing in a
relation OID, if not for lookups in relcache entries that are simply
not accessible above the C level?  (Don't tell me you want the function
to do a bunch of actual SELECTs from system catalogs for every line
of the copy...)

Passing in a relation OID is probably a bad idea anyway, as it ties this
API to the assumption that COPY is only for complete relations.  There's
been talk before of allowing a SELECT result to be presented via the
COPY protocol, for instance.  What might be a more usable API is

COPY OUT:    function formatter_out(text[]) returns text
COPY IN:    function formatter_in(text) returns text[]

where the text array is either the results of or the input to the
per-column datatype I/O routines.  This makes it explicit that the
formatter's job is solely to determine the column-level wrapping and
unwrapping of the data.  I'm assuming here that there is no good reason
for the formatter to care about the specific datatypes involved; can you
give a counterexample?
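
To make the proposed pair of signatures concrete, here is a minimal sketch in Python rather than in a PostgreSQL procedural language. It assumes a CSV-style format and uses Python's standard csv module for the quoting rules; the function bodies are illustrative only and are not part of any existing COPY implementation. NULL handling is left out.

```python
import csv
import io


def formatter_out(columns: list[str]) -> str:
    """Wrap already-stringified column values into one CSV line.

    The input plays the role of text[]: the output of the per-column
    datatype output routines. No line terminator is appended here.
    """
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow(columns)
    return buf.getvalue()


def formatter_in(line: str) -> list[str]:
    """Unwrap one CSV line into per-column strings.

    The result plays the role of text[]: the input to the per-column
    datatype input routines.
    """
    return next(csv.reader([line]))
```

Neither function looks at a relation OID or at column datatypes; each one only decides how column values are wrapped and unwrapped.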

>  It's pity  that main idea of  current COPY is based  on separated lines
>  and it is not more common interface for streaming data between FE and BE.

Yeah, that was another concern I had.  This API would let the formatter
control line-level layout but it would not eliminate the hard-wired
significance of newline.  What's worse, there isn't any clean way to
deal with reading quoted newlines --- the formatter can't really replace
the default quoting rules if the low-level code is going to decide
whether a newline is quoted or not.

We could possibly solve that by specifying that the text output or input
(respectively) is the complete line sent to or from the client,
including newline or whatever other line-level formatting you are using.
This still leaves the problem of how the low-level COPY IN code knows
what is a complete line to pass off to the formatter_in routine.  We
could possibly fix this by adding a second input-control routine

function formatter_linelength(text) returns integer

which is defined to return -1 if the input isn't a complete line yet
(i.e., read some more data, append to the buffer, and try again), or
>= 0 to indicate that the first N bytes of the buffer represent a
complete line to be passed off to formatter_in.  I don't see a way to
combine formatter_in and formatter_linelength into a single function
without relying on "out" parameters, which would again confine the
feature to format functions written in C.
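
The buffering protocol described above can be sketched as follows, again in Python and only as an illustration. The quoting rule (a newline inside double quotes does not end a line) and the helper name split_lines are assumptions made for the example; only the -1 / >= 0 return convention comes from the proposal.

```python
def formatter_linelength(buf: bytes) -> int:
    """Return -1 if buf holds no complete line yet, else the number of
    leading bytes (newline included) that form one complete line.

    Assumed quoting rule: a newline between double quotes is data.
    """
    in_quotes = False
    for i, b in enumerate(buf):
        if b == ord('"'):
            in_quotes = not in_quotes
        elif b == ord("\n") and not in_quotes:
            return i + 1
    return -1


def split_lines(chunks):
    """Hypothetical low-level COPY IN loop: append incoming data to a
    buffer and hand off each complete line as soon as the formatter
    reports one. Returns the complete lines and any leftover bytes.

    This formatter never returns 0; a loop serving arbitrary formatters
    would need a guard against a zero-length line.
    """
    buf = b""
    lines = []
    for chunk in chunks:
        buf += chunk
        while (n := formatter_linelength(buf)) >= 0:
            lines.append(buf[:n])  # would be passed to formatter_in
            buf = buf[n:]
    return lines, buf
```

With this split, the low-level code never has to decide for itself whether a newline is quoted; it only asks the formatter how long the next line is.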

It's a tad annoying that we need two functions for input.  One way that
we could still keep the COPY option syntax to be just

FORMAT csv

is to create an arbitrary difference in the signatures of the input
functions.  Then we could have coexisting functions

csv(text[]) returns text
csv(text) returns text[]
csv(text, ...) returns int

that are referenced by "FORMAT csv".
        regards, tom lane

