Re: COPY formatting
From | Tom Lane |
---|---|
Subject | Re: COPY formatting |
Date | |
Msg-id | 9056.1079622996@sss.pgh.pa.us |
In response to | Re: COPY formatting (Karel Zak <zakkr@zf.jcu.cz>) |
Responses | Re: COPY formatting (Karel Zak <zakkr@zf.jcu.cz>) |
List | pgsql-hackers |
Karel Zak <zakkr@zf.jcu.cz> writes:
>> On Wed, Mar 17, 2004 at 11:02:38AM -0500, Tom Lane wrote:
>>> Karel Zak <zakkr@zf.jcu.cz> writes:
>>>> This seems like it could only reasonably be implemented as a C function.
>>
>> Why? I said it's pseudo code. It should use standard fmgr API like
>> every other PostgreSQL function or is it problem and I overlook
>> something? It must to support arbitrary programming language and not
>> C only.

Sure, but the question is whether the *stuff it has to do* can reasonably be coded in anything but C. Why are you passing in a relation OID, if not for lookups in relcache entries that are simply not accessible above the C level? (Don't tell me you want the function to do a bunch of actual SELECTs from system catalogs for every line of the copy...)

Passing in a relation OID is probably a bad idea anyway, as it ties this API to the assumption that COPY is only for complete relations. There's been talk before of allowing a SELECT result to be presented via the COPY protocol, for instance. What might be a more usable API is

    COPY OUT:
        function formatter_out(text[]) returns text
    COPY IN:
        function formatter_in(text) returns text[]

where the text array is either the results of or the input to the per-column datatype I/O routines. This makes it explicit that the formatter's job is solely to determine the column-level wrapping and unwrapping of the data. I'm assuming here that there is no good reason for the formatter to care about the specific datatypes involved; can you give a counterexample?

> It's pity that main idea of current COPY is based on separated lines
> and it is not more common interface for streaming data between FE and BE.

Yeah, that was another concern I had. This API would let the formatter control line-level layout but it would not eliminate the hard-wired significance of newline.
What's worse, there isn't any clean way to deal with reading quoted newlines --- the formatter can't really replace the default quoting rules if the low-level code is going to decide whether a newline is quoted or not.

We could possibly solve that by specifying that the text output or input (respectively) is the complete line sent to or from the client, including newline or whatever other line-level formatting you are using. This still leaves the problem of how the low-level COPY IN code knows what is a complete line to pass off to the formatter_in routine. We could possibly fix this by adding a second input-control routine

    function formatter_linelength(text) returns integer

which is defined to return -1 if the input isn't a complete line yet (i.e., read some more data, append to the buffer, and try again), or >= 0 to indicate that the first N bytes of the buffer represent a complete line to be passed off to formatter_in. I don't see a way to combine formatter_in and formatter_linelength into a single function without relying on "out" parameters, which would again confine the feature to format functions written in C.

It's a tad annoying that we need two functions for input. One way that we could still keep the COPY option syntax to be just

    FORMAT csv

is to create an arbitrary difference in the signatures of the input functions. Then we could have coexisting functions

    csv(text[]) returns text
    csv(text) returns text[]
    csv(text, ...) returns int

that are referenced by "FORMAT csv".

regards, tom lane
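
For illustration only, here is a rough Python sketch of how the three proposed routines might cooperate for a CSV-like format with quoted newlines. The function names formatter_out, formatter_in and formatter_linelength come from the proposal above. Everything else is an assumption made for the example: the quoting rules (every field double-quoted, embedded quotes doubled), the copy_in driver loop standing in for the low-level COPY IN code, and the use of Python string lengths where the proposal speaks of bytes. It is not PostgreSQL code.

```python
from typing import Iterable, List


def formatter_out(fields: List[str]) -> str:
    """COPY OUT side: wrap column texts into one complete line, newline included."""
    quoted = ['"' + f.replace('"', '""') + '"' for f in fields]
    return ",".join(quoted) + "\n"


def formatter_linelength(buf: str) -> int:
    """Return -1 if buf holds no complete line yet, otherwise the length of
    the first complete line (including its terminating newline)."""
    in_quotes = False
    for i, ch in enumerate(buf):
        if ch == '"':
            # A doubled quote toggles twice, so the state stays correct.
            in_quotes = not in_quotes
        elif ch == "\n" and not in_quotes:
            return i + 1
    return -1


def formatter_in(line: str) -> List[str]:
    """COPY IN side: unwrap one complete line into per-column texts."""
    body = line[:-1] if line.endswith("\n") else line
    fields: List[str] = []
    cur: List[str] = []
    in_quotes = False
    i = 0
    while i < len(body):
        ch = body[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(body) and body[i + 1] == '"':
                    cur.append('"')  # doubled quote is a literal quote
                    i += 1
                else:
                    in_quotes = False
            else:
                cur.append(ch)  # includes quoted newlines and commas
        elif ch == '"':
            in_quotes = True
        elif ch == ",":
            fields.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
        i += 1
    fields.append("".join(cur))
    return fields


def copy_in(chunks: Iterable[str]) -> List[List[str]]:
    """Hypothetical stand-in for the low-level COPY IN loop: buffer incoming
    data and hand each complete line to formatter_in."""
    buf = ""
    rows: List[List[str]] = []
    for chunk in chunks:
        buf += chunk
        while True:
            n = formatter_linelength(buf)
            if n < 0:
                break  # not a complete line yet: read more data and retry
            rows.append(formatter_in(buf[:n]))
            buf = buf[n:]
    return rows
```

The point of the sketch is that the driver never inspects newlines itself: only formatter_linelength decides where a line ends, so a newline inside quotes does not split a row even when the data arrives in arbitrary chunks.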