Re: COPY formatting

From Karel Zak
Subject Re: COPY formatting
Date
Msg-id 20040319105021.GB16735@zf.jcu.cz
In response to Re: COPY formatting  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: COPY formatting  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Mar 18, 2004 at 10:16:36AM -0500, Tom Lane wrote:
> Passing in a relation OID is probably a bad idea anyway, as it ties this
> API to the assumption that COPY is only for complete relations.  There's
> been talk before of allowing a SELECT result to be presented via the
> COPY protocol, for instance.  What might be a more usable API is
> 
> COPY OUT:
>         function formatter_out(text[]) returns text
> COPY IN:
>         function formatter_in(text) returns text[]
> 
> where the text array is either the results of or the input to the
> per-column datatype I/O routines.  This makes it explicit that the
> formatter's job is solely to determine the column-level wrapping and
> unwrapping of the data.  I'm assuming here that there is no good reason
> for the formatter to care about the specific datatypes involved; can you
> give a counterexample?
The idea was to pass as much information about the tuple as possible to
the formatter; what the formatter does with that information is the
formatter's problem.
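
For illustration, here is a minimal sketch of the text[] <-> text pair
quoted above, written in Python rather than as a backend function. The CSV
layout and the use of Python's csv module are assumptions made for the
example; nothing here is an existing COPY interface, and NULL handling is
left out.

```python
import csv
import io


def formatter_out(columns):
    """Wrap the per-column text values into one complete output line."""
    buf = io.StringIO()
    # The line terminator is part of the returned text, so the formatter
    # (not the caller) decides how a line ends.
    csv.writer(buf, lineterminator="\n").writerow(columns)
    return buf.getvalue()


def formatter_in(line):
    """Unwrap one complete input line into per-column text values."""
    # newline="" keeps line breaks inside quoted fields untranslated.
    return next(csv.reader(io.StringIO(line, newline="")))


# Round trip: what formatter_out wraps, formatter_in unwraps.
row = ["1", "a,b", 'say "hi"', "two\nlines"]
assert formatter_in(formatter_out(row)) == row
```

Neither function sees a relation OID or any datatype information; both work
only on the text produced by (or destined for) the per-column I/O routines.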
 

> >  It's pity  that main idea of  current COPY is based  on separated lines
> >  and it is not more common interface for streaming data between FE and BE.
> 
> Yeah, that was another concern I had.  This API would let the formatter
> control line-level layout but it would not eliminate the hard-wired
> significance of newline.  What's worse, there isn't any clean way to
> deal with reading quoted newlines --- the formatter can't really replace
> the default quoting rules if the low-level code is going to decide
> whether a newline is quoted or not.
I think the latest protocol version works with blocks of data rather than
with lines, and on the client side PQputCopyData() returns a block; only
the docs say that it is a row of a table.
 

> We could possibly solve that by specifying that the text output or input
> (respectively) is the complete line sent to or from the client,
> including newline or whatever other line-level formatting you are using.
> This still leaves the problem of how the low-level COPY IN code knows
> what is a complete line to pass off to the formatter_in routine.  We
> could possibly fix this by adding a second input-control routine
> 
>     function formatter_linelength(text) returns integer
> 
> which is defined to return -1 if the input isn't a complete line yet
But formatter_linelength() will need some context information, I think. In
other words, some struct with formatter-specific internal data. And for
more difficult formats like XML you need some other context data (parser
data) too.
 
Maybe there can be some global exported struct (like for triggers) that
functions written in C can use. That means for simple formats like CSV you
can use non-C functions, and for formats like XML you can use C functions.
And if it is interesting for PL developers, they can add support for
access to this struct to their languages.
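
A rough sketch of what such context could look like, in Python for brevity.
The FormatterContext type, its fields, and the extra ctx argument are all
hypothetical; they only show why the function wants state that survives
between calls (here: how far the buffer was already scanned and whether the
scan stopped inside a quoted field).

```python
from dataclasses import dataclass


@dataclass
class FormatterContext:
    """Hypothetical formatter-private state kept between calls."""
    scanned: int = 0         # characters of the buffer already examined
    in_quotes: bool = False  # did the last scan stop inside a quoted field?


def linelength_with_context(buf, ctx):
    """Return the length of the first complete CSV line in buf, or -1.

    The scan resumes at ctx.scanned instead of restarting at offset 0,
    so the caller must only ever append to buf between calls.
    """
    for i in range(ctx.scanned, len(buf)):
        ch = buf[i]
        if ch == '"':
            ctx.in_quotes = not ctx.in_quotes
        elif ch == "\n" and not ctx.in_quotes:
            # Complete line found; reset the state for the next line.
            ctx.scanned = 0
            ctx.in_quotes = False
            return i + 1
    ctx.scanned = len(buf)
    return -1


ctx = FormatterContext()
assert linelength_with_context('a,"x\n', ctx) == -1     # newline is quoted
assert linelength_with_context('a,"x\ny"\nb', ctx) == 8  # resumes at offset 5
```

For XML the same slot would hold the parser state rather than two small
fields, which is where a C-level struct becomes more attractive.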
 

> (i.e., read some more data, append to the buffer, and try again), or
> >= 0 to indicate that the first N bytes of the buffer represent a
> complete line to be passed off to formatter_in.  I don't see a way to
> combine formatter_in and formatter_linelength into a single function
> without relying on "out" parameters, which would again confine the
> feature to format functions written in C.
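
To make the quoted protocol concrete, a small Python sketch of the read
loop: append incoming data to a buffer, ask formatter_linelength whether a
complete line is present, and hand the first N characters on. The CSV
quoting rule and the split_lines driver are assumptions for illustration,
and the sketch works on str rather than on raw bytes.

```python
def formatter_linelength(buf):
    """Return the length of the first complete line in buf, or -1."""
    in_quotes = False
    for i, ch in enumerate(buf):
        if ch == '"':
            # A doubled quote ("") toggles twice, so the state stays right.
            in_quotes = not in_quotes
        elif ch == "\n" and not in_quotes:
            return i + 1  # always >= 1 here, so the loop below progresses
    return -1


def split_lines(chunks):
    """Yield complete lines from data that arrives in arbitrary blocks."""
    buf = ""
    for chunk in chunks:
        buf += chunk  # read some more data, append to the buffer
        while True:
            n = formatter_linelength(buf)
            if n < 0:
                break  # not a complete line yet, try again after more data
            yield buf[:n]  # this is what would go to formatter_in
            buf = buf[n:]
    if buf:
        raise ValueError("input ended inside an incomplete line")


# The block boundary falls inside a quoted field containing a newline.
blocks = ['a,"x\n', 'y"\nb,c\n']
assert list(split_lines(blocks)) == ['a,"x\ny"\n', 'b,c\n']
```

The low-level code never looks at newlines itself here; whether a newline
ends a line is decided entirely by the format-specific function.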

> It's a tad annoying that we need two functions for input.  One way that
> we could still keep the COPY option syntax to be just
>     FORMAT csv
> is to create an arbitrary difference in the signatures of the input
> functions.  Then we could have coexisting functions
>     csv(text[]) returns text
>     csv(text) returns text[]
>     csv(text, ...) returns int
> that are referenced by "FORMAT csv".
It sounds good, but I think neither of us is fully sure about it yet,
right? CSV support would probably be better added as a DELIMITER
extension.
 
   Karel

-- 
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

