Re: COPY Transform support
| From | Tom Lane | 
|---|---|
| Subject | Re: COPY Transform support | 
| Date | |
| Msg-id | 29848.1207234826@sss.pgh.pa.us | 
| In response to | COPY Transform support (Dimitri Fontaine <dfontaine@hi-media.com>) | 
| Responses | Re: COPY Transform support Re: COPY Transform support | 
| List | pgsql-hackers | 
Dimitri Fontaine <dfontaine@hi-media.com> writes:
> Here's a proposal for COPY to support the T part of an ETL, that is adding the 
> capability for COPY FROM to Transform the data it gets.
> The idea is quite simple: adding to COPY FROM the option to run a function on 
> the data before calling the datatype_in functions.
The major concern I have about this is to ensure that no detectable
overhead is added to COPY when the feature isn't being used.
I am not actually convinced that the column-by-column design you seem to
have in mind is worth anything.  The examples that I remember seeing
often involve removing columns, generating one column from multiple ones
or vice versa, dealing with nonstandard column delimiters, etc.  What
would make sense in my mind is a single function taking and returning
text, which is invoked once on each complete input line before it is
broken into fields.
This is, of course, just a substitute for running a sed or perl or
similar script over the data before feeding it to COPY --- and probably
not an amazingly good substitute at that.  For instance, assuming you
like perl for text-wrangling, I'd fully expect the function approach
to be slower than an external script because of the large overhead of
getting into and out of libperl for each line.
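
As a rough illustration of the external-script route, here is a minimal Python sketch of a per-line, text-in/text-out filter of the kind described above. The input layout (four fields separated by '|'), the function names transform_line and filter_stream, and the choice to merge two columns and drop one are all assumptions made for the example, not anything proposed in this thread.

```python
import sys
from typing import Iterable, TextIO


def transform_line(line: str) -> str:
    """Hypothetical per-line transform: one complete input line in, one line out."""
    # Assumed input layout: id|first|last|unused, with '|' as a nonstandard delimiter.
    ident, first, last, _unused = line.rstrip("\n").split("|")
    # Merge two columns into one, drop the last column, and emit
    # tab-separated output, which is COPY's default text-format delimiter.
    return "\t".join([ident, f"{first} {last}"]) + "\n"


def filter_stream(src: Iterable[str], dst: TextIO) -> None:
    """Apply transform_line to every line of src and write the result to dst."""
    for line in src:
        dst.write(transform_line(line))


# To use this as a command-line filter, one would call:
#     filter_stream(sys.stdin, sys.stdout)
```

With that call enabled, the script could sit in a pipeline that feeds COPY ... FROM STDIN, so the server only ever sees already-cleaned lines. The sketch does not escape backslashes or embedded tabs, which real data in COPY's text format would need.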
In situations where it's actually useful to apply SQL functions rather
than text-mangling operations to the data, you always have the option to
COPY into a temp table and then do INSERT/SELECT from there.
So the whole thing seems just marginally attractive to me.
        regards, tom lane