Re: COPY Transform support

Поиск
Список
Период
Сортировка
От Gregory Stark
Тема Re: COPY Transform support
Дата
Msg-id 87sky2g8ea.fsf@oxford.xeocode.com
обсуждение исходный текст
Ответ на Re: COPY Transform support  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: COPY Transform support  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Dimitri Fontaine <dfontaine@hi-media.com> writes:
>> Here's a proposal for COPY to support the T part of an ETL, that is adding the 
>> capability for COPY FROM to Transform the data it gets.
>
>> The idea is quite simple: adding to COPY FROM the option to run a function on 
>> the data before to call datatype_in functions.
>
> The major concern I have about this is to ensure that no detectable
> overhead is added to COPY when the feature isn't being used.
>
> I am not actually convinced that the column-by-column design you seem to
> have in mind is worth anything.  The examples that I remember seeing
> often involve removing columns, generating one column from multiple ones
> or vice versa, dealing with nonstandard column delimiters, etc.  What
> would makes sense in my mind is a single function taking and returning
> text, which is invoked once on each complete input line before it is
> broken into fields.

I think not having to deal with separating fields is actually one of the few
reasons to do this within COPY. If you can separate out yourself or need to do
something more clever than COPY is capable of to split the columns then you're
better off preprocessing it with perl or something anyways.

To that end all the other use cases you describe could be handled with his
plan. There's nothing stopping you from doing

CREATE READER foo (a integer, b integer)
INSERT INTO b (SELECT a+b FROM foo);
or
INSERT INTO b (SELECT 1, a, b, greatest(a,b) FROM foo)

However I'm not sure we even need new syntax for CREATE READER. I would think
something like this would make more sense:

CREATE FUNCTION transform(integer, integer) RETURNS SETOF b;

COPY b FROM 'foo' USING transform(integer,integer);

> So the whole thing seems just marginally attractive to me.

Everything about ETL is only marginally attractive, but it's something people
spend a lot of time doing. Nobody's come up with any particularly clean
solutions I think.

AFAIK the state of the art is actually to load the data into a table which
closely matches the source material, sometimes just columns of text. Then copy
it all to another table doing transformations. Not impressed.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: [GENERAL] SHA1 on postgres 8.3
Следующее
От: "D'Arcy J.M. Cain"
Дата:
Сообщение: Re: modules