Обсуждение: RETURNING syntax for COPY

Поиск
Список
Период
Сортировка

RETURNING syntax for COPY

От
Karol Trzcionka
Дата:
Hello,
as a continuation of my proposal expanding RETURNING syntax by
AFTER/BEFORE, there can be enough time to implement RETURNING for COPY.
I'd like to hear your opinion on that. My draft idea is:
COPY FROM ... RETURNING table_name.* -> returns all values copied to
table after all triggers invoke
COPY FROM ... RETURNING BEFORE.* -> returns all values copied to table
before triggers (all raw values from file)
COPY TO ... RETURNING table_name.* -> returns all values from table
COPY TO ... RETURNING AFTER.*/BEFORE.* -> the same
What do you think about?
Regards,
Karol Trzcionka



Re: RETURNING syntax for COPY

От
Pavel Stehule
Дата:
Hello

why? What is motivation? What is use case?

Regards

Pavel


2013/5/8 Karol Trzcionka <karlikt@gmail.com>
Hello,
as a continuation of my proposal expanding RETURNING syntax by
AFTER/BEFORE, there can be enough time to implement RETURNING for COPY.
I'd like to hear your opinion on that. My draft idea is:
COPY FROM ... RETURNING table_name.* -> returns all values copied to
table after all triggers invoke
COPY FROM ... RETURNING BEFORE.* -> returns all values copied to table
before triggers (all raw values from file)
COPY TO ... RETURNING table_name.* -> returns all values from table
COPY TO ... RETURNING AFTER.*/BEFORE.* -> the same
What do you think about?
Regards,
Karol Trzcionka


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: RETURNING syntax for COPY

От
Tom Lane
Дата:
Karol Trzcionka <karlikt@gmail.com> writes:
> as a continuation of my proposal expanding RETURNING syntax by
> AFTER/BEFORE, there can be enough time to implement RETURNING for COPY.

No there isn't; what you suggest would require FE/BE protocol
extensions, making it several orders of magnitude more work than the
other thing.
        regards, tom lane



Re: RETURNING syntax for COPY

От
Heikki Linnakangas
Дата:
On 08.05.2013 19:44, Tom Lane wrote:
> Karol Trzcionka<karlikt@gmail.com>  writes:
>> as a continuation of my proposal expanding RETURNING syntax by
>> AFTER/BEFORE, there can be enough time to implement RETURNING for COPY.
>
> No there isn't; what you suggest would require FE/BE protocol
> extensions, making it several orders of magnitude more work than the
> other thing.

I'd imagine that the flow would go something like this:

BE    FE

CopyInResponseCopyDataCopyData...CopyDone
RowDescription
DataRow
DataRow
CommandComplete

A well-written client should be able to handle that. But if one expects 
the backend to always send a CommandComplete after CopyDone, it will get 
confused. Implementing that doesn't seem too difficult to me.

I agree that this is much more work than the UPDATE RETURNING 
BEFORE/AFTER, though. Not sure if that's a good or a bad thing.

- Heikki



Re: RETURNING syntax for COPY

От
Tom Lane
Дата:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 08.05.2013 19:44, Tom Lane wrote:
>> No there isn't; what you suggest would require FE/BE protocol
>> extensions, making it several orders of magnitude more work than the
>> other thing.

> I'd imagine that the flow would go something like this:

> BE    FE

> CopyInResponse
>     CopyData
>     CopyData
>     ...
>     CopyDone
> RowDescription
> DataRow
> DataRow
> CommandComplete

That would require the backend to buffer the entire query response,
which isn't a great idea.  I would expect that such an operation would
need to interleave CopyData to the backend with DataRow responses.  Such
a thing could possibly be built on COPY_BOTH mode, but it would be a lot
of work (at both ends) for extremely debatable value.

The general idea of COPY is to load data as fast as possible, so weighing
it down with processing options seems like a pretty dubious idea even if
the implementation were easy.
        regards, tom lane



Re: RETURNING syntax for COPY

От
Dimitri Fontaine
Дата:
Karol Trzcionka <karlikt@gmail.com> writes:
> as a continuation of my proposal expanding RETURNING syntax by

What about implementing support for OLD/NEW in per-statement triggers? I
guess you would expose the data via a SRF.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: RETURNING syntax for COPY

От
Andrew Dunstan
Дата:
On 05/08/2013 01:16 PM, Tom Lane wrote:

> That would require the backend to buffer the entire query response,
> which isn't a great idea.  I would expect that such an operation would
> need to interleave CopyData to the backend with DataRow responses.  Such
> a thing could possibly be built on COPY_BOTH mode, but it would be a lot
> of work (at both ends) for extremely debatable value.
>
> The general idea of COPY is to load data as fast as possible, so weighing
> it down with processing options seems like a pretty dubious idea even if
> the implementation were easy.
>
>             


That was my reaction. I would look it this very skeptically.

cheers

andrew



Re: RETURNING syntax for COPY

От
"Jonathan S. Katz"
Дата:
On May 8, 2013, at 1:16 PM, Tom Lane wrote:

> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> On 08.05.2013 19:44, Tom Lane wrote:
>>> No there isn't; what you suggest would require FE/BE protocol
>>> extensions, making it several orders of magnitude more work than the
>>> other thing.
>
>> I'd imagine that the flow would go something like this:
>
>> BE    FE
>
>> CopyInResponse
>>     CopyData
>>     CopyData
>>     ...
>>     CopyDone
>> RowDescription
>> DataRow
>> DataRow
>> CommandComplete
>
> That would require the backend to buffer the entire query response,
> which isn't a great idea.  I would expect that such an operation would
> need to interleave CopyData to the backend with DataRow responses.  Such
> a thing could possibly be built on COPY_BOTH mode, but it would be a lot
> of work (at both ends) for extremely debatable value.
>
> The general idea of COPY is to load data as fast as possible, so weighing
> it down with processing options seems like a pretty dubious idea even if
> the implementation were easy.

There are cases that I indeed want to load data very quickly, but I want to perform an operation on it immediately
after,e.g. removing bad data that was immediately added from that copy.  For instance, I do have this scenario: 

WITH new_data AS (COPY FROM ...RETURNING id, field_to_check
)
DELETE FROM table
USING new_data
WHEREtable.id = new_data.id ANDnew_data.field_to_check ~* 'bad data';

Now I can take care of that all in one step, and I know I'm only removing fields I just added.  This comes up when I am
importingexternal files from other sources where I may not necessarily want all of the rows or some of the rows contain
baddata. 

This also presumes that COPY works in a CTE, which I'm not sure it does (and I will do the TIAS test after I hit send
onthis message). 

Jonathan


Re: RETURNING syntax for COPY

От
David Fetter
Дата:
On Wed, May 08, 2013 at 01:16:14PM -0400, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> > On 08.05.2013 19:44, Tom Lane wrote:
> >> No there isn't; what you suggest would require FE/BE protocol
> >> extensions, making it several orders of magnitude more work than the
> >> other thing.
> 
> > I'd imagine that the flow would go something like this:
> 
> > BE    FE
> 
> > CopyInResponse
> >     CopyData
> >     CopyData
> >     ...
> >     CopyDone
> > RowDescription
> > DataRow
> > DataRow
> > CommandComplete
> 
> That would require the backend to buffer the entire query response,
> which isn't a great idea.  I would expect that such an operation would
> need to interleave CopyData to the backend with DataRow responses.  Such
> a thing could possibly be built on COPY_BOTH mode, but it would be a lot
> of work (at both ends) for extremely debatable value.
> 
> The general idea of COPY is to load data as fast as possible,

With utmost respect, that is one of several use cases, and any change
would need to keep that use case unburdened.  A sometimes overlapping
set of use cases move data in and out of the database in a simple
manner.  In some of these, people might wish to trade some performance
for the feature.

A particular example would be one where there are several tables to be
loaded, some with generated columns that the future ones would depend
on.  Yes, it's possible (kinda) to do this with the FDW machinery, but
the burden is much higher as it requires DDL permission in general
each time.

> so weighing it down with processing options seems like a pretty
> dubious idea even if the implementation were easy.

Totally agreed that the "fast load/unload" code path must not be
affected by any such changes.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: RETURNING syntax for COPY

От
Ryan Kelly
Дата:
On Wed, May 05/08/13, 2013 at 10:55:40AM -0700, David Fetter wrote:
> On Wed, May 08, 2013 at 01:16:14PM -0400, Tom Lane wrote:
> > Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> > > On 08.05.2013 19:44, Tom Lane wrote:
> > >> No there isn't; what you suggest would require FE/BE protocol
> > >> extensions, making it several orders of magnitude more work than the
> > >> other thing.
> > 
> > > I'd imagine that the flow would go something like this:
> > 
> > > BE    FE
> > 
> > > CopyInResponse
> > >     CopyData
> > >     CopyData
> > >     ...
> > >     CopyDone
> > > RowDescription
> > > DataRow
> > > DataRow
> > > CommandComplete
> > 
> > That would require the backend to buffer the entire query response,
> > which isn't a great idea.  I would expect that such an operation would
> > need to interleave CopyData to the backend with DataRow responses.  Such
> > a thing could possibly be built on COPY_BOTH mode, but it would be a lot
> > of work (at both ends) for extremely debatable value.
> > 
> > The general idea of COPY is to load data as fast as possible,
> 
> With utmost respect, that is one of several use cases, and any change
> would need to keep that use case unburdened.  A sometimes overlapping
> set of use cases move data in and out of the database in a simple
> manner.  In some of these, people might wish to trade some performance
> for the feature.

99% of my uses at work for COPY are as a general data import and export
facility. I often find myself loading CSV files into our database for
analysis and further cleanup, and then use COPY to output queries as CSV
files for consumption by other members of the business.

The recent work for (PRE|POST)PROCESSOR options to COPY is indicative of
the fact that users are not merely using COPY to "load data as fast as
possible".

Other discussions around a COMPRESSED option are more than just a
performance enhancement, in my view, as I oftern receive files
compressed and decompressing the data is just another step standing in
the way of myself importing the data into the database.

Additionally, once I have the data imported, I often take many steps to
cleanup and format the data, prior to applying actual typing to a table
(which invariably fails due to invalid dates, and other nonsense).

COPY ... RETURNING would certainly be useful to apply additional
transformations to the data before finally sending it to its ultimate
destination.

> A particular example would be one where there are several tables to be
> loaded, some with generated columns that the future ones would depend
> on.  Yes, it's possible (kinda) to do this with the FDW machinery, but
> the burden is much higher as it requires DDL permission in general
> each time.

I find using the FDW machinery to perform many queries to be much slower
than importing the data once and then running my queries. There is also
no ability to use indexes.

> > so weighing it down with processing options seems like a pretty
> > dubious idea even if the implementation were easy.
> 
> Totally agreed that the "fast load/unload" code path must not be
> affected by any such changes.

Agreed here as well.

-Ryan P. Kelly




Re: RETURNING syntax for COPY

От
Stephen Frost
Дата:
* Ryan Kelly (rpkelly22@gmail.com) wrote:
> COPY ... RETURNING would certainly be useful to apply additional
> transformations to the data before finally sending it to its ultimate
> destination.

If we really think that COPY ... RETURNING is only going to be used in a
CTE or similar, then we could always only support that and forgo any
changes to the FE/BE protocol to support it.  Or, at least, take the
simplest approach to supporting it which would involve cacheing the data
entirely before sending it back to the client (isn't that what we do on
a big INSERT ... VALUES ... RETURNING anyway?  people can transfer in
blocks if they want to with INSERT .. VALUES or COPY .. RETURNING).
Thanks,
    Stephen

Re: RETURNING syntax for COPY

От
Jim Nasby
Дата:
On 5/8/13 12:54 PM, Jonathan S. Katz wrote:
> On May 8, 2013, at 1:16 PM, Tom Lane wrote:
>
>> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>>> On 08.05.2013 19:44, Tom Lane wrote:
>>>> No there isn't; what you suggest would require FE/BE protocol
>>>> extensions, making it several orders of magnitude more work than the
>>>> other thing.
>>
>>> I'd imagine that the flow would go something like this:
>>
>>> BE    FE
>>
>>> CopyInResponse
>>>     CopyData
>>>     CopyData
>>>     ...
>>>     CopyDone
>>> RowDescription
>>> DataRow
>>> DataRow
>>> CommandComplete
>>
>> That would require the backend to buffer the entire query response,
>> which isn't a great idea.  I would expect that such an operation would
>> need to interleave CopyData to the backend with DataRow responses.  Such
>> a thing could possibly be built on COPY_BOTH mode, but it would be a lot
>> of work (at both ends) for extremely debatable value.
>>
>> The general idea of COPY is to load data as fast as possible, so weighing
>> it down with processing options seems like a pretty dubious idea even if
>> the implementation were easy.
>
> There are cases that I indeed want to load data very quickly, but I want to perform an operation on it immediately
after,e.g. removing bad data that was immediately added from that copy.  For instance, I do have this scenario:
 
>
> WITH new_data AS (
>     COPY FROM ...
>     RETURNING id, field_to_check
> )
> DELETE FROM table
> USING new_data
> WHERE
>     table.id = new_data.id AND
>     new_data.field_to_check ~* 'bad data';
>
> Now I can take care of that all in one step, and I know I'm only removing fields I just added.  This comes up when I
amimporting external files from other sources where I may not necessarily want all of the rows or some of the rows
containbad data.
 
>
> This also presumes that COPY works in a CTE, which I'm not sure it does (and I will do the TIAS test after I hit send
onthis message).
 

What you're really asking for here is some kind of stream processing capability. There are spin-offs of Postgres that
providethat capability (I know Neil Conway worked on some). Those are geared completely around stream processing, but I
thinkit would be extremely interesting to provide some minimal support for that in community Postgres.
 

Using your use case as an example, something like this would be very interesting:

COPY table FROM ...  WHERE field_to_check !~* 'bad data'
;

In this case we're just applying a simple WHERE clause against each incoming row.

Perhaps what I'm suggesting could be implemented with a CTE, but I'm not sure it makes sense to do it the way you
propose,at least not initially. A CTE would provide so much flexibility that it'd be difficult for the optimizer to be
efficientabout it. Something like a WHERE clause directly on COPY would be a lot easier to handle. As someone
mentioned,FDW might be another option there.
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: RETURNING syntax for COPY

От
Jim Nasby
Дата:
On 5/8/13 12:33 PM, Dimitri Fontaine wrote:
> Karol Trzcionka <karlikt@gmail.com> writes:
>> as a continuation of my proposal expanding RETURNING syntax by
>
> What about implementing support for OLD/NEW in per-statement triggers? I
> guess you would expose the data via a SRF.

Per statement NEW/OLD is an interesting case, in that it shares some of the same challenges; namely how to store the
NEWand OLD recordsets efficiently. I've wondered if there'd be some way to do that by just storing a list of CTIDs (not
sureif that'd work with HOT for OLD though).
 

I still like the idea of being able to exclude certain records during COPY though; not writing a tuple will always be
moreefficient than creating one and then nuking it after the fact. There's a similar argument to be made about in-line
transformstoo.
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: RETURNING syntax for COPY

От
Andrew Dunstan
Дата:
On 05/08/2013 03:23 PM, Jim Nasby wrote:
>> WITH new_data AS (
>>     COPY FROM ...
>>     RETURNING id, field_to_check
>> )
>

Why is this better than this, which you can do today?
   WITH new_data AS (        INSERT into ... FROM foreign_table_with_file_fdw RETURNING ...   )


The whole reason I abandoned trying to do this sort of thing with COPY 
was that I realized the FDW would provide what I wanted.

cheers

andrew



Re: RETURNING syntax for COPY

От
Ryan Kelly
Дата:
On Wed, May 05/08/13, 2013 at 03:38:10PM -0400, Andrew Dunstan wrote:
> 
> On 05/08/2013 03:23 PM, Jim Nasby wrote:
> >>WITH new_data AS (
> >>    COPY FROM ...
> >>    RETURNING id, field_to_check
> >>)
> >
> 
> Why is this better than this, which you can do today?
> 
>    WITH new_data AS (
>         INSERT into ... FROM foreign_table_with_file_fdw RETURNING ...
>    )
> 
> 
> The whole reason I abandoned trying to do this sort of thing with
> COPY was that I realized the FDW would provide what I wanted.

You need to first CREATE EXTENSION file_fdw. Then you need to CREATE
SERVER. Then CREATE FOREIGN TABLE. Which requires appropriate permission
to do those things, and certainly has no hope of working on the client
side.

-Ryan P. Kelly