Discussion: COPY FROM WHEN condition


COPY FROM WHEN condition

From
Surafel Temesgen
Date:

Hello,

Currently we cannot move data from a file to a table based on some condition on a certain column, but I think there are many use cases worth supporting. Attached is a patch that skips rows which do not satisfy the WHEN condition instead of inserting them into the table; it works on top of commit b68ff3ea672c06

and the syntax is

COPY table_name [ ( column_name [, ...] ) ]

FROM { 'filename' | PROGRAM 'command' | STDIN }

[ [ WITH ] ( option [, ...] ) ]

[ WHEN condition ]
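
For illustration, a COPY using the proposed syntax might look like this (table, column and file names here are made up):

    COPY weather (city, temp_lo, temp_hi)
        FROM '/tmp/weather.csv' WITH (FORMAT csv)
        WHEN (temp_hi > 30);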

Comments?

Regards

Surafel

Attachments

Re: COPY FROM WHEN condition

From
Christoph Moench-Tegeder
Date:
## Surafel Temesgen (surafel3000@gmail.com):

> Currently we cannot move data from a file to a table based on some
> condition on a certain column

You can:
  COPY ( query ) TO 'filename';

There's even an example in the documentation:
https://www.postgresql.org/docs/10/static/sql-copy.html
"To copy into a file just the countries whose names start with 'A':
 COPY (SELECT * FROM country WHERE country_name LIKE 'A%') TO '/usr1/proj/bray/sql/a_list_countries.copy';"

Regards,
Christoph

-- 
Spare Space.


Re: COPY FROM WHEN condition

From
Surafel Temesgen
Date:


On Thu, Oct 11, 2018 at 12:00 PM Christoph Moench-Tegeder <cmt@burggraben.net> wrote:
You can:
  COPY ( query ) TO 'filename';
it is for COPY FROM

regards
Surafel

Re: COPY FROM WHEN condition

From
Corey Huinker
Date:
On Thu, Oct 11, 2018 at 5:04 AM Surafel Temesgen <surafel3000@gmail.com> wrote:


On Thu, Oct 11, 2018 at 12:00 PM Christoph Moench-Tegeder <cmt@burggraben.net> wrote:
You can:
  COPY ( query ) TO 'filename';
it is for COPY FROM

regards
Surafel

It didn't get far, but you may want to take a look at a rejected patch for copy_srf() (set returning function):
https://www.postgresql.org/message-id/CADkLM%3DdoeiWQX4AGtDNG4PsWfSXz3ai7kY%3DPZm3sUhsUeev9Bg%40mail.gmail.com
https://commitfest.postgresql.org/12/869/

Having a set returning function gives you the full expressiveness of SQL, at the cost of an extra materialization step.


 

Re: COPY FROM WHEN condition

From
David Fetter
Date:
On Thu, Oct 11, 2018 at 05:12:48AM -0400, Corey Huinker wrote:
> On Thu, Oct 11, 2018 at 5:04 AM Surafel Temesgen <surafel3000@gmail.com>
> wrote:
> 
> >
> >
> > On Thu, Oct 11, 2018 at 12:00 PM Christoph Moench-Tegeder <
> > cmt@burggraben.net> wrote:
> >
> >> You can:
> >>   COPY ( query ) TO 'filename';
> >>
> > it is for COPY FROM
> >
> > regards
> > Surafel
> >
> 
> It didn't get far, but you may want to take a look at a rejected patch for
> copy_srf() (set returning function)
> https://www.postgresql.org/message-id/CADkLM%3DdoeiWQX4AGtDNG4PsWfSXz3ai7kY%3DPZm3sUhsUeev9Bg%40mail.gmail.com
> https://commitfest.postgresql.org/12/869/
> 
> Having a set returning function gives you the full expressiveness of SQL,
> at the cost of an extra materialization step.

I wonder whether something JIT-like could elide this. A very
interesting subset of such WHEN clauses could be pretty
straight-forward to implement in a pretty efficient way.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


RE: COPY FROM WHEN condition

From
임명규
Date:
Hello,

I want to test this patch, but I can't apply it.

(master)$ git apply copy_from_when_con_v1.patch
error: patch failed: src/backend/commands/copy.c:849
error: src/backend/commands/copy.c: patch does not apply

Please fix it, or let me know another way to apply it.

Best regards,
Myungkyu, Lim



Re: COPY FROM WHEN condition

From
Surafel Temesgen
Date:


On Fri, Oct 26, 2018 at 9:09 AM 임명규 <myungkyu.lim@samsung.com> wrote:
Hello,

I want test this patch, but can't apply it.

(master)$ git apply copy_from_when_con_v1.patch
I used the patch command and it works. Try:
(master)$ patch -p1 < copy_from_when_con_v1.patch
regards 
Surafel 

Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:
Hi,

I've taken a quick look at this on the way back from pgconf.eu, and it
seems like a nice COPY improvement in a fairly good shape.

Firstly, I think it's valuable because it allows efficiently importing
a subset of data. Currently, we either have to create an intermediate
table, copy all the data into that, do the filtering, and then insert
the subset into the actual target table. Another common approach that I
see in practice is using file_fdw to do this, but it's neither particularly
straight-forward (having to create the FDW servers etc. first) nor
efficient (particularly for large amounts of data). This feature allows
skipping this extra step (at least in simpler ETL cases).
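
For comparison, the intermediate-table workaround looks roughly like this (a sketch; table, column and file names are made up):

    CREATE TEMP TABLE staging (LIKE target);
    COPY staging FROM '/tmp/data.csv' WITH (FORMAT csv);
    INSERT INTO target SELECT * FROM staging WHERE some_column > 0;
    DROP TABLE staging;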

So, I like the idea and I think it makes sense.


A couple of comments regarding the code/docs:


1) I think this deserves at least some regression tests. Plenty of tests
already use COPY, but there's no coverage for the new piece. So let's
add a new test suite, or maybe add a couple of tests into copy2.sql.


2) In copy.sgml, the new item is defined like this

    <term><literal>WHEN Clause</literal></term>

I suggest we use just <term><literal>WHEN</literal></term>, that's what
the other items do (see ENCODING).

The other thing is that this does not say what expressions are allowed
in the WHEN clause. It seems pretty close to WHEN clause for triggers,
which e.g. mentions that subselects are not allowed. I'm pretty sure
that's true here too, because it fails like this (118 = T_SubLink):

    test=# copy t(a,b,c) from '/tmp/t.data' when ((select 1) < 10);
    ERROR:  unrecognized node type: 118

So, the patch needs to detect this, produce a reasonable error message
and document the limitations in copy.sgml, just like we do for CREATE
TRIGGER.


3) For COPY TO, the WHEN clause is accepted but ignored, leading to
confusing cases like this:

    test=# copy t(a,b,c) to '/tmp/t.data' when ((select 100) < 10);
    COPY 151690

So, it contains subselect, but unlike COPY FROM it does not fail
(because we never execute it). The fun part is that the expression is
logically false, so a user might expect it to filter rows, yet we copy
everything.

IMHO we need to either error-out in these cases, complaining about WHEN
not being supported for COPY TO, or make it work (effectively treating
it as a simpler alternative to COPY (subselect) TO).


4) There are some minor code style issues in copy.c - the variable is
misnamed as when_cluase, there are no spaces after 'if' etc. See the
attached patch fixing this.

AFAICS we could just get rid of the extra when_cluase variable and mess
with the cstate->whenClause directly, depending on how (3) gets fixed.


5) As I mentioned, the CREATE TRIGGER already has WHEN clause, but it
requires it to be 'WHEN (expr)'. I suggest we do the same thing here,
requiring the parentheses.


6) The skip logic in CopyFrom() seems to be slightly wrong. It does
work, but the next_record label is defined after CHECK_FOR_INTERRUPTS()
so a COPY will not respond to Ctrl-C unless it finds a row matching the
WHEN condition. If you have a highly selective condition, that's a bit
inconvenient.

It also skips

    MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));

so I wonder what the heap_form_tuple() right after the next_record label
will use for tuples right after a skipped one. I'd bet it'll use the
oldcontext (essentially the long-lived context), essentially making it
a memory leak.

So I suggest to get rid of the next_record label, and use 'continue'
instead of the 'goto next_record'.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

RE: COPY FROM WHEN condition

From
"myungkyu.lim"
Date:
Hello,

Basically, this patch worked very well in my tests.

> 3) For COPY TO, the WHEN clause is accepted but ignored, leading to confusing cases like this:

I found same issue.

    postgres=# copy t1 to '/home/lmk/t1.data' when c1 < 1;

In the 'COPY TO' statement, 'WHEN clause' does not do any extra work.
(include error or hint)

> 4) There are some minor code style issues in copy.c - the variable is misnamed as when_cluase, there are no spaces
> after 'if' etc. See the attached patch fixing this.

It is recommended to use the pgindent tool when you finish development.

    src/tools/pgindent/pgindent

    pgindent is used to fix the source code style to conform to pg standards.

Best regards,
Myungkyu, Lim



Re: COPY FROM WHEN condition

From
Surafel Temesgen
Date:

Hi,
Thank you for looking at it.
On Sun, Oct 28, 2018 at 7:19 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: 

1) I think this deserves at least some regression tests. Plenty of tests
already use COPY, but there's no coverage for the new piece. So let's
add a new test suite, or maybe add a couple of tests into copy2.sql.


2) In copy.sqml, the new item is defined like this

    <term><literal>WHEN Clause</literal></term>

I suggest we use just <term><literal>WHEN</literal></term>, that's what
the other items do (see ENCODING).

The other thing is that this does not say what expressions are allowed
in the WHEN clause. It seems pretty close to WHEN clause for triggers,
which e.g. mentions that subselects are not allowed. I'm pretty sure
that's true here too, because it fails like this (118 = T_SubLink):

    test=# copy t(a,b,c) from '/tmp/t.data' when ((select 1) < 10);
    ERROR:  unrecognized node type: 118

So, the patch needs to detect this, produce a reasonable error message
and document the limitations in copy.sqml, just like we do for CREATE
TRIGGER.
fixed 

3) For COPY TO, the WHEN clause is accepted but ignored, leading to
confusing cases like this:

    test=# copy t(a,b,c) to '/tmp/t.data' when ((select 100) < 10);
    COPY 151690

So, it contains subselect, but unlike COPY FROM it does not fail
(because we never execute it). The fun part is that the expression is
logically false, so a user might expect it to filter rows, yet we copy
everything.

IMHO we need to either error-out in these cases, complaining about WHEN
not being supported for COPY TO, or make it work (effectively treating
it as a simpler alternative to COPY (subselect) TO).

English is not my first language, but I chose to error out because a WHEN condition for COPY TO seems semantically incorrect to me.


AFAICS we could just get rid of the extra when_cluase variable and mess
with the cstate->whenClause directly, depending on how (3) gets fixed.

I did it this way because the CopyState structure's memory is allocated and initialized in BeginCopyFrom, but the analysis is done before that.


5) As I mentioned, the CREATE TRIGGER already has WHEN clause, but it
requires it to be 'WHEN (expr)'. I suggest we do the same thing here,
requiring the parentheses.


6) The skip logic in CopyFrom() seems to be slightly wrong. It does
work, but the next_record label is defined after CHECK_FOR_INTERRUPTS()
so a COPY will not respond to Ctrl-C unless it finds a row matching the
WHEN condition. If you have a highly selective condition, that's a bit
inconvenient.

It also skips

    MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));

so I wonder what the heap_form_tuple() right after the next_record label
will use for tuples right after a skipped one. I'd bet it'll use the
oldcontext (essentially the long-lived context), essentially making it
a memory leak.

So I suggest to get rid of the next_record label, and use 'continue'
instead of the 'goto next_record'.

fixed 
regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments

Re: COPY FROM WHEN condition

From
Masahiko Sawada
Date:
On Tue, Oct 30, 2018 at 11:47 PM Surafel Temesgen <surafel3000@gmail.com> wrote:
>
>
> Hi,
> Thank you for looking at it .
> On Sun, Oct 28, 2018 at 7:19 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>>
>> 1) I think this deserves at least some regression tests. Plenty of tests
>> already use COPY, but there's no coverage for the new piece. So let's
>> add a new test suite, or maybe add a couple of tests into copy2.sql.
>>
>>
>> 2) In copy.sqml, the new item is defined like this
>>
>>     <term><literal>WHEN Clause</literal></term>
>>
>> I suggest we use just <term><literal>WHEN</literal></term>, that's what
>> the other items do (see ENCODING).
>>
>> The other thing is that this does not say what expressions are allowed
>> in the WHEN clause. It seems pretty close to WHEN clause for triggers,
>> which e.g. mentions that subselects are not allowed. I'm pretty sure
>> that's true here too, because it fails like this (118 = T_SubLink):
>>
>>     test=# copy t(a,b,c) from '/tmp/t.data' when ((select 1) < 10);
>>     ERROR:  unrecognized node type: 118
>>
>> So, the patch needs to detect this, produce a reasonable error message
>> and document the limitations in copy.sqml, just like we do for CREATE
>> TRIGGER.
>
> fixed
>>
>>
>> 3) For COPY TO, the WHEN clause is accepted but ignored, leading to
>> confusing cases like this:
>>
>>     test=# copy t(a,b,c) to '/tmp/t.data' when ((select 100) < 10);
>>     COPY 151690
>>
>> So, it contains subselect, but unlike COPY FROM it does not fail
>> (because we never execute it). The fun part is that the expression is
>> logically false, so a user might expect it to filter rows, yet we copy
>> everything.
>>
>> IMHO we need to either error-out in these cases, complaining about WHEN
>> not being supported for COPY TO, or make it work (effectively treating
>> it as a simpler alternative to COPY (subselect) TO).
>
> English is not my first language but I chose error-out because WHEN condition for COPY TO seems to me semantically
> incorrect
>>
>>
>> AFAICS we could just get rid of the extra when_cluase variable and mess
>> with the cstate->whenClause directly, depending on how (3) gets fixed.
>
> I did it this way because CopyState structure memory allocate and initialize in BeginCopyFrom but the analysis done
> before it
>>
>>
>> 5) As I mentioned, the CREATE TRIGGER already has WHEN clause, but it
>> requires it to be 'WHEN (expr)'. I suggest we do the same thing here,
>> requiring the parentheses.
>>
>>
>> 6) The skip logic in CopyFrom() seems to be slightly wrong. It does
>> work, but the next_record label is defined after CHECK_FOR_INTERRUPTS()
>> so a COPY will not respond to Ctrl-C unless it finds a row matching the
>> WHEN condition. If you have a highly selective condition, that's a bit
>> inconvenient.
>>
>> It also skips
>>
>>     MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
>>
>> so I wonder what the heap_form_tuple() right after the next_record label
>> will use for tuples right after a skipped one. I'd bet it'll use the
>> oldcontext (essentially the long-lived context), essentially making it
>> a memory leak.
>>
>> So I suggest to get rid of the next_record label, and use 'continue'
>> instead of the 'goto next_record'.
>>
> fixed
>>

I've looked at this patch and tested.

When I use a function returning a string in the WHEN clause I got the following error:

=# copy test from '/tmp/aaa.csv' (format 'csv') when (lower(t) = 'hello');
ERROR:  could not determine which collation to use for lower() function
HINT:  Use the COLLATE clause to set the collation explicitly.
CONTEXT:  COPY hoge, line 1: "1,hoge,2018-01-01"

And then although I specified COLLATE I got another error (127 =
T_CollateExpr):

=# copy test from '/tmp/aaa.csv' (format 'csv') when (lower(t) collate
"en_US" = 'hello');
ERROR:  unrecognized node type: 127

This error doesn't happen if I put a similar condition in the WHEN
clause for triggers. I think the patch needs to produce a reasonable
error message.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: COPY FROM WHEN condition

From
"Nasby, Jim"
Date:
On Oct 11, 2018, at 10:35 AM, David Fetter <david@fetter.org> wrote:
> 
>> It didn't get far, but you may want to take a look at a rejected patch for
>> copy_srf() (set returning function)
>> https://www.postgresql.org/message-id/CADkLM%3DdoeiWQX4AGtDNG4PsWfSXz3ai7kY%3DPZm3sUhsUeev9Bg%40mail.gmail.com
>> https://commitfest.postgresql.org/12/869/
>> 
>> Having a set returning function gives you the full expressiveness of SQL,
>> at the cost of an extra materialization step.
> 
> I wonder whether something JIT-like could elide this. A very
> interesting subset of such WHEN clauses could be pretty
> straight-forward to implement in a pretty efficient way.

Are you thinking something like having a COPY command that provides results in such a way that they could be referenced
in a FROM clause (perhaps a COPY that defines a cursor…)?

Re: COPY FROM WHEN condition

From
David Fetter
Date:
On Wed, Oct 31, 2018 at 11:21:33PM +0000, Nasby, Jim wrote:
> On Oct 11, 2018, at 10:35 AM, David Fetter <david@fetter.org> wrote:
> > 
> >> It didn't get far, but you may want to take a look at a rejected patch for
> >> copy_srf() (set returning function)
> >> https://www.postgresql.org/message-id/CADkLM%3DdoeiWQX4AGtDNG4PsWfSXz3ai7kY%3DPZm3sUhsUeev9Bg%40mail.gmail.com
> >> https://commitfest.postgresql.org/12/869/
> >> 
> >> Having a set returning function gives you the full expressiveness of SQL,
> >> at the cost of an extra materialization step.
> > 
> > I wonder whether something JIT-like could elide this. A very
> > interesting subset of such WHEN clauses could be pretty
> > straight-forward to implement in a pretty efficient way.
> 
> Are you thinking something like having a COPY command that provides
> results in such a way that they could be referenced in a FROM clause
> (perhaps a COPY that defines a cursor…)?

That would also be nice, but what I was thinking of was that some
highly restricted subset of cases of SQL in general could lend
themselves to levels of optimization that would be impractical in
other contexts.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: COPY FROM WHEN condition

From
Corey Huinker
Date:
> Are you thinking something like having a COPY command that provides
> results in such a way that they could be referenced in a FROM clause
> (perhaps a COPY that defines a cursor…)?

That would also be nice, but what I was thinking of was that some
highly restricted subset of cases of SQL in general could lend
themselves to levels of optimization that would be impractical in
other contexts.

If COPY (or a syntactical equivalent) can return a result set, then the whole of SQL is available to filter and aggregate the results and we don't have to invent new syntax, or endure confusion when COPY-WHEN syntax behaves subtly differently from a similar FROM-WHERE.

Also, what would we be saving computationally? The whole file (or program output) has to be consumed no matter what, the columns have to be parsed no matter what. At least some of the columns have to be converted to their assigned datatypes enough to know whether or not to filter the row, but we might be able to push that logic inside a copy. I'm thinking of something like this:

SELECT x.a, sum(x.b)
FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b numeric, c text, d date, e json) )
WHERE x.d >= '2018-11-01'

In this case, there is the opportunity to see the following optimizations:
- columns c and e are never referenced, and need never be turned into a datum (though we might do so just to confirm that they conform to the data type)
- if column d is converted first, we can filter on it and avoid converting columns a,b
- whatever optimizations we can infer from knowing that the two surviving columns will go directly into an aggregate

If we go this route, we can train the planner to notice other optimizations and add those mechanisms at that time, and then existing code gets faster.

If we go the COPY-WHEN route, then we have to make up new syntax for every possible future optimization.

Re: COPY FROM WHEN condition

From
Pavel Stehule
Date:


pá 2. 11. 2018 v 3:57 odesílatel Corey Huinker <corey.huinker@gmail.com> napsal:
> Are you thinking something like having a COPY command that provides
> results in such a way that they could be referenced in a FROM clause
> (perhaps a COPY that defines a cursor…)?

That would also be nice, but what I was thinking of was that some
highly restricted subset of cases of SQL in general could lend
themselves to levels of optimization that would be impractical in
other contexts.

If COPY (or a syntactical equivalent) can return a result set, then the whole of SQL is available to filter and aggregate the results and we don't have to invent new syntax, or endure confusion when COPY-WHEN syntax behaves subtly differently from a similar FROM-WHERE.

Also, what would we be saving computationally? The whole file (or program output) has to be consumed no matter what, the columns have to be parsed no matter what. At least some of the columns have to be converted to their assigned datatypes enough to know whether or not to filter the row, but we might be able to push that logic inside a copy. I'm thinking of something like this:

SELECT x.a, sum(x.b)
FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b numeric, c text, d date, e json) )
WHERE x.d >= '2018-11-01'

Without some special feature this example is not especially useful. It is based on COPY on the server, which can be used only by a superuser with full rights.

What would be the benefit of this feature?

Regards

Pavel


In this case, there is the opportunity to see the following optimizations:
- columns c and e are never referenced, and need never be turned into a datum (though we might do so just to confirm that they conform to the data type)
- if column d is converted first, we can filter on it and avoid converting columns a,b
- whatever optimizations we can infer from knowing that the two surviving columns will go directly into an aggregate

If we go this route, we can train the planner to notice other optimizations and add those mechanisms at that time, and then existing code gets faster.

If we go the COPY-WHEN route, then we have to make up new syntax for every possible future optimization.

Re: COPY FROM WHEN condition

From
Surafel Temesgen
Date:
hi,

On Wed, Oct 31, 2018 at 10:54 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Tue, Oct 30, 2018 at 11:47 PM Surafel Temesgen <surafel3000@gmail.com> wrote:

I've looked at this patch and tested.

When I use a function returning string in WHEN clause I got the following error:

=# copy test from '/tmp/aaa.csv' (format 'csv') when (lower(t) = 'hello');
ERROR:  could not determine which collation to use for lower() function
HINT:  Use the COLLATE clause to set the collation explicitly.
CONTEXT:  COPY hoge, line 1: "1,hoge,2018-01-01"

And then although I specified COLLATE I got an another error (127 =
T_CollateExpr):

=# copy test from '/tmp/aaa.csv' (format 'csv') when (lower(t) collate
"en_US" = 'hello');
ERROR:  unrecognized node type: 127

This error doesn't happen if I put the similar condition in WHEN
clause for triggers. I think the patch needs to produce a reasonable
error message.

The attached patch includes a fix for it. Can you confirm it?
regards
Surafel
Attachments

Re: COPY FROM WHEN condition

From
"Daniel Verite"
Date:
    Pavel Stehule wrote:

> > SELECT x.a, sum(x.b)
> > FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b
> > numeric, c text, d date, e json) )
> > WHERE x.d >= '2018-11-01'
> >
> >
> Without some special feature this example is not extra useful. It is based
> on copy on server that can use only super user with full rights.

And if superuser, one might use the file foreign-data wrapper (file_fdw) [1] to get
the same results without the need for the explicit COPY in the query.

BTW, this brings into question whether the [WHEN condition] clause
should be included in the options of file_fdw, as it supports pretty
much all COPY options.
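
For context, file_fdw already exposes the COPY options as foreign-table options, roughly like this (a sketch; server, table and file names are made up):

    CREATE EXTENSION file_fdw;
    CREATE SERVER files FOREIGN DATA WRAPPER file_fdw;
    CREATE FOREIGN TABLE t_csv (a integer, b text)
        SERVER files
        OPTIONS (filename '/tmp/t.csv', format 'csv');

so a [WHEN condition] option would presumably slot in next to filename and format.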

Also, should psql's \copy gain the option too?


[1] https://www.postgresql.org/docs/current/static/file-fdw.html


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:

On 11/02/2018 03:57 AM, Corey Huinker wrote:
>     > Are you thinking something like having a COPY command that provides
>     > results in such a way that they could be referenced in a FROM clause
>     > (perhaps a COPY that defines a cursor…)?
> 
>     That would also be nice, but what I was thinking of was that some
>     highly restricted subset of cases of SQL in general could lend
>     themselves to levels of optimization that would be impractical in
>     other contexts.
> 
> 
> If COPY (or a syntactical equivalent) can return a result set, then the
> whole of SQL is available to filter and aggregate the results and we
> don't have to invent new syntax, or endure confusion whenCOPY-WHEN
> syntax behaves subtly different from a similar FROM-WHERE.
> 
> Also, what would we be saving computationally? The whole file (or
> program output) has to be consumed no matter what, the columns have to
> be parsed no matter what. At least some of the columns have to be
> converted to their assigned datatypes enough to know whether or not to
> filter the row, but we might be able push that logic inside a copy. I'm
> thinking of something like this:
> 
>     SELECT x.a, sum(x.b)
>     FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer,
>     b numeric, c text, d date, e json) )
>     WHERE x.d >= '2018-11-01'
> 
> 
> In this case, there is the /opportunity/ to see the following optimizations:
> - columns c and e are never referenced, and need never be turned into a
> datum (though we might do so just to confirm that they conform to the
> data type)
> - if column d is converted first, we can filter on it and avoid
> converting columns a,b
> - whatever optimizations we can infer from knowing that the two
> surviving columns will go directly into an aggregate
> 
> If we go this route, we can train the planner to notice other
> optimizations and add those mechanisms at that time, and then existing
> code gets faster.
> 
> If we go the COPY-WHEN route, then we have to make up new syntax for
> every possible future optimization.

IMHO those two things address vastly different use-cases. The COPY WHEN
case deals with filtering data while importing them into a database,
while what you're describing seems to be more about querying data stored
in a CSV file. But we already do have a solution for that - FDW, and I'd
say it's working pretty well. And AFAIK it does give you the tools to
implement most of what you're asking for. I don't see why we should bolt
this on top of COPY, or how it is an alternative to COPY WHEN.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: COPY FROM WHEN condition

From
David Fetter
Date:
On Thu, Nov 01, 2018 at 10:57:25PM -0400, Corey Huinker wrote:
> >
> > > Are you thinking something like having a COPY command that provides
> > > results in such a way that they could be referenced in a FROM clause
> > > (perhaps a COPY that defines a cursor…)?
> >
> > That would also be nice, but what I was thinking of was that some
> > highly restricted subset of cases of SQL in general could lend
> > themselves to levels of optimization that would be impractical in
> > other contexts.
> 
> If COPY (or a syntactical equivalent) can return a result set, then the
> whole of SQL is available to filter and aggregate the results and we don't
> have to invent new syntax, or endure confusion whenCOPY-WHEN syntax behaves
> subtly different from a similar FROM-WHERE.

That's an excellent point.

> Also, what would we be saving computationally? The whole file (or program
> output) has to be consumed no matter what, the columns have to be parsed no
> matter what. At least some of the columns have to be converted to their
> assigned datatypes enough to know whether or not to filter the row, but we
> might be able push that logic inside a copy. I'm thinking of something like
> this:
> 
> SELECT x.a, sum(x.b)
> FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b numeric, c text, d date, e json) )

Apologies for bike-shedding, but wouldn't the following be a better
fit with the current COPY?

    COPY t(a integer, b numeric, c text, d date, e json) FROM '/path/to/foo.txt' WITH (FORMAT CSV, INLINE)

> WHERE x.d >= '2018-11-01'
> 
> 
> In this case, there is the *opportunity* to see the following optimizations:
> - columns c and e are never referenced, and need never be turned into a
> datum (though we might do so just to confirm that they conform to the data
> type)

That sounds like something that could go inside the WITH extension
I'm proposing above.

[STRICT_TYPE boolean DEFAULT true]?

This might not be something that has to be in version 1.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: COPY FROM WHEN condition

From
David Fetter
Date:
On Fri, Nov 02, 2018 at 12:58:12PM +0100, Daniel Verite wrote:
>     Pavel Stehule wrote:
> 
> > > SELECT x.a, sum(x.b)
> > > FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b
> > > numeric, c text, d date, e json) )
> > > WHERE x.d >= '2018-11-01'
> > >
> > >
> > Without some special feature this example is not extra useful. It is based
> > on copy on server that can use only super user with full rights.
> 
> And if superuser, one might use the file data wrapper [1]  to get
> the same results without the need for the explicit COPY in the query.
> 
> BTW, this brings into question whether the [WHEN condition] clause
> should be included in the options of file_fdw, as it supports pretty
> much all COPY options.
> 
> Also, psql's \copy should gain the option too?

tl;dr: \copy support is a very large can of worms.

psql's \copy is something which should probably be handled separately
from COPY, as it's both a way to access the filesystem without
superuser permission and an interface to the COPY part of the
protocol.  It seems like poor design to add grammar to support a
single client, so we'd need to think about this in terms of what we
want to support on the client side independent of specific clients. It
also seems like a violation of separation of concerns to couple FEBE
to grammar, so there'd need to be some way to do those things
separately, too.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: COPY FROM WHEN condition

From
Corey Huinker
Date:

> SELECT x.a, sum(x.b)
> FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b numeric, c text, d date, e json) )

Apologies for bike-shedding, but wouldn't the following be a better
fit with the current COPY?

    COPY t(a integer, b numeric, c text, d date, e json) FROM '/path/to/foo.txt' WITH (FORMAT CSV, INLINE)

+1 Very much a better fit.

Re: COPY FROM WHEN condition

From
"Daniel Verite"
Date:
    David Fetter wrote:

> It also seems like a violation of separation of concerns to couple
> FEBE to grammar, so there'd need to be some way to do those things
> separately, too.

After re-reading psql/copy.c, I withdraw what I said upthread:
it doesn't appear necessary to add anything to support the WHEN
condition with \copy.

\copy does have a dedicated mini-parser, but it doesn't try to
recognize every option: it's only concerned with getting the bits of
information that are needed to perform the client-side work:
- whether it's a copy from or to
- what exact form and value the 'filename' argument has immediately
 after from or to:
 '<file path>' | PROGRAM '<command>' | stdin | stdout | pstdin | pstdout

It doesn't really care what the options are, just where they are
in the buffer, so they can be copied into the COPY SQL statement.

From the code:
 * The documented syntax is:
 *    \copy tablename [(columnlist)] from|to filename [options]
 *    \copy ( query stmt ) to filename [options]

The WHEN clause would be part of the [options], which
are handled as simply as this in parse_slash_copy():

  /* Collect the rest of the line (COPY options) */
  token = strtokx(NULL, "", NULL, NULL,
      0, false, false, pset.encoding);
  if (token)
      result->after_tofrom = pg_strdup(token);

So unless there's something particular in the WHEN clause
expression that could make this strtokx() invocation error out,
this should work directly.
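
So, assuming the server-side grammar accepts the clause, an invocation along these lines (hypothetical; table, column and file names are made up) should be passed through unchanged:

    \copy t from '/tmp/t.data' with (format csv) when (a > 10)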

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


Re: COPY FROM WHEN condition

From
Adam Berlin
Date:
As a newcomer to this patch, when I read this example:

COPY table_name WHEN (some_condition)

.. I expect COPY to only be run when the condition is true, and I do not expect the WHEN clause to filter rows. I'm curious what you think about:

COPY table_name WHERE (some_condition)

Users should already be familiar with the idea that WHERE performs a filter.

Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:


On 11/9/18 4:51 PM, Adam Berlin wrote:
> As a newcomer to this patch, when I read this example:
> 
> COPY table_name WHEN (some_condition)
> 
> .. I expect COPY to only be run when the condition is true, and I do
> not expect the WHEN clause to filter rows. I'm curious what you think about:
> 

Hmmm. Currently we have WHEN clause in three places:

1) triggers, where it specifies arbitrary expression on rows,
determining whether the trigger should be invoked at all

2) event triggers, where it does about the same thing as for regular
triggers, but only expressions "column IN (...)" are allowed

3) CASE WHEN ... THEN  ...

I'd say 3 is rather unrelated to this discussion, and 1+2 seem to
support your objection to using WHEN for COPY, as it kinda specifies
whether the command itself should be executed.

> COPY table_name WHERE (some_condition)
> 
> Users should already be familiar with the idea that WHERE performs a filter.
> 

So, what about using FILTER here? We already use it for aggregates when
filtering rows to process.
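
For comparison (the aggregate form below is existing syntax, while the COPY form is only a hypothetical sketch):

    -- existing aggregate FILTER syntax
    SELECT count(*) FILTER (WHERE a > 10) FROM t;

    -- what the COPY variant might look like
    COPY t FROM '/tmp/t.data' FILTER (a > 10);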

That being said, I have no strong feelings either way. I'd be OK with
both WHEN and WHERE.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


RE: COPY FROM WHEN condition

From
"myungkyu.lim"
Date:
>> COPY table_name WHERE (some_condition)
>>
>> Users should already be familiar with the idea that WHERE performs a filter.
>>

> So, what about using FILTER here? We already use it for aggregates when
> filtering rows to process.

> That being said, I have no strong feelings either way. I'd be OK with
> both WHEN and WHERE.

I don't think it's an important point,

In gram.y,
    where_clause:
            WHERE a_expr                            { $$ = $2; }
            | /*EMPTY*/                            { $$ = NULL; }
        ;
This is similar to the 'opt_when_clause' in this patch.

So, I think 'WHERE' is a better form.

BTW, 3rd patch worked very well in my tests.
However, some code style issues still exist.

    Node    *whenClause= NULL;
    cstate->whenClause=whenClause;

Best regards,
Myungkyu, Lim



Re: COPY FROM WHEN condition

From
Surafel Temesgen
Date:


On Sun, Nov 11, 2018 at 11:59 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

So, what about using FILTER here? We already use it for aggregates when
filtering rows to process.

I think it's a good idea and it describes the purpose better. Attached is a patch that uses FILTER instead.
regards
Surafel  
Attachments

Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:
On 11/23/18 12:14 PM, Surafel Temesgen wrote:
> 
> 
> On Sun, Nov 11, 2018 at 11:59 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
> 
>     So, what about using FILTER here? We already use it for aggregates when
>     filtering rows to process.
> 
> i think its good idea and describe its purpose more. Attache is a
> patch that use FILTER instead

Thanks, looks good to me. A couple of minor points:

1) While comparing this to the FILTER clause we already have for
aggregates, I've noticed the aggregate version is

    FILTER '(' WHERE a_expr ')'

while here we have

    FILTER '(' a_expr ')'

For a while I was thinking that maybe we should use the same syntax
here, but I don't think so. The WHERE bit seems rather unnecessary and
we probably implemented it only because it's required by SQL standard,
which does not apply to COPY. So I think this is fine.


2) The various parser checks emit errors like this:

    case EXPR_KIND_COPY_FILTER:
        err = _("cannot use subquery in copy from FILTER condition");
        break;

I think the "copy from" should be capitalized, to make it clear that it
refers to a COPY FROM command and not a copy of something.


3) I think there should be regression tests for these prohibited things,
i.e. for a set-returning function, for a non-existent column, etc.


4) What might be somewhat confusing for users is that the filter uses a
single snapshot to evaluate the conditions for all rows. That is, one
might do this

    create or replace function f() returns int as $$
        select count(*)::int from t;
    $$ language sql;

and hope that

    copy t from '/...' filter (f() <= 100);

only ever imports the first 100 rows - but that's not true, of course,
because f() uses the snapshot acquired at the very beginning. For
example INSERT SELECT does behave differently:

    test=# copy t from '/home/user/t.data' filter (f() < 100);
    COPY 81
    test=# insert into t select * from t where f() < 100;
    INSERT 0 19

Obviously, this is not an issue when the filter clause references only
the input row (which I assume will be the primary use case).

Not sure if this is expected / appropriate behavior, and if the patch
needs to do something else here.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: COPY FROM WHEN condition

From
Dean Rasheed
Date:
On Sat, 24 Nov 2018 at 02:09, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> On 11/23/18 12:14 PM, Surafel Temesgen wrote:
> > On Sun, Nov 11, 2018 at 11:59 PM Tomas Vondra
> > <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> >     So, what about using FILTER here? We already use it for aggregates when
> >     filtering rows to process.
> >
> > i think its good idea and describe its purpose more. Attache is a
> > patch that use FILTER instead
>

FWIW, I vote for just using WHERE here.

Right now we have 2 syntaxes for filtering rows in queries, both of
which use WHERE immediately before the condition:

1). SELECT ... FROM ... WHERE condition

2). SELECT agg_fn FILTER (WHERE condition) FROM ...

I'm not a huge fan of (2), but that's the SQL standard, so we're stuck
with it. There's a certain consistency in its use of WHERE to
introduce the condition, and preceding that with FILTER helps to
distinguish it from any later WHERE clause. But what you'd be adding
here would be a 3rd syntax

3). COPY ... FROM ... FILTER condition

which IMO will just lead to confusion.

Regards,
Dean


Re: COPY FROM WHEN condition

From
Surafel Temesgen
Date:


On Sat, Nov 24, 2018 at 12:02 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
Right now we have 2 syntaxes for filtering rows in queries, both of
which use WHERE immediately before the condition:

1). SELECT ... FROM ... WHERE condition

2). SELECT agg_fn FILTER (WHERE condition) FROM ...

I'm not a huge fan of (2), but that's the SQL standard, so we're stuck
with it. There's a certain consistency in it's use of WHERE to
introduce the condition, and preceding that with FILTER helps to
distinguish it from any later WHERE clause. But what you'd be adding
here would be a 3rd syntax

3). COPY ... FROM ... FILTER condition

which IMO will just lead to confusion.
 

Your case is for retrieving data, but this is for deciding which data to insert, and I think the word FILTER describes that better and does not lead to confusion.

regards
Surafel

Re: COPY FROM WHEN condition

From
Surafel Temesgen
Date:


On Sat, Nov 24, 2018 at 5:09 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

1) While comparing this to the FILTER clause we already have for
aggregates, I've noticed the aggregate version is

    FILTER '(' WHERE a_expr ')'

while here we have

    FILTER '(' a_expr ')'

For a while I was thinking that maybe we should use the same syntax
here, but I don't think so. The WHERE bit seems rather unnecessary and
we probably implemented it only because it's required by SQL standard,
which does not apply to COPY. So I think this is fine.
 
ok

2) The various parser checks emit errors like this:

    case EXPR_KIND_COPY_FILTER:
        err = _("cannot use subquery in copy from FILTER condition");
        break;

I think the "copy from" should be capitalized, to make it clear that it
refers to a COPY FROM command and not a copy of something.


3) I think there should be regression tests for these prohibited things,
i.e. for a set-returning function, for a non-existent column, etc.


fixed

4) What might be somewhat confusing for users is that the filter uses a
single snapshot to evaluate the conditions for all rows. That is, one
might do this

    create or replace function f() returns int as $$
        select count(*)::int from t;
    $$ language sql;

and hope that

    copy t from '/...' filter (f() <= 100);

only ever imports the first 100 rows - but that's not true, of course,
because f() uses the snapshot acquired at the very beginning. For
example INSERT SELECT does behave differently:

    test=# copy t from '/home/user/t.data' filter (f() < 100);
    COPY 81
    test=# insert into t select * from t where f() < 100;
    INSERT 0 19

Obviously, this is not an issue when the filter clause references only
the input row (which I assume will be the primary use case).

Not sure if this is expected / appropriate behavior, and if the patch
needs to do something else here.

Comparing it with the overhead of setting a snapshot before evaluating every row, and considering that this kind of usage is not frequent, it seems to me the behavior is acceptable.

regards
Surafel
Attachments

Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:
On 11/26/18 2:25 PM, Surafel Temesgen wrote:
> 
> 
> On Sat, Nov 24, 2018 at 5:09 AM Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
> 
>     1) While comparing this to the FILTER clause we already have for
>     aggregates, I've noticed the aggregate version is
> 
>         FILTER '(' WHERE a_expr ')'
> 
>     while here we have
> 
>         FILTER '(' a_expr ')'
> 
>     For a while I was thinking that maybe we should use the same syntax
>     here, but I don't think so. The WHERE bit seems rather unnecessary and
>     we probably implemented it only because it's required by SQL standard,
>     which does not apply to COPY. So I think this is fine.
> 
>  
> ok
> 
> 
>     2) The various parser checks emit errors like this:
> 
>         case EXPR_KIND_COPY_FILTER:
>             err = _("cannot use subquery in copy from FILTER condition");
>             break;
> 
>     I think the "copy from" should be capitalized, to make it clear that it
>     refers to a COPY FROM command and not a copy of something.
> 
> 
>     3) I think there should be regression tests for these prohibited things,
>     i.e. for a set-returning function, for a non-existent column, etc.
> 
> 
> fixed
> 
> 
>     4) What might be somewhat confusing for users is that the filter uses a
>     single snapshot to evaluate the conditions for all rows. That is, one
>     might do this
> 
>         create or replace function f() returns int as $$
>             select count(*)::int from t;
>         $$ language sql;
> 
>     and hope that
> 
>         copy t from '/...' filter (f() <= 100);
> 
>     only ever imports the first 100 rows - but that's not true, of course,
>     because f() uses the snapshot acquired at the very beginning. For
>     example INSERT SELECT does behave differently:
> 
>         test=# copy t from '/home/user/t.data' filter (f() < 100);
>         COPY 81
>         test=# insert into t select * from t where f() < 100;
>         INSERT 0 19
> 
>     Obviously, this is not an issue when the filter clause references only
>     the input row (which I assume will be the primary use case).
> 
>     Not sure if this is expected / appropriate behavior, and if the patch
>     needs to do something else here.
> 
> Comparing with overhead of setting snapshot before evaluating every row
> and considering this
> 
> kind of usage is not frequent it seems to me the behavior is acceptable
> 

I'm not really buying the argument that this behavior is acceptable
simply because using the feature like this will be uncommon. That seems
like a rather weak reason to accept it.

I however agree we don't want to make COPY less efficient, at least not
unless absolutely necessary. But I think we can handle this simply by
restricting what's allowed to appear the FILTER clause.

It should be fine to allow IMMUTABLE and STABLE functions, but not
VOLATILE ones. That should fix the example I shared, because f() would
not be allowed.

We could mark it as STABLE explicitly, which would make it usable in the
FILTER clause, but STABLE says "same result for all calls in the
statement" so the behavior would be expected and essentially legal (in a
way it'd be mislabeling, but we trust it on other places too).

So I think there are three options:

(a) accept the current behavior (single snapshot, same result for all
function calls)

(b) prohibit VOLATILE functions in the FILTER clause altogether

(c) allow VOLATILE functions in the FILTER clause, but change the
behavior to make the behavior sane


I definitely vote -1 on (a) unless someone presents a much better
argument for allowing it than "it's uncommon".

Which leaves us with (b) and (c). Clearly, (b) is simpler to implement,
because (c) needs to do the detection too, and then some additional
stuff. I'm not sure how much more complex (c) is, compared to (b).

After eyeballing the copy.c code for a bit, I think it would need to use
CIM_SINGLE when there are volatile functions, similarly to volatile
default values (see volatile_defexprs), and then increment the command
ID after each insert. Of course, we don't want to do this by default,
but only when actually needed (with VOLATILE functions), because the
multi-inserts are quite a bit more efficient.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: COPY FROM WHEN condition

From
Surafel Temesgen
Date:


On Thu, Nov 29, 2018 at 2:17 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

(c) allow VOLATILE functions in the FILTER clause, but change the
behavior to make the behavior sane
 
Does changing the behavior mean getting a new snapshot before evaluating each tuple, to ensure the function sees the results of any previously executed queries, or is there another mechanism that can make the behavior sane?

Which leaves us with (b) and (c). Clearly, (b) is simpler to implement,
because it (c) needs to do the detection too, and then some additional
stuff. I'm not sure how much more complex (c) is, compared to (b).

The attached patch implements option (b), prohibiting VOLATILE functions, but I am open to change.
regards
Surafel
Attachments

Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:

On 11/30/18 2:00 PM, Surafel Temesgen wrote:
> 
> 
> On Thu, Nov 29, 2018 at 2:17 AM Tomas Vondra 
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
>     (c) allow VOLATILE functions in the FILTER clause, but change the
>     behavior to make the behavior sane
> 
>   Did changing the behavior means getting new snapshot before evaluating 
> a tuple to ensure the function sees results of any previously executed 
> queries or there are   other mechanism that can make the behavior sane?
> 

I think it should be enough just to switch to CIM_SINGLE and increment 
the command counter after each inserted row.

>     Which leaves us with (b) and (c). Clearly, (b) is simpler to implement,
>     because it (c) needs to do the detection too, and then some additional
>     stuff. I'm not sure how much more complex (c) is, compared to (b).
> 
> The attache patch implement option b prohibit VOLATILE functions but i 
> am open to change

OK. I'll take a look.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: COPY FROM WHEN condition

From
Alvaro Herrera
Date:
After reading this thread, I think I like WHERE better than FILTER.
Tally:

WHERE: Adam Berlin, Lim Myungkyu, Dean Rasheed, yours truly
FILTER: Tomas Vondra, Surafel Temesgen

Couldn't find others expressing an opinion in this regard.

On 2018-Nov-30, Tomas Vondra wrote:

> I think it should be enough just to switch to CIM_SINGLE and increment the
> command counter after each inserted row.

Do we apply command counter increment per row with some other COPY
option?  Per-row CCI makes me a bit uncomfortable because you'd get
in trouble with a large copy.  I think it's particularly nasty here,
precisely because you may want to filter out some rows of a very large
file, and the CCI may prevent that from working.
I'm not convinced by the example case of reading how many tuples you've
imported so far in the WHERE/WHEN/FILTER clause each time (that'd become
incrementally slower as it progresses).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:
On 12/4/18 10:44 AM, Alvaro Herrera wrote:
> After reading this thread, I think I like WHERE better than FILTER.
> Tally:
> 
> WHERE: Adam Berlin, Lim Myungkyu, Dean Rasheed, yours truly
> FILTER: Tomas Vondra, Surafel Temesgen
> 
> Couldn't find others expressing an opinion in this regard.
> 

While I still like FILTER more, I won't object to using WHERE if others
thinks it's a better choice.

> On 2018-Nov-30, Tomas Vondra wrote:
> 
>> I think it should be enough just to switch to CIM_SINGLE and
>> increment the command counter after each inserted row.
> 
> Do we apply command counter increment per row with some other COPY 
> option?

I don't think we increment the command counter anywhere, most likely
because COPY is not allowed to run any queries directly so far.

> Per-row CCI makes me a bit uncomfortable because with you'd get in
> trouble with a large copy.  I think it's particularly nasty here,
> precisely because you may want to filter out some rows of a very
> large file, and the CCI may prevent that from working.
Sure.

> I'm not convinced by the example case of reading how many tuples
> you've imported so far in the WHERE/WHEN/FILTER clause each time
> (that'd become incrementally slower as it progresses).
> 

Well, not sure how else am I supposed to convince you? It's an example
of a behavior that's IMHO surprising and inconsistent with things that
might be reasonably expected to behave similarly. It may not be a
perfect example, but that's the price for simplicity.

FWIW, another way to achieve mostly the same filtering feature is a
BEFORE INSERT trigger:

  create or replace function copy_filter() returns trigger as $$
  declare
    v_c int;
  begin
    select count(*) into v_c from t;
    if v_c >= 100 then
      return null;
    end if;
    return NEW;
  end; $$ language plpgsql;

  create trigger filter before insert on t
     for each row execute procedure copy_filter();

This behaves consistently with INSERT, i.e. it enforces the total count
constraint the same way. And the COPY FILTER behaves differently.

FWIW I do realize this is not a particularly great check - for example,
it will not see effects of concurrent transactions etc. All I'm saying
is I find it annoying/strange that it behaves differently.

Also, considering the trigger does the right thing, maybe I spoke too
early about the command counter not being incremented?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: COPY FROM WHEN condition

From
Robert Haas
Date:
On Wed, Nov 28, 2018 at 6:17 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> > Comparing with overhead of setting snapshot before evaluating every row
> > and considering this
> >
> > kind of usage is not frequent it seems to me the behavior is acceptable
>
> I'm not really buying the argument that this behavior is acceptable
> simply because using the feature like this will be uncommon. That seems
> like a rather weak reason to accept it.
>
> I however agree we don't want to make COPY less efficient, at least not
> unless absolutely necessary. But I think we can handle this simply by
> restricting what's allowed to appear the FILTER clause.
>
> It should be fine to allow IMMUTABLE and STABLE functions, but not
> VOLATILE ones. That should fix the example I shared, because f() would
> not be allowed.

I don't think that's a very good solution.  It's perfectly sensible
for someone to want to do WHERE/FILTER random() < 0.01 to load a
smattering of rows, and this would rule that out for no very good
reason.
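
For example, a sampling load along these lines (illustrative only, shown here with the WHERE spelling):

    COPY t FROM '/tmp/t.data' WHERE (random() < 0.01);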

I think it would be fine to just document that if the filter condition
examines the state of the database, it will not see the results of the
COPY which is in progress.  I'm pretty sure there are other cases -
for example with triggers - where you can get code to run that can't
see the results of the command currently in progress, so I don't
really buy the idea that having a feature which works that way is
categorically unacceptable.

I agree that we can't justify flagrantly wrong behavior on the grounds
that correct behavior is expensive or on the grounds that the
incorrect cases will be rare. However, when there's more than one
sensible behavior, it's not unreasonable to pick one that is easier to
implement or cheaper or whatever, and that's the category into which I
would put this decision.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:

On 12/6/18 4:52 PM, Robert Haas wrote:
> On Wed, Nov 28, 2018 at 6:17 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>>> Comparing with overhead of setting snapshot before evaluating every row
>>> and considering this
>>>
>>> kind of usage is not frequent it seems to me the behavior is acceptable
>>
>> I'm not really buying the argument that this behavior is acceptable
>> simply because using the feature like this will be uncommon. That seems
>> like a rather weak reason to accept it.
>>
>> I however agree we don't want to make COPY less efficient, at least not
>> unless absolutely necessary. But I think we can handle this simply by
>> restricting what's allowed to appear the FILTER clause.
>>
>> It should be fine to allow IMMUTABLE and STABLE functions, but not
>> VOLATILE ones. That should fix the example I shared, because f() would
>> not be allowed.
> 
> I don't think that's a very good solution.  It's perfectly sensible
> for someone to want to do WHERE/FILTER random() < 0.01 to load a
> smattering of rows, and this would rule that out for no very good
> reason.
> 

Good point. I agree that's a much more plausible use case for this
feature, and forbidding volatile functions would break it.

> I think it would be fine to just document that if the filter condition
> examines the state of the database, it will not see the results of the
> COPY which is in progress.  I'm pretty sure there are other cases -
> for example with triggers - where you can get code to run that can't
> see the results of the command currently in progress, so I don't
> really buy the idea that having a feature which works that way is
> categorically unacceptable.
> 
> I agree that we can't justify flagrantly wrong behavior on the grounds
> that correct behavior is expensive or on the grounds that the
> incorrect cases will be rare. However, when there's more than one
> sensible behavior, it's not unreasonable to pick one that is easier to
> implement or cheaper or whatever, and that's the category into which I
> would put this decision.
> 

OK, makes sense. I withdraw my objections to the original behavior, and
agree it's acceptable if it's reasonably documented.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: COPY FROM WHEN condition

From
Surafel Temesgen
Date:


On Tue, Dec 4, 2018 at 12:44 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
After reading this thread, I think I like WHERE better than FILTER.
Tally:

WHERE: Adam Berlin, Lim Myungkyu, Dean Rasheed, yours truly
FILTER: Tomas Vondra, Surafel Temesgen



Accepting the tally result, I changed the keyword to WHERE and allowed volatile functions
with the single-insertion method.
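
A rough sketch of what that means in practice (file, table, and column names are placeholders; the volatility distinction is the one the volatility-check discussion later in this thread is about):

-- volatile condition: rows go through the single-row insertion path
COPY t FROM '/data/t.csv' (FORMAT csv) WHERE random() < 0.5;

-- non-volatile condition: the multi-insert batching path can still be used
COPY t FROM '/data/t.csv' (FORMAT csv) WHERE a > 10;
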
regards
Surafel
Attachments

Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:

On 12/12/18 7:05 AM, Surafel Temesgen wrote:
>
>
> On Tue, Dec 4, 2018 at 12:44 PM Alvaro Herrera <alvherre@2ndquadrant.com
> <mailto:alvherre@2ndquadrant.com>> wrote:
>
>     After reading this thread, I think I like WHERE better than FILTER.
>     Tally:
>
>     WHERE: Adam Berlin, Lim Myungkyu, Dean Rasheed, yours truly
>     FILTER: Tomas Vondra, Surafel Temesgen
>
>
>
> accepting tally result  i change the keyword to WHERE condition and
> allow volatile function with single insertion method

Can you also update the docs to mention that the functions called from
the WHERE clause do not see the effects of the COPY itself?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: COPY FROM WHEN condition

From
Surafel Temesgen
Date:


On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Can you also update the docs to mention that the functions called from
the WHERE clause does not see effects of the COPY itself?


Of course. I also added the same comment to the insertion-method selection.
regards
Surafel
 
Attachments

Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:
On 12/13/18 8:09 AM, Surafel Temesgen wrote:
> 
> 
> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
> 
>     Can you also update the docs to mention that the functions called from
>     the WHERE clause does not see effects of the COPY itself?
> 
> 
> /Of course, i  also add same comment to insertion method selection
> /

FWIW I've marked this as RFC and plan to get it committed this week.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:

On 1/14/19 10:25 PM, Tomas Vondra wrote:
> On 12/13/18 8:09 AM, Surafel Temesgen wrote:
>>
>>
>> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra
>> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>>
>>
>>     Can you also update the docs to mention that the functions called from
>>     the WHERE clause does not see effects of the COPY itself?
>>
>>
>> /Of course, i  also add same comment to insertion method selection
>> /
> 
> FWIW I've marked this as RFC and plan to get it committed this week.
> 

Pushed, thanks for the patch.

cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: COPY FROM WHEN condition

From
Surafel Temesgen
Date:


On Sun, Jan 20, 2019 at 2:24 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Pushed, thanks for the patch.

cheers



Thank you

regards
Surafel

Re: COPY FROM WHEN condition

From
Andres Freund
Date:
Hi,

On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
> On 1/14/19 10:25 PM, Tomas Vondra wrote:
> > On 12/13/18 8:09 AM, Surafel Temesgen wrote:
> >>
> >>
> >> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra
> >> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> >>
> >>
> >>     Can you also update the docs to mention that the functions called from
> >>     the WHERE clause does not see effects of the COPY itself?
> >>
> >>
> >> /Of course, i  also add same comment to insertion method selection
> >> /
> > 
> > FWIW I've marked this as RFC and plan to get it committed this week.
> > 
> 
> Pushed, thanks for the patch.

While rebasing the pluggable storage patch on top of this I noticed that
the qual appears to be evaluated in query context. Isn't that a bad
idea? ISTM it should have been evaluated a few lines above, before the:

        /* Triggers and stuff need to be invoked in query context. */
        MemoryContextSwitchTo(oldcontext);

Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?

Greetings,

Andres Freund


Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:

On 1/20/19 8:24 PM, Andres Freund wrote:
> Hi,
> 
> On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
>> On 1/14/19 10:25 PM, Tomas Vondra wrote:
>>> On 12/13/18 8:09 AM, Surafel Temesgen wrote:
>>>>
>>>>
>>>> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra
>>>> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>>>>
>>>>
>>>>      Can you also update the docs to mention that the functions called from
>>>>      the WHERE clause does not see effects of the COPY itself?
>>>>
>>>>
>>>> /Of course, i  also add same comment to insertion method selection
>>>> /
>>>
>>> FWIW I've marked this as RFC and plan to get it committed this week.
>>>
>>
>> Pushed, thanks for the patch.
> 
> While rebasing the pluggable storage patch ontop of this I noticed that
> the qual appears to be evaluated in query context. Isn't that a bad
> idea? ISMT it should have been evaluated a few lines above, before the:
> 
>         /* Triggers and stuff need to be invoked in query context. */
>         MemoryContextSwitchTo(oldcontext);
> 
> Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?
> 

Yes, I agree. It's a bit too late for me to hack and push stuff, but 
I'll fix that tomorrow.

cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: COPY FROM WHEN condition

From
Andres Freund
Date:
On 2019-01-20 21:00:21 -0500, Tomas Vondra wrote:
> 
> 
> On 1/20/19 8:24 PM, Andres Freund wrote:
> > Hi,
> > 
> > On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
> > > On 1/14/19 10:25 PM, Tomas Vondra wrote:
> > > > On 12/13/18 8:09 AM, Surafel Temesgen wrote:
> > > > > 
> > > > > 
> > > > > On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra
> > > > > <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> > > > > 
> > > > > 
> > > > >      Can you also update the docs to mention that the functions called from
> > > > >      the WHERE clause does not see effects of the COPY itself?
> > > > > 
> > > > > 
> > > > > /Of course, i  also add same comment to insertion method selection
> > > > > /
> > > > 
> > > > FWIW I've marked this as RFC and plan to get it committed this week.
> > > > 
> > > 
> > > Pushed, thanks for the patch.
> > 
> > While rebasing the pluggable storage patch ontop of this I noticed that
> > the qual appears to be evaluated in query context. Isn't that a bad
> > idea? ISMT it should have been evaluated a few lines above, before the:
> > 
> >         /* Triggers and stuff need to be invoked in query context. */
> >         MemoryContextSwitchTo(oldcontext);
> > 
> > Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?
> > 
> 
> Yes, I agree. It's a bit too late for me to hack and push stuff, but I'll
> fix that tomorrow.

NP. On second thought, the problem is probably smaller than I thought at
first, because ExecQual() switches to the econtext's per-tuple memory
context. But it's only reset once for each batch, so there's some
wastage. At least worth a comment.

Greetings,

Andres Freund


Re: COPY FROM WHEN condition

From
Andres Freund
Date:
On 2019-01-20 18:08:05 -0800, Andres Freund wrote:
> On 2019-01-20 21:00:21 -0500, Tomas Vondra wrote:
> > 
> > 
> > On 1/20/19 8:24 PM, Andres Freund wrote:
> > > Hi,
> > > 
> > > On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
> > > > On 1/14/19 10:25 PM, Tomas Vondra wrote:
> > > > > On 12/13/18 8:09 AM, Surafel Temesgen wrote:
> > > > > > 
> > > > > > 
> > > > > > On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra
> > > > > > <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> > > > > > 
> > > > > > 
> > > > > >      Can you also update the docs to mention that the functions called from
> > > > > >      the WHERE clause does not see effects of the COPY itself?
> > > > > > 
> > > > > > 
> > > > > > /Of course, i  also add same comment to insertion method selection
> > > > > > /
> > > > > 
> > > > > FWIW I've marked this as RFC and plan to get it committed this week.
> > > > > 
> > > > 
> > > > Pushed, thanks for the patch.
> > > 
> > > While rebasing the pluggable storage patch ontop of this I noticed that
> > > the qual appears to be evaluated in query context. Isn't that a bad
> > > idea? ISMT it should have been evaluated a few lines above, before the:
> > > 
> > >         /* Triggers and stuff need to be invoked in query context. */
> > >         MemoryContextSwitchTo(oldcontext);
> > > 
> > > Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?
> > > 
> > 
> > Yes, I agree. It's a bit too late for me to hack and push stuff, but I'll
> > fix that tomorrow.
> 
> NP. On second thought, the problem is probably smaller than I thought at
> first, because ExecQual() switches to the econtext's per-tuple memory
> context. But it's only reset once for each batch, so there's some
> wastage. At least worth a comment.

I'm tired, but perhaps it's actually worse - what's being reset currently
is the EState's per-tuple context:

        if (nBufferedTuples == 0)
        {
            /*
             * Reset the per-tuple exprcontext. We can only do this if the
             * tuple buffer is empty. (Calling the context the per-tuple
             * memory context is a bit of a misnomer now.)
             */
            ResetPerTupleExprContext(estate);
        }

but the quals are evaluated in the ExprContext's:

ExecQual(ExprState *state, ExprContext *econtext)
...
    ret = ExecEvalExprSwitchContext(state, econtext, &isnull);


which is created with:

/* Get an EState's per-output-tuple exprcontext, making it if first use */
#define GetPerTupleExprContext(estate) \
    ((estate)->es_per_tuple_exprcontext ? \
     (estate)->es_per_tuple_exprcontext : \
     MakePerTupleExprContext(estate))

and creates its own context:
    /*
     * Create working memory for expression evaluation in this context.
     */
    econtext->ecxt_per_tuple_memory =
        AllocSetContextCreate(estate->es_query_cxt,
                              "ExprContext",
                              ALLOCSET_DEFAULT_SIZES);

so this is currently just never reset. Seems just using
ExecQualAndReset() ought to be sufficient?

Greetings,

Andres Freund


Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:

On 1/21/19 3:12 AM, Andres Freund wrote:
> On 2019-01-20 18:08:05 -0800, Andres Freund wrote:
>> On 2019-01-20 21:00:21 -0500, Tomas Vondra wrote:
>>>
>>>
>>> On 1/20/19 8:24 PM, Andres Freund wrote:
>>>> Hi,
>>>>
>>>> On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
>>>>> On 1/14/19 10:25 PM, Tomas Vondra wrote:
>>>>>> On 12/13/18 8:09 AM, Surafel Temesgen wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra
>>>>>>> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>      Can you also update the docs to mention that the functions called from
>>>>>>>      the WHERE clause does not see effects of the COPY itself?
>>>>>>>
>>>>>>>
>>>>>>> /Of course, i  also add same comment to insertion method selection
>>>>>>> /
>>>>>>
>>>>>> FWIW I've marked this as RFC and plan to get it committed this week.
>>>>>>
>>>>>
>>>>> Pushed, thanks for the patch.
>>>>
>>>> While rebasing the pluggable storage patch ontop of this I noticed that
>>>> the qual appears to be evaluated in query context. Isn't that a bad
>>>> idea? ISMT it should have been evaluated a few lines above, before the:
>>>>
>>>>         /* Triggers and stuff need to be invoked in query context. */
>>>>         MemoryContextSwitchTo(oldcontext);
>>>>
>>>> Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?
>>>>
>>>
>>> Yes, I agree. It's a bit too late for me to hack and push stuff, but I'll
>>> fix that tomorrow.
>>
>> NP. On second thought, the problem is probably smaller than I thought at
>> first, because ExecQual() switches to the econtext's per-tuple memory
>> context. But it's only reset once for each batch, so there's some
>> wastage. At least worth a comment.
> 
> I'm tired, but perhaps its actually worse - what's being reset currently
> is the ESTate's per-tuple context:
> 
>         if (nBufferedTuples == 0)
>         {
>             /*
>              * Reset the per-tuple exprcontext. We can only do this if the
>              * tuple buffer is empty. (Calling the context the per-tuple
>              * memory context is a bit of a misnomer now.)
>              */
>             ResetPerTupleExprContext(estate);
>         }
> 
> but the quals are evaluated in the ExprContext's:
> 
> ExecQual(ExprState *state, ExprContext *econtext)
> ...
>     ret = ExecEvalExprSwitchContext(state, econtext, &isnull);
> 
> 
> which is created with:
> 
> /* Get an EState's per-output-tuple exprcontext, making it if first use */
> #define GetPerTupleExprContext(estate) \
>     ((estate)->es_per_tuple_exprcontext ? \
>      (estate)->es_per_tuple_exprcontext : \
>      MakePerTupleExprContext(estate))
> 
> and creates its own context:
>     /*
>      * Create working memory for expression evaluation in this context.
>      */
>     econtext->ecxt_per_tuple_memory =
>         AllocSetContextCreate(estate->es_query_cxt,
>                               "ExprContext",
>                               ALLOCSET_DEFAULT_SIZES);
> 
> so this is currently just never reset.

So context, much simple. Wow.
 > Seems just using ExecQualAndReset() ought to be sufficient?
> 

Seems like it.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:

On 1/21/19 3:12 AM, Andres Freund wrote:
> On 2019-01-20 18:08:05 -0800, Andres Freund wrote:
>> On 2019-01-20 21:00:21 -0500, Tomas Vondra wrote:
>>>
>>>
>>> On 1/20/19 8:24 PM, Andres Freund wrote:
>>>> Hi,
>>>>
>>>> On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
>>>>> On 1/14/19 10:25 PM, Tomas Vondra wrote:
>>>>>> On 12/13/18 8:09 AM, Surafel Temesgen wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra
>>>>>>> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>      Can you also update the docs to mention that the functions called from
>>>>>>>      the WHERE clause does not see effects of the COPY itself?
>>>>>>>
>>>>>>>
>>>>>>> /Of course, i  also add same comment to insertion method selection
>>>>>>> /
>>>>>>
>>>>>> FWIW I've marked this as RFC and plan to get it committed this week.
>>>>>>
>>>>>
>>>>> Pushed, thanks for the patch.
>>>>
>>>> While rebasing the pluggable storage patch ontop of this I noticed that
>>>> the qual appears to be evaluated in query context. Isn't that a bad
>>>> idea? ISMT it should have been evaluated a few lines above, before the:
>>>>
>>>>         /* Triggers and stuff need to be invoked in query context. */
>>>>         MemoryContextSwitchTo(oldcontext);
>>>>
>>>> Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?
>>>>
>>>
>>> Yes, I agree. It's a bit too late for me to hack and push stuff, but I'll
>>> fix that tomorrow.
>>
>> NP. On second thought, the problem is probably smaller than I thought at
>> first, because ExecQual() switches to the econtext's per-tuple memory
>> context. But it's only reset once for each batch, so there's some
>> wastage. At least worth a comment.
> 
> I'm tired, but perhaps its actually worse - what's being reset currently
> is the ESTate's per-tuple context:
> 
>         if (nBufferedTuples == 0)
>         {
>             /*
>              * Reset the per-tuple exprcontext. We can only do this if the
>              * tuple buffer is empty. (Calling the context the per-tuple
>              * memory context is a bit of a misnomer now.)
>              */
>             ResetPerTupleExprContext(estate);
>         }
> 
> but the quals are evaluated in the ExprContext's:
> 
> ExecQual(ExprState *state, ExprContext *econtext)
> ...
>     ret = ExecEvalExprSwitchContext(state, econtext, &isnull);
> 
> 
> which is created with:
> 
> /* Get an EState's per-output-tuple exprcontext, making it if first use */
> #define GetPerTupleExprContext(estate) \
>     ((estate)->es_per_tuple_exprcontext ? \
>      (estate)->es_per_tuple_exprcontext : \
>      MakePerTupleExprContext(estate))
> 
> and creates its own context:
>     /*
>      * Create working memory for expression evaluation in this context.
>      */
>     econtext->ecxt_per_tuple_memory =
>         AllocSetContextCreate(estate->es_query_cxt,
>                               "ExprContext",
>                               ALLOCSET_DEFAULT_SIZES);
> 
> so this is currently just never reset.

Actually, no. The ResetPerTupleExprContext boils down to

    MemoryContextReset((econtext)->ecxt_per_tuple_memory)

and ExecEvalExprSwitchContext does this

    MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);

So it's resetting the right context, although only on batch boundary.
But now I see 31f38174 does this:

    else if (cstate->whereClause != NULL ||
             contain_volatile_functions(cstate->whereClause))
    {
        ...
        insertMethod = CIM_SINGLE;
    }

so it does not do batching with WHERE. But the condition seems wrong, I
guess it should be && instead of ||. Will investigate in the morning.

> Seems just using ExecQualAndReset() ought to be sufficient?
> 

That may still be the right thing to do.


cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:

On 1/21/19 4:33 AM, Tomas Vondra wrote:
> 
> 
> On 1/21/19 3:12 AM, Andres Freund wrote:
>> On 2019-01-20 18:08:05 -0800, Andres Freund wrote:
>>> On 2019-01-20 21:00:21 -0500, Tomas Vondra wrote:
>>>>
>>>>
>>>> On 1/20/19 8:24 PM, Andres Freund wrote:
>>>>> Hi,
>>>>>
>>>>> On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
>>>>>> On 1/14/19 10:25 PM, Tomas Vondra wrote:
>>>>>>> On 12/13/18 8:09 AM, Surafel Temesgen wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra
>>>>>>>> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>      Can you also update the docs to mention that the functions called from
>>>>>>>>      the WHERE clause does not see effects of the COPY itself?
>>>>>>>>
>>>>>>>>
>>>>>>>> /Of course, i  also add same comment to insertion method selection
>>>>>>>> /
>>>>>>>
>>>>>>> FWIW I've marked this as RFC and plan to get it committed this week.
>>>>>>>
>>>>>>
>>>>>> Pushed, thanks for the patch.
>>>>>
>>>>> While rebasing the pluggable storage patch ontop of this I noticed that
>>>>> the qual appears to be evaluated in query context. Isn't that a bad
>>>>> idea? ISMT it should have been evaluated a few lines above, before the:
>>>>>
>>>>>         /* Triggers and stuff need to be invoked in query context. */
>>>>>         MemoryContextSwitchTo(oldcontext);
>>>>>
>>>>> Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?
>>>>>
>>>>
>>>> Yes, I agree. It's a bit too late for me to hack and push stuff, but I'll
>>>> fix that tomorrow.
>>>
>>> NP. On second thought, the problem is probably smaller than I thought at
>>> first, because ExecQual() switches to the econtext's per-tuple memory
>>> context. But it's only reset once for each batch, so there's some
>>> wastage. At least worth a comment.
>>
>> I'm tired, but perhaps its actually worse - what's being reset currently
>> is the ESTate's per-tuple context:
>>
>>         if (nBufferedTuples == 0)
>>         {
>>             /*
>>              * Reset the per-tuple exprcontext. We can only do this if the
>>              * tuple buffer is empty. (Calling the context the per-tuple
>>              * memory context is a bit of a misnomer now.)
>>              */
>>             ResetPerTupleExprContext(estate);
>>         }
>>
>> but the quals are evaluated in the ExprContext's:
>>
>> ExecQual(ExprState *state, ExprContext *econtext)
>> ...
>>     ret = ExecEvalExprSwitchContext(state, econtext, &isnull);
>>
>>
>> which is created with:
>>
>> /* Get an EState's per-output-tuple exprcontext, making it if first use */
>> #define GetPerTupleExprContext(estate) \
>>     ((estate)->es_per_tuple_exprcontext ? \
>>      (estate)->es_per_tuple_exprcontext : \
>>      MakePerTupleExprContext(estate))
>>
>> and creates its own context:
>>     /*
>>      * Create working memory for expression evaluation in this context.
>>      */
>>     econtext->ecxt_per_tuple_memory =
>>         AllocSetContextCreate(estate->es_query_cxt,
>>                               "ExprContext",
>>                               ALLOCSET_DEFAULT_SIZES);
>>
>> so this is currently just never reset.
> 
> Actually, no. The ResetPerTupleExprContext boils down to
> 
>     MemoryContextReset((econtext)->ecxt_per_tuple_memory)
> 
> and ExecEvalExprSwitchContext does this
> 
>     MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
> 
> So it's resetting the right context, although only on batch boundary.
> But now I see 31f38174 does this:
> 
>     else if (cstate->whereClause != NULL ||
>              contain_volatile_functions(cstate->whereClause))
>     {
>         ...
>         insertMethod = CIM_SINGLE;
>     }
> 
> so it does not do batching with WHERE. But the condition seems wrong, I
> guess it should be && instead of ||. Will investigate in the morning.
> 

I think the condition can be just

    if (contain_volatile_functions(cstate->whereClause)) { ... }

Per the attached patch. Surafel, do you agree?


>> Seems just using ExecQualAndReset() ought to be sufficient?
>>
> 
> That may still be the right thing to do.
> 

Actually, no, because that would reset the context far too early (and
it's easy to trigger segfaults). So the reset would have to happen after
processing the row, not this early.

But I think the current behavior is actually OK, as it matches what we
do for defexprs. And the comment before ResetPerTupleExprContext says this:

    /*
     * Reset the per-tuple exprcontext. We can only do this if the
     * tuple buffer is empty. (Calling the context the per-tuple
     * memory context is a bit of a misnomer now.)
     */

So the per-tuple context is not quite per-tuple anyway. Sure, we might
rework that but I don't think that's an issue in this patch.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: COPY FROM WHEN condition

From
Andres Freund
Date:
Hi,

On 2019-01-21 16:22:11 +0100, Tomas Vondra wrote:
> 
> 
> On 1/21/19 4:33 AM, Tomas Vondra wrote:
> > 
> > 
> > On 1/21/19 3:12 AM, Andres Freund wrote:
> >> On 2019-01-20 18:08:05 -0800, Andres Freund wrote:
> >>> On 2019-01-20 21:00:21 -0500, Tomas Vondra wrote:
> >>>>
> >>>>
> >>>> On 1/20/19 8:24 PM, Andres Freund wrote:
> >>>>> Hi,
> >>>>>
> >>>>> On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
> >>>>>> On 1/14/19 10:25 PM, Tomas Vondra wrote:
> >>>>>>> On 12/13/18 8:09 AM, Surafel Temesgen wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra
> >>>>>>>> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>      Can you also update the docs to mention that the functions called from
> >>>>>>>>      the WHERE clause does not see effects of the COPY itself?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> /Of course, i  also add same comment to insertion method selection
> >>>>>>>> /
> >>>>>>>
> >>>>>>> FWIW I've marked this as RFC and plan to get it committed this week.
> >>>>>>>
> >>>>>>
> >>>>>> Pushed, thanks for the patch.
> >>>>>
> >>>>> While rebasing the pluggable storage patch ontop of this I noticed that
> >>>>> the qual appears to be evaluated in query context. Isn't that a bad
> >>>>> idea? ISMT it should have been evaluated a few lines above, before the:
> >>>>>
> >>>>>         /* Triggers and stuff need to be invoked in query context. */
> >>>>>         MemoryContextSwitchTo(oldcontext);
> >>>>>
> >>>>> Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?
> >>>>>
> >>>>
> >>>> Yes, I agree. It's a bit too late for me to hack and push stuff, but I'll
> >>>> fix that tomorrow.
> >>>
> >>> NP. On second thought, the problem is probably smaller than I thought at
> >>> first, because ExecQual() switches to the econtext's per-tuple memory
> >>> context. But it's only reset once for each batch, so there's some
> >>> wastage. At least worth a comment.
> >>
> >> I'm tired, but perhaps its actually worse - what's being reset currently
> >> is the ESTate's per-tuple context:
> >>
> >>         if (nBufferedTuples == 0)
> >>         {
> >>             /*
> >>              * Reset the per-tuple exprcontext. We can only do this if the
> >>              * tuple buffer is empty. (Calling the context the per-tuple
> >>              * memory context is a bit of a misnomer now.)
> >>              */
> >>             ResetPerTupleExprContext(estate);
> >>         }
> >>
> >> but the quals are evaluated in the ExprContext's:
> >>
> >> ExecQual(ExprState *state, ExprContext *econtext)
> >> ...
> >>     ret = ExecEvalExprSwitchContext(state, econtext, &isnull);
> >>
> >>
> >> which is created with:
> >>
> >> /* Get an EState's per-output-tuple exprcontext, making it if first use */
> >> #define GetPerTupleExprContext(estate) \
> >>     ((estate)->es_per_tuple_exprcontext ? \
> >>      (estate)->es_per_tuple_exprcontext : \
> >>      MakePerTupleExprContext(estate))
> >>
> >> and creates its own context:
> >>     /*
> >>      * Create working memory for expression evaluation in this context.
> >>      */
> >>     econtext->ecxt_per_tuple_memory =
> >>         AllocSetContextCreate(estate->es_query_cxt,
> >>                               "ExprContext",
> >>                               ALLOCSET_DEFAULT_SIZES);
> >>
> >> so this is currently just never reset.
> > 
> > Actually, no. The ResetPerTupleExprContext boils down to
> > 
> >     MemoryContextReset((econtext)->ecxt_per_tuple_memory)
> > 
> > and ExecEvalExprSwitchContext does this
> > 
> >     MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
> > 
> > So it's resetting the right context, although only on batch boundary.

> >> Seems just using ExecQualAndReset() ought to be sufficient?
> >>
> > 
> > That may still be the right thing to do.
> > 
> 
> Actually, no, because that would reset the context far too early (and
> it's easy to trigger segfaults). So the reset would have to happen after
> processing the row, not this early.

Yea, sorry, I was too tired yesterday evening. I'd spent 10h splitting
up the pluggable storage patch into individual pieces...


> But I think the current behavior is actually OK, as it matches what we
> do for defexprs. And the comment before ResetPerTupleExprContext says this:
> 
>     /*
>      * Reset the per-tuple exprcontext. We can only do this if the
>      * tuple buffer is empty. (Calling the context the per-tuple
>      * memory context is a bit of a misnomer now.)
>      */
> 
> So the per-tuple context is not quite per-tuple anyway. Sure, we might
> rework that but I don't think that's an issue in this patch.

I'm *not* convinced by this. I think it's bad enough that we do this for
normal COPY, but for WHEN, we could end up *never* resetting before the
end. Consider a case where a single tuple is inserted, and then *all*
rows are filtered.  I think this needs a separate econtext that's reset
every round. Or alternatively you could fix the code not to rely on
per-tuple not being reset when tuples are buffered - that actually ought
to be fairly simple.
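
A sketch of the sort of load that hits that pattern (table, column, and file names are invented): the condition matches almost nothing, so once one tuple has been buffered the buffer never empties and the per-tuple reset stops happening for the rest of the file.

COPY events FROM '/tmp/events.csv' (FORMAT csv)
    WHERE status = 'error';   -- true for only a handful of lines in a huge file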

Greetings,

Andres Freund


Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:

On 1/21/19 7:51 PM, Andres Freund wrote:
> Hi,
> 
> On 2019-01-21 16:22:11 +0100, Tomas Vondra wrote:
>>
>>
>> On 1/21/19 4:33 AM, Tomas Vondra wrote:
>>>
>>>
>>> On 1/21/19 3:12 AM, Andres Freund wrote:
>>>> On 2019-01-20 18:08:05 -0800, Andres Freund wrote:
>>>>> On 2019-01-20 21:00:21 -0500, Tomas Vondra wrote:
>>>>>>
>>>>>>
>>>>>> On 1/20/19 8:24 PM, Andres Freund wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
>>>>>>>> On 1/14/19 10:25 PM, Tomas Vondra wrote:
>>>>>>>>> On 12/13/18 8:09 AM, Surafel Temesgen wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra
>>>>>>>>>> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>      Can you also update the docs to mention that the functions called from
>>>>>>>>>>      the WHERE clause does not see effects of the COPY itself?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> /Of course, i  also add same comment to insertion method selection
>>>>>>>>>> /
>>>>>>>>>
>>>>>>>>> FWIW I've marked this as RFC and plan to get it committed this week.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Pushed, thanks for the patch.
>>>>>>>
>>>>>>> While rebasing the pluggable storage patch ontop of this I noticed that
>>>>>>> the qual appears to be evaluated in query context. Isn't that a bad
>>>>>>> idea? ISMT it should have been evaluated a few lines above, before the:
>>>>>>>
>>>>>>>         /* Triggers and stuff need to be invoked in query context. */
>>>>>>>         MemoryContextSwitchTo(oldcontext);
>>>>>>>
>>>>>>> Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?
>>>>>>>
>>>>>>
>>>>>> Yes, I agree. It's a bit too late for me to hack and push stuff, but I'll
>>>>>> fix that tomorrow.
>>>>>
>>>>> NP. On second thought, the problem is probably smaller than I thought at
>>>>> first, because ExecQual() switches to the econtext's per-tuple memory
>>>>> context. But it's only reset once for each batch, so there's some
>>>>> wastage. At least worth a comment.
>>>>
>>>> I'm tired, but perhaps its actually worse - what's being reset currently
>>>> is the ESTate's per-tuple context:
>>>>
>>>>         if (nBufferedTuples == 0)
>>>>         {
>>>>             /*
>>>>              * Reset the per-tuple exprcontext. We can only do this if the
>>>>              * tuple buffer is empty. (Calling the context the per-tuple
>>>>              * memory context is a bit of a misnomer now.)
>>>>              */
>>>>             ResetPerTupleExprContext(estate);
>>>>         }
>>>>
>>>> but the quals are evaluated in the ExprContext's:
>>>>
>>>> ExecQual(ExprState *state, ExprContext *econtext)
>>>> ...
>>>>     ret = ExecEvalExprSwitchContext(state, econtext, &isnull);
>>>>
>>>>
>>>> which is created with:
>>>>
>>>> /* Get an EState's per-output-tuple exprcontext, making it if first use */
>>>> #define GetPerTupleExprContext(estate) \
>>>>     ((estate)->es_per_tuple_exprcontext ? \
>>>>      (estate)->es_per_tuple_exprcontext : \
>>>>      MakePerTupleExprContext(estate))
>>>>
>>>> and creates its own context:
>>>>     /*
>>>>      * Create working memory for expression evaluation in this context.
>>>>      */
>>>>     econtext->ecxt_per_tuple_memory =
>>>>         AllocSetContextCreate(estate->es_query_cxt,
>>>>                               "ExprContext",
>>>>                               ALLOCSET_DEFAULT_SIZES);
>>>>
>>>> so this is currently just never reset.
>>>
>>> Actually, no. The ResetPerTupleExprContext boils down to
>>>
>>>     MemoryContextReset((econtext)->ecxt_per_tuple_memory)
>>>
>>> and ExecEvalExprSwitchContext does this
>>>
>>>     MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
>>>
>>> So it's resetting the right context, although only on batch boundary.
> 
>>>> Seems just using ExecQualAndReset() ought to be sufficient?
>>>>
>>>
>>> That may still be the right thing to do.
>>>
>>
>> Actually, no, because that would reset the context far too early (and
>> it's easy to trigger segfaults). So the reset would have to happen after
>> processing the row, not this early.
> 
> Yea, sorry, I was too tired yesterday evening. I'd spent 10h splitting
> up the pluggable storage patch into individual pieces...
> 
> 
>> But I think the current behavior is actually OK, as it matches what we
>> do for defexprs. And the comment before ResetPerTupleExprContext says this:
>>
>>     /*
>>      * Reset the per-tuple exprcontext. We can only do this if the
>>      * tuple buffer is empty. (Calling the context the per-tuple
>>      * memory context is a bit of a misnomer now.)
>>      */
>>
>> So the per-tuple context is not quite per-tuple anyway. Sure, we might
>> rework that but I don't think that's an issue in this patch.
> 
> I'm *not* convinced by this. I think it's bad enough that we do this for
> normal COPY, but for WHEN, we could end up *never* resetting before the
> end. Consider a case where a single tuple is inserted, and then *all*
> rows are filtered.  I think this needs a separate econtext that's reset
> every round. Or alternatively you could fix the code not to rely on
> per-tuple not being reset when tuples are buffered - that actually ought
> to be fairly simple.
> 

I think separating the per-tuple and per-batch contexts is the right
thing to do, here. It seems the batching was added somewhat later and
using the per-tuple context is rather confusing.

cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: COPY FROM WHEN condition

From
Surafel Temesgen
Date:


On Mon, Jan 21, 2019 at 6:22 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

I think the condition can be just

    if (contain_volatile_functions(cstate->whereClause)) { ... }



Yes, it can be.

regards
Surafel

Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:

On 1/21/19 11:15 PM, Tomas Vondra wrote:
> 
> 
> On 1/21/19 7:51 PM, Andres Freund wrote:
>> Hi,
>>
>> On 2019-01-21 16:22:11 +0100, Tomas Vondra wrote:
>>>
>>>
>>> On 1/21/19 4:33 AM, Tomas Vondra wrote:
>>>>
>>>>
>>>> On 1/21/19 3:12 AM, Andres Freund wrote:
>>>>> On 2019-01-20 18:08:05 -0800, Andres Freund wrote:
>>>>>> On 2019-01-20 21:00:21 -0500, Tomas Vondra wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 1/20/19 8:24 PM, Andres Freund wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 2019-01-20 00:24:05 +0100, Tomas Vondra wrote:
>>>>>>>>> On 1/14/19 10:25 PM, Tomas Vondra wrote:
>>>>>>>>>> On 12/13/18 8:09 AM, Surafel Temesgen wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra
>>>>>>>>>>> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>      Can you also update the docs to mention that the functions called from
>>>>>>>>>>>      the WHERE clause does not see effects of the COPY itself?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> /Of course, i  also add same comment to insertion method selection
>>>>>>>>>>> /
>>>>>>>>>>
>>>>>>>>>> FWIW I've marked this as RFC and plan to get it committed this week.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Pushed, thanks for the patch.
>>>>>>>>
>>>>>>>> While rebasing the pluggable storage patch ontop of this I noticed that
>>>>>>>> the qual appears to be evaluated in query context. Isn't that a bad
>>>>>>>> idea? ISMT it should have been evaluated a few lines above, before the:
>>>>>>>>
>>>>>>>>         /* Triggers and stuff need to be invoked in query context. */
>>>>>>>>         MemoryContextSwitchTo(oldcontext);
>>>>>>>>
>>>>>>>> Yes, that'd require moving the ExecStoreHeapTuple(), but that seems ok?
>>>>>>>>
>>>>>>>
>>>>>>> Yes, I agree. It's a bit too late for me to hack and push stuff, but I'll
>>>>>>> fix that tomorrow.
>>>>>>
>>>>>> NP. On second thought, the problem is probably smaller than I thought at
>>>>>> first, because ExecQual() switches to the econtext's per-tuple memory
>>>>>> context. But it's only reset once for each batch, so there's some
>>>>>> wastage. At least worth a comment.
>>>>>
>>>>> I'm tired, but perhaps its actually worse - what's being reset currently
>>>>> is the ESTate's per-tuple context:
>>>>>
>>>>>         if (nBufferedTuples == 0)
>>>>>         {
>>>>>             /*
>>>>>              * Reset the per-tuple exprcontext. We can only do this if the
>>>>>              * tuple buffer is empty. (Calling the context the per-tuple
>>>>>              * memory context is a bit of a misnomer now.)
>>>>>              */
>>>>>             ResetPerTupleExprContext(estate);
>>>>>         }
>>>>>
>>>>> but the quals are evaluated in the ExprContext's:
>>>>>
>>>>> ExecQual(ExprState *state, ExprContext *econtext)
>>>>> ...
>>>>>     ret = ExecEvalExprSwitchContext(state, econtext, &isnull);
>>>>>
>>>>>
>>>>> which is created with:
>>>>>
>>>>> /* Get an EState's per-output-tuple exprcontext, making it if first use */
>>>>> #define GetPerTupleExprContext(estate) \
>>>>>     ((estate)->es_per_tuple_exprcontext ? \
>>>>>      (estate)->es_per_tuple_exprcontext : \
>>>>>      MakePerTupleExprContext(estate))
>>>>>
>>>>> and creates its own context:
>>>>>     /*
>>>>>      * Create working memory for expression evaluation in this context.
>>>>>      */
>>>>>     econtext->ecxt_per_tuple_memory =
>>>>>         AllocSetContextCreate(estate->es_query_cxt,
>>>>>                               "ExprContext",
>>>>>                               ALLOCSET_DEFAULT_SIZES);
>>>>>
>>>>> so this is currently just never reset.
>>>>
>>>> Actually, no. The ResetPerTupleExprContext boils down to
>>>>
>>>>     MemoryContextReset((econtext)->ecxt_per_tuple_memory)
>>>>
>>>> and ExecEvalExprSwitchContext does this
>>>>
>>>>     MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
>>>>
>>>> So it's resetting the right context, although only on batch boundary.
>>
>>>>> Seems just using ExecQualAndReset() ought to be sufficient?
>>>>>
>>>>
>>>> That may still be the right thing to do.
>>>>
>>>
>>> Actually, no, because that would reset the context far too early (and
>>> it's easy to trigger segfaults). So the reset would have to happen after
>>> processing the row, not this early.
>>
>> Yea, sorry, I was too tired yesterday evening. I'd spent 10h splitting
>> up the pluggable storage patch into individual pieces...
>>
>>
>>> But I think the current behavior is actually OK, as it matches what we
>>> do for defexprs. And the comment before ResetPerTupleExprContext says this:
>>>
>>>     /*
>>>      * Reset the per-tuple exprcontext. We can only do this if the
>>>      * tuple buffer is empty. (Calling the context the per-tuple
>>>      * memory context is a bit of a misnomer now.)
>>>      */
>>>
>>> So the per-tuple context is not quite per-tuple anyway. Sure, we might
>>> rework that but I don't think that's an issue in this patch.
>>
>> I'm *not* convinced by this. I think it's bad enough that we do this for
>> normal COPY, but for WHEN, we could end up *never* resetting before the
>> end. Consider a case where a single tuple is inserted, and then *all*
>> rows are filtered.  I think this needs a separate econtext that's reset
>> every round. Or alternatively you could fix the code not to rely on
>> per-tuple not being reset when tuples are buffered - that actually ought
>> to be fairly simple.
>>
> 
> I think separating the per-tuple and per-batch contexts is the right
> thing to do, here. It seems the batching was added somewhat later and
> using the per-tuple context is rather confusing.
> 

OK, here is a WIP patch doing that. It creates a new "batch" context,
and allocates tuples in it (instead of the per-tuple context). The
per-tuple context is now always reset, irrespective of nBufferedTuples.
And the batch context is reset every time the batch is emptied.

It turned out to be a tad more complex due to partitioning, because when
we find the partitions do not match, the tuple is already allocated in
the "current" context (be it per-tuple or batch). So we can't just free
the whole context at that point. The old code worked around this by
alternating two contexts, but that seems a bit too cumbersome to me, so
the patch simply copies the tuple to the new context. That allows us to
reset the batch context always, right after emptying the buffer. I need
to do some benchmarking to see if the extra copy causes any regression.

Overall, separating the contexts makes it quite a bit clearer. I'm not
entirely happy about the per-tuple context being "implicit" (hidden in
executor context) while the batch context being explicitly created, but
there's not much I can do about that.

The patch also includes the fix correcting the volatility check on WHERE
clause, although that shall be committed separately.

regards
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: COPY FROM WHEN condition

From
Andres Freund
Date:
Hi,

On 2019-01-22 18:35:21 +0100, Tomas Vondra wrote:
> On 1/21/19 11:15 PM, Tomas Vondra wrote:
> > On 1/21/19 7:51 PM, Andres Freund wrote:
> >> I'm *not* convinced by this. I think it's bad enough that we do this for
> >> normal COPY, but for WHEN, we could end up *never* resetting before the
> >> end. Consider a case where a single tuple is inserted, and then *all*
> >> rows are filtered.  I think this needs a separate econtext that's reset
> >> every round. Or alternatively you could fix the code not to rely on
> >> per-tuple not being reset when tuples are buffered - that actually ought
> >> to be fairly simple.
> >>
> > 
> > I think separating the per-tuple and per-batch contexts is the right
> > thing to do, here. It seems the batching was added somewhat later and
> > using the per-tuple context is rather confusing.
> > 
> 
> OK, here is a WIP patch doing that. It creates a new "batch" context,
> and allocates tuples in it (instead of the per-tuple context). The
> per-tuple context is now reset always, irrespectedly of nBufferedTuples.
> And the batch context is reset every time the batch is emptied.
> 
> It turned out to be a tad more complex due to partitioning, because when
> we find the partitions do not match, the tuple is already allocated in
> the "current" context (be it per-tuple or batch). So we can't just free
> the whole context at that point. The old code worked around this by
> alternating two contexts, but that seems a bit too cumbersome to me, so
> the patch simply copies the tuple to the new context. That allows us to
> reset the batch context always, right after emptying the buffer. I need
> to do some benchmarking to see if the extra copy causes any regression.
> 
> Overall, separating the contexts makes it quite a bit clearer. I'm not
> entirely happy about the per-tuple context being "implicit" (hidden in
> executor context) while the batch context being explicitly created, but
> there's not much I can do about that.

I think the extra copy is probably OK for now - as part of the pluggable
storage series I've converted COPY to use slots, which should make that
less of an issue - I've not done that, but I actually assume we could
remove the whole batch context afterwards.

Greetings,

Andres Freund


Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:

On 1/22/19 10:00 AM, Surafel Temesgen wrote:
> 
> 
> On Mon, Jan 21, 2019 at 6:22 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
> 
>     I think the condition can be just
> 
>         if (contain_volatile_functions(cstate->whereClause)) { ... }
> 
> 

I've pushed a fix for the volatility check.

Attached is a patch for the other issue, creating a separate batch
context along the lines outlined in the previous email. It's a bit too
late for me to push it now, especially right before a couple of days
off. So I'll push that in a couple of days.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: COPY FROM WHEN condition

From
David Rowley
Date:
On Wed, 23 Jan 2019 at 06:35, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> It turned out to be a tad more complex due to partitioning, because when
> we find the partitions do not match, the tuple is already allocated in
> the "current" context (be it per-tuple or batch). So we can't just free
> the whole context at that point. The old code worked around this by
> alternating two contexts, but that seems a bit too cumbersome to me, so
> the patch simply copies the tuple to the new context. That allows us to
> reset the batch context always, right after emptying the buffer. I need
> to do some benchmarking to see if the extra copy causes any regression.

I agree that fixing the problem mentioned by separating out tuple and
batch contexts is a good idea, but I disagree with removing the
alternating batch context idea.  FWIW the reason the alternating
contexts went in wasn't just for fun, it was on performance grounds.
There's a lengthy discussion in [1] about not adding any new
performance regressions to COPY. To be more specific, Peter complains
about a regression of 5% in [2].

It's pretty disheartening to see the work there being partially undone.

Here are my performance tests of with and without your change to the
memory contexts (I missed where you posted your results).

$ cat bench.pl
for (my $i=0; $i < 8912891; $i++) {
print "1\n1\n2\n2\n";
}
36a1281f86c: (with your change)

postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 26825.142 ms (00:26.825)
Time: 27202.117 ms (00:27.202)
Time: 26266.705 ms (00:26.267)

4be058fe9ec: (before your change)

postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 25645.460 ms (00:25.645)
Time: 25698.193 ms (00:25.698)
Time: 25737.251 ms (00:25.737)

So looks like your change slows this code down 4% for this test case.
That's about twice as bad as the 2.2% regression that I had to solve
for the multi-insert partition patch (from which you've removed much of
the code).

I'd really like to see the alternating batch context code being put
back in to fix this. Of course, if you have a better idea, then we can
look into that, but just removing code that was carefully written and
benchmarked without any new benchmarks to prove that it's okay to do
so seems wrong to me.

[1]
https://www.postgresql.org/message-id/flat/CAKJS1f93DeHN%2B9RrD9jYn0iF_o89w2B%2BU8-Ao5V1kd8Cf7oSGQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/2b40468d-6be0-e795-c363-0015bc9fa747%402ndquadrant.com

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:

On 1/29/19 8:18 AM, David Rowley wrote:
> On Wed, 23 Jan 2019 at 06:35, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> It turned out to be a tad more complex due to partitioning, because when
>> we find the partitions do not match, the tuple is already allocated in
>> the "current" context (be it per-tuple or batch). So we can't just free
>> the whole context at that point. The old code worked around this by
>> alternating two contexts, but that seems a bit too cumbersome to me, so
>> the patch simply copies the tuple to the new context. That allows us to
>> reset the batch context always, right after emptying the buffer. I need
>> to do some benchmarking to see if the extra copy causes any regression.
> 
> I agree that fixing the problem mentioned by separating out tuple and
> batch contexts is a good idea, but I disagree with removing the
> alternating batch context idea.  FWIW the reason the alternating
> contexts went in wasn't just for fun, it was on performance grounds.
> There's a lengthy discussion in [1] about not adding any new
> performance regressions to COPY. To be more specific, Peter complains
> about a regression of 5% in [2].
> 
> It's pretty disheartening to see the work there being partially undone.
> 

Whoops :-(

> Here are my performance tests of with and without your change to the
> memory contexts (I missed where you posted your results).
> 
> $ cat bench.pl
> for (my $i=0; $i < 8912891; $i++) {
> print "1\n1\n2\n2\n";
> }
> 36a1281f86c: (with your change)
> 
> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 26825.142 ms (00:26.825)
> Time: 27202.117 ms (00:27.202)
> Time: 26266.705 ms (00:26.267)
> 
> 4be058fe9ec: (before your change)
> 
> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 25645.460 ms (00:25.645)
> Time: 25698.193 ms (00:25.698)
> Time: 25737.251 ms (00:25.737)
> 
> So looks like your change slows this code down 4% for this test case.
> That's about twice as bad as the 2.2% regression that I had to solve
> for the multi-insert partition patch (of which you've removed much of
> the code from)
> 
> I'd really like to see the alternating batch context code being put
> back in to fix this. Of course, if you have a better idea, then we can
> look into that, but just removing code that was carefully written and
> benchmarked without any new benchmarks to prove that it's okay to do
> so seems wrong to me.
> 

Yeah, that's not very nice. Do you suggest to revert 36a1281f86c0f, or
shall we try to massage it a bit first? ISTM we could simply create two
batch memory contexts and alternate those, just like with the expression
contexts before.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: COPY FROM WHEN condition

From
Tomas Vondra
Date:
On 1/29/19 8:18 AM, David Rowley wrote:
> ...
> Here are my performance tests of with and without your change to the
> memory contexts (I missed where you posted your results).
> 
> $ cat bench.pl
> for (my $i=0; $i < 8912891; $i++) {
> print "1\n1\n2\n2\n";
> }
> 36a1281f86c: (with your change)
> 
> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 26825.142 ms (00:26.825)
> Time: 27202.117 ms (00:27.202)
> Time: 26266.705 ms (00:26.267)
> 
> 4be058fe9ec: (before your change)
> 
> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 25645.460 ms (00:25.645)
> Time: 25698.193 ms (00:25.698)
> Time: 25737.251 ms (00:25.737)
> 

How do I reproduce this? I don't see this test in the thread you linked,
so I'm not sure how many partitions you were using, what's the structure
of the table etc.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: COPY FROM WHEN condition

From
Andres Freund
Date:
Hi,

On January 29, 2019 6:25:59 AM PST, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>On 1/29/19 8:18 AM, David Rowley wrote:
>> ...
>> Here are my performance tests of with and without your change to the
>> memory contexts (I missed where you posted your results).
>>
>> $ cat bench.pl
>> for (my $i=0; $i < 8912891; $i++) {
>> print "1\n1\n2\n2\n";
>> }
>> 36a1281f86c: (with your change)
>>
>> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
>> COPY 35651564
>> Time: 26825.142 ms (00:26.825)
>> Time: 27202.117 ms (00:27.202)
>> Time: 26266.705 ms (00:26.267)
>>
>> 4be058fe9ec: (before your change)
>>
>> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
>> COPY 35651564
>> Time: 25645.460 ms (00:25.645)
>> Time: 25698.193 ms (00:25.698)
>> Time: 25737.251 ms (00:25.737)
>>
>
>How do I reproduce this? I don't see this test in the thread you linked,
>so I'm not sure how many partitions you were using, what's the structure
>of the table etc.

I think I might have a patch addressing the problem incidentally. For pluggable storage I slotified copy.c, which also
removes the first heap_form_tuple. Quite possible that nothing more is needed. I've removed the batch context altogether
in yesterday's rebase, there was no need anymore.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: COPY FROM WHEN condition

From
David Rowley
Date:
On Wed, 30 Jan 2019 at 03:26, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On 1/29/19 8:18 AM, David Rowley wrote:
> > (details about performance regression)
>
> How do I reproduce this? I don't see this test in the thread you linked,
> so I'm not sure how many partitions you were using, what's the structure
> of the table etc.

The setup is:

create table listp (a int) partition by list(a);
create table listp1 partition of listp for values in(1);
create table listp2 partition of listp for values in(2);
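
For completeness, a minimal reproduction sketch combining this setup with the COPY invocation quoted earlier in the thread (assumes a plain psql session and that bench.pl is the script shown above):

\timing on
copy listp from program $$perl ~/bench.pl$$ delimiter '|';
truncate listp;   -- reset between repeated runs, as mentioned later in the thread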


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: COPY FROM WHEN condition

From
David Rowley
Date:
On Wed, 30 Jan 2019 at 04:22, Andres Freund <andres@anarazel.de> wrote:
> I think I might have a patch addressing the problem incidentally. For pluggable storage I slotified copy.c, which
> also removes the first heap_form_tuple. Quite possible that nothing more is needed. I've removed the batch context
> altogether in yesterday's rebase, there was no need anymore.

In your patch, where do the batched tuples get stored before the heap
insert is done?

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: COPY FROM WHEN condition

From
David Rowley
Date:
On Tue, 29 Jan 2019 at 22:51, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On 1/29/19 8:18 AM, David Rowley wrote:
> > So looks like your change slows this code down 4% for this test case.
> > That's about twice as bad as the 2.2% regression that I had to solve
> > for the multi-insert partition patch (of which you've removed much of
> > the code from)
> >
> > I'd really like to see the alternating batch context code being put
> > back in to fix this. Of course, if you have a better idea, then we can
> > look into that, but just removing code that was carefully written and
> > benchmarked without any new benchmarks to prove that it's okay to do
> > so seems wrong to me.
> >
>
> Yeah, that's not very nice. Do you suggest to revert 36a1281f86c0f, or
> shall we try to massage it a bit first? ISTM we could simply create two
> batch memory contexts and alternate those, just like with the expression
> contexts before.

I don't think a revert should be done. I didn't check in detail, but
it seems what you did fixed the memory leak issue for when many tuples
are filtered out.  I'd like to have more detail on what Andres is
proposing, but otherwise, I'd imagine we can just have two batch
contexts and alternate between them, as before, but also keep the
per-tuple context too.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: COPY FROM WHEN condition

From
Andres Freund
Date:
Hi,

On 2019-01-30 10:05:35 +1300, David Rowley wrote:
> On Wed, 30 Jan 2019 at 04:22, Andres Freund <andres@anarazel.de> wrote:
> > I think I might have a patch addressing the problem incidentally. For pluggable storage I slotified copy.c, which
> > also removes the first heap_form_tuple. Quite possible that nothing more is needed. I've removed the batch context
> > altogether in yesterday's rebase, there was no need anymore.
> 
> In your patch, where do the batched tuples get stored before the heap
> insert is done?

There's one slot for each batched tuple (they are reused). Before
materialization the tuples solely exist in tts_isnull/values into which
NextCopyFrom() directly parses the values.  Tuples never get extracted
from the slot in copy.c itself anymore, table_multi_insert() accepts
slots.  Not quite sure whether I've answered your question?

Greetings,

Andres Freund


Re: COPY FROM WHEN condition

От
David Rowley
Дата:
On Wed, 30 Jan 2019 at 10:12, Andres Freund <andres@anarazel.de> wrote:
>
> On 2019-01-30 10:05:35 +1300, David Rowley wrote:
> > On Wed, 30 Jan 2019 at 04:22, Andres Freund <andres@anarazel.de> wrote:
> > > I think I might have a patch addressing the problem incidentally. For pluggable storage I slotified copy.c, which
> > > also removes the first heap_form_tuple. Quite possible that nothing more is needed. I've removed the batch context
> > > altogether in yesterday's rebase, there was no need anymore.
> >
> > In your patch, where do the batched tuples get stored before the heap
> > insert is done?
>
> There's one slot for each batched tuple (they are reused). Before
> materialization the tuples solely exist in tts_isnull/values into which
> NextCopyFrom() directly parses the values.  Tuples never get extracted
> from the slot in copy.c itself anymore, table_multi_insert() accepts
> slots.  Not quite sure whether I've answered your question?

I think so.  I imagine that should also speed up COPY WHERE too as
it'll no longer form a tuple before possibly discarding it.
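
As a sketch of what that ends up looking like (mirroring the slotified loop,
not the exact patch text), the qual can be checked on the virtual tuple
straight after parsing:

    /* values/nulls were parsed directly into the slot by NextCopyFrom() */
    ExecStoreVirtualTuple(myslot);

    if (cstate->whereClause)
    {
        econtext->ecxt_scantuple = myslot;
        if (!ExecQual(cstate->qualexpr, econtext))
            continue;       /* row discarded without ever materializing it */
    }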

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: COPY FROM WHEN condition

От
Andres Freund
Дата:
Hi,

On 2019-01-30 10:33:30 +1300, David Rowley wrote:
> On Wed, 30 Jan 2019 at 10:12, Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2019-01-30 10:05:35 +1300, David Rowley wrote:
> > > On Wed, 30 Jan 2019 at 04:22, Andres Freund <andres@anarazel.de> wrote:
> > > > I think I might have a patch addressing the problem incidentally. For pluggable storage I slotified copy.c,
> > > > which also removes the first heap_form_tuple. Quite possible that nothing more is needed. I've removed the batch context
> > > > altogether in yesterday's rebase, there was no need anymore.
 
> > >
> > > In your patch, where do the batched tuples get stored before the heap
> > > insert is done?
> >
> > There's one slot for each batched tuple (they are reused). Before
> > materialization the tuples solely exist in tts_isnull/values into which
> > NextCopyFrom() directly parses the values.  Tuples never get extracted
> > from the slot in copy.c itself anymore, table_multi_insert() accepts
> > slots.  Not quite sure whether I've answered your question?
> 
> I think so.  I imagine that should also speed up COPY WHERE too as
> it'll no longer form a tuple before possibly discarding it.

Right.

I found some issues in my patch (stupid implementation of copying from
one slot to the other), but after fixing that I get:

master:
Time: 16013.509 ms (00:16.014)
Time: 16836.110 ms (00:16.836)
Time: 16636.796 ms (00:16.637)

pluggable storage:
Time: 15974.243 ms (00:15.974)
Time: 16183.442 ms (00:16.183)
Time: 16055.192 ms (00:16.055)

(with a truncate between each run)

So that seems a bit better. Albeit at the cost of having a few, on
demand created, empty slots for each encountered partition.

I'm pretty sure we can optimize that further...


Greetings,

Andres Freund


Re: COPY FROM WHEN condition

От
David Rowley
Дата:
On Wed, 27 Mar 2019 at 18:49, Andres Freund <andres@anarazel.de> wrote:
> Here's a version that applies onto HEAD. It needs a bit more cleanup
> (I'm not sure I like the addition of ri_batchInsertSlots to store slots
> for each partition, there's some duplicated code around slot array and
> slot creation, docs for the multi insert callback).
>
> When measuring inserting into unlogged partitions (to remove the
> overhead of WAL logging, which makes it harder to see the raw
> performance difference), I see a few percent gains of the slotified
> version over the current code.  Same with insertions that have fewer
> partition switches.
>
> Sorry for posting this here - David, it'd be cool if you could take a
> look anyway. You've played a lot with this code.

Hi,

I had a look at this and performance has improved again, thanks.
However, I'm not sure if the patch is exactly what we need, let me
explain.

When I was working on 0d5f05cde01, I think my original idea was just
to use the bufferedTuples[] array and always multi insert into
partitioned tables unless something like on insert triggers stopped
that.  The reason I ended up with all this CIM_MULTI and
CIM_MULTI_CONDITIONAL stuff is due to Melanie finding a regression
when the target partition changed on each tuple.  At the time I
wondered if it might be worth trying to have multiple buffers or to
partition the existing buffer and store tuples belonging to different
partitions, then just flush them when they're full.  Looking at the
patch it seems that you have added an array of slots per partition, so
does that not kinda make all that CIM_MULTI / CIM_MULTI_CONDITIONAL
code redundant? ... we could just flush each partition's buffer when
it gets full.  There's not much of a need to flush the buffer when the
partition changes since we're using a different buffer for the next
partition anyway.

I also wonder about memory usage here. If we're copying to a
partitioned table with 10k partitions and we get over
MAX_BUFFERED_TUPLES in a row for each partition, we'll end up with
quite a lot of slots.

So, in short, I think the patch is either going to cause possible
memory problems during COPY FROM, or it leaves code that is pretty
much redundant. ... Or I've just misunderstood the patch :)

Does this make sense?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: COPY FROM WHEN condition

От
Andres Freund
Дата:
Hi,

On 2019-03-28 20:48:47 +1300, David Rowley wrote:
> I had a look at this and performance has improved again, thanks.
> However, I'm not sure if the patch is exactly what we need, let me
> explain.

I'm not entirely sure either, I just haven't really seen an alternative
that's convincing.


> When I was working on 0d5f05cde01, I think my original idea was just
> to use the bufferedTuples[] array and always multi insert into
> partitioned tables unless something like on insert triggers stopped
> that.  The reason I ended up with all this CIM_MULTI and
> CIM_MULTI_CONDITIONAL stuff is due to Melanie finding a regression
> when the target partition changed on each tuple.  At the time I
> wondered if it might be worth trying to have multiple buffers or to
> partition the existing buffer and store tuples belonging to different
> partitions, then just flush them when they're full.  Looking at the
> patch it seems that you have added an array of slots per partition, so
> does that not kinda make all that CIM_MULTI / CIM_MULTI_CONDITIONAL
> code redundant? ... we could just flush each partition's buffer when
> it gets full.  There's not much of a need to flush the buffer when the
> partition changes since we're using a different buffer for the next
> partition anyway.

Well, there is in a way - that'd lead to pretty significant memory
usage if the tuples are a bit wider.  I think there's a separate
optimization, where we keep track of the partitions with unflushed
changes and track the buffer size across all partitions. But that seems
like a followup patch.


> I also wonder about memory usage here. If we're copying to a
> partitioned table with 10k partitions and we get over
> MAX_BUFFERED_TUPLES in a row for each partition, we'll end up with
> quite a lot of slots.

We do - but they're not actually that large. In comparison to the
catcache, relcache, partition routing etc information per partition it's
not a huge amount of memory.

I'd earlier just destroyed the slots on partition change, but that's not
exactly free either...


Greetings,

Andres Freund



Re: COPY FROM WHEN condition

От
David Rowley
Дата:
On Fri, 29 Mar 2019 at 01:15, Andres Freund <andres@anarazel.de> wrote:
> On 2019-03-28 20:48:47 +1300, David Rowley wrote:
> > I had a look at this and performance has improved again, thanks.
> > However, I'm not sure if the patch is exactly what we need, let me
> > explain.
>
> I'm not entirely sure either, I just haven't really seen an alternative
> that's convincing.

I wonder if instead of having the array of slots in ResultRelInfo,
have a struct that's local to copy.c containing the array and the
number of tuples stored so far.  For partitioned tables, we could
store this struct in a hashtable by partition Oid. When the partition
changes check if we've got this partition Oid in the hash table and
keep adding tuples until the buffer fills.   We could keep a global
count of the number of tuples stored in all the slot arrays and flush
all of them when it gets full.
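
Something along these lines, just as a sketch of the shape I have in mind
(names here are made up, not from any patch):

    /* copy.c-local; one entry per partition we've buffered tuples for */
    typedef struct CopyPartitionBuffer
    {
        Oid             relid;      /* hash key: partition OID */
        TupleTableSlot *slots[MAX_BUFFERED_TUPLES];
        int             nused;      /* slots currently holding a tuple */
    } CopyPartitionBuffer;

    /* on a partition change: look up, or create, this partition's buffer */
    CopyPartitionBuffer *buf;
    bool        found;

    buf = hash_search(bufferTab, &relid, HASH_ENTER, &found);
    if (!found)
    {
        buf->nused = 0;
        memset(buf->slots, 0, sizeof(buf->slots));
    }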

The trade-off here would be that instead of flushing on each partition
change, we'd do a hash table lookup on each partition change and
possibly create a new array of slots.   This would allow us to get rid
of the code that conditionally switches on/off the batching based on
how often the partition is changing. The key to it being better would
hinge on the hash lookup + multi-row-inserts being faster than
single-row-inserts.

I'm just not too sure about how to handle getting rid of the slots
when we flush all the tuples.  Getting rid of them might be a waste,
but it might also stop the code creating tens of millions of slots in
the worst case.  Maybe to fix that we could get rid of the slots in
arrays that didn't get any use at all when we flush the tuples, as
indicated by a 0 tuple count.  This would require a hash seq scan, but
maybe we could keep that cheap by flushing early if we get too many
distinct partitions. That would save the table from getting bloated if
there happened to be a point in the copy stream where we saw high
numbers of distinct partitions with just a few tuples each.
Multi-inserts won't help much in that case anyway.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: COPY FROM WHEN condition

От
Andres Freund
Дата:
Hi,

On 2019-04-01 02:00:26 +1300, David Rowley wrote:
> On Fri, 29 Mar 2019 at 01:15, Andres Freund <andres@anarazel.de> wrote:
> > On 2019-03-28 20:48:47 +1300, David Rowley wrote:
> > > I had a look at this and performance has improved again, thanks.
> > > However, I'm not sure if the patch is exactly what we need, let me
> > > explain.
> >
> > I'm not entirely sure either, I just haven't really seen an alternative
> > that's convincing.
> 
> I wonder if instead of having the array of slots in ResultRelInfo,
> have a struct that's local to copy.c containing the array and the
> number of tuples stored so far.  For partitioned tables, we could
> store this struct in a hashtable by partition Oid. When the partition
> changes check if we've got this partition Oid in the hash table and
> keep adding tuples until the buffer fills.   We could keep a global
> count of the number of tuple stored in all the slot arrays and flush
> all of them when it gets full.
> 
> The trade-off here would be that instead of flushing on each partition
> change, we'd do a hash table lookup on each partition change and
> possibly create a new array of slots.   This would allow us to get rid
> of the code that conditionally switches on/off the batching based on
> how often the partition is changing. The key to it being better would
> hang on the hash lookup + multi-row-inserts being faster than
> single-row-inserts.

It's not clear to me why this separate hashtable is useful /
necessary. We're essentially already doing such a lookup for the
partition routing - what's the point of having a repetition of the same
mapping?  I don't see any benefit of storing the slots separately from
that.


> I'm just not too sure about how to handle getting rid of the slots
> when we flush all the tuples.  Getting rid of them might be a waste,
> but it might also stop the code creating tens of millions of slots in
> the worst case.  Maybe to fix that we could get rid of the slots in
> arrays that didn't get any use at all when we flush the tuples, as
> indicated by a 0 tuple count.  This would require a hash seq scan, but
> maybe we could keep that cheap by flushing early if we get too many
> distinct partitions.

I'd suspect a better approach to this might be to just have a linked
list of partitions with slots in them, that'd avoid unnecessarily going
through all the partitions that got copied into them.  I'm not convinced
this is necessary, but if I were to implement this, I'd leave the slots
in the ResultRelInfo, but additionally keep track of how many slots have
been created. When creating new slots, and some limit has been reached,
I'd just go to the front of that list and delete those slots (probably
delaying doing so to the point of flushing pending insertions).
That seems like it'd not actually be that much changed code over my
current patch, and would avoid deleting slots in the common case.
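
Sketch only, with made-up names, of the kind of pruning I mean (run at the
next flush rather than immediately):

    /* ResultRelInfos that currently own bulk-insert slots, oldest first */
    static List *rels_with_slots = NIL;
    static int   total_slots = 0;

    while (total_slots > MAX_TOTAL_SLOTS && rels_with_slots != NIL)
    {
        ResultRelInfo *oldest = (ResultRelInfo *) linitial(rels_with_slots);

        total_slots -= drop_bulk_insert_slots(oldest);  /* made-up helper */
        rels_with_slots = list_delete_first(rels_with_slots);
    }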

I'll work on pushing all the other pending tableam patches today -
leaving COPY the last non pluggable part. You'd written in a private
email that you might try to work on this on Monday, so I think I'll give
this a shot on Tuesday if you've not gotten around till then? I'd like
to push this sooner than the exact end of the freeze...


> That would save the table from getting bloated if
> there happened to be a point in the copy stream where we saw high
> numbers of distinct partitions with just a few tuples each.
> Multi-inserts won't help much in that case anyway.

I thought your benchmarking showed that you saw benefits after even two
tuples?

Greetings,

Andres Freund



Re: COPY FROM WHEN condition

От
David Rowley
Дата:
On Mon, 1 Apr 2019 at 08:05, Andres Freund <andres@anarazel.de> wrote:
> I'll work on pushing all the other pending tableam patches today -
> leaving COPY the last non pluggable part. You'd written in a private
> email that you might try to work on this on Monday, so I think I'll give
> this a shot on Tuesday if you've not gotten around till then? I'd like
> to push this sooner than the exact end of the freeze...

I worked on this, but I've not got the patch finished yet.

Things not done:

1. Fails regression tests. Possibly due to a bug in tuple conversion.
2. Thought about ERROR line numbers in copy stream. It's possible we
may need an array of uint64s to store the line number per slot.
3. Testing.
4. Code comments.
5. Code sanity check.
6. Thought about if it's a problem if we ERROR during the copy after
having already inserted tuples that come after the tuple causing the
error. (It's possible that the errors would become out of order.)

However, the performance looks pretty good.

$ cat bench.pl
for (my $i=0; $i < 8912891; $i++) {
print "1\n1\n2\n2\n";
}
$ cat bench_same.pl
for (my $i=0; $i < 8912891; $i++) {
print "1\n1\n1\n1\n";
}

create table listp(a int) partition by list(a);
create table listp1 partition of listp for values in(1);
create table listp2 partition of listp for values in(2);

-- Test 1: Change partition every 2nd tuple.

master + v23-0001-tableam-multi_insert-and-slotify-COPY.patch

# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
Time: 17894.625 ms (00:17.895)

master + attached

# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
Time: 10615.761 ms (00:10.616)

-- Test 2: Same partition each time.

master + v23-0001-tableam-multi_insert-and-slotify-COPY.patch
# copy listp from program $$perl ~/bench_same.pl$$ delimiter '|';
Time: 19234.960 ms (00:19.235)

master + attached

# copy listp from program $$perl ~/bench_same.pl$$ delimiter '|';
Time: 9064.802 ms (00:09.065)

Of course, it is possible that some of the bugs account for some of
the improved time...  but the rows did seem to be in the table
afterwards.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: COPY FROM WHEN condition

От
Andres Freund
Дата:
Hi,

On 2019-04-02 04:23:20 +1300, David Rowley wrote:
> On Mon, 1 Apr 2019 at 08:05, Andres Freund <andres@anarazel.de> wrote:
> > I'll work on pushing all the other pending tableam patches today -
> > leaving COPY the last non pluggable part. You'd written in a private
> > email that you might try to work on this on Monday, so I think I'll give
> > this a shot on Tuesday if you've not gotten around till then? I'd like
> > to push this sooner than the exact end of the freeze...
> 
> I worked on this, but I've not got the patch finished yet.

Thanks! I'm not quite clear whether you're planning to continue working on
this, or whether this is a handoff? Either is fine with me, just trying
to avoid unnecessary work / delay.


> Of course, it is possible that some of the bugs account for some of
> the improved time...  but the rows did seem to be in the table
> afterwards.

The speedup likely seems to be from having much larger batches, right?
Unless we insert into enough partitions that batching doesn't ever
encounter two tuples, we'll now efficiently use multi_insert even if
there's no consecutive tuples.


>      if (cstate->rel)
>      {
> -        Datum       *values;
> -        bool       *nulls;
> +        TupleTableSlot *slot;
>          TableScanDesc scandesc;
> -        HeapTuple    tuple;
> -
> -        values = (Datum *) palloc(num_phys_attrs * sizeof(Datum));
> -        nulls = (bool *) palloc(num_phys_attrs * sizeof(bool));
>  
>          scandesc = table_beginscan(cstate->rel, GetActiveSnapshot(), 0, NULL);
> +        slot = table_slot_create(cstate->rel, NULL);
>  
>          processed = 0;
> -        while ((tuple = heap_getnext(scandesc, ForwardScanDirection)) != NULL)
> +        while (table_scan_getnextslot(scandesc, ForwardScanDirection, slot))
>          {
>              CHECK_FOR_INTERRUPTS();
>  
> -            /* Deconstruct the tuple ... faster than repeated heap_getattr */
> -            heap_deform_tuple(tuple, tupDesc, values, nulls);
> +            /* Deconstruct the tuple ... */
> +            slot_getallattrs(slot);
>  
>              /* Format and send the data */
> -            CopyOneRowTo(cstate, values, nulls);
> +            CopyOneRowTo(cstate, slot);
>              processed++;
>          }
>  
> +        ExecDropSingleTupleTableSlot(slot);
>          table_endscan(scandesc);
> -
> -        pfree(values);
> -        pfree(nulls);
>      }

Hm, I should just have committed this part separately. Not sure why I
didn't...


> +#define MAX_BUFFERED_TUPLES        1000
> +#define MAX_BUFFERED_BYTES        65535
> +#define MAX_PARTITION_BUFFERS    16
> +
> +typedef struct {
> +    Oid        relid;
> +    TupleTableSlot **slots;
> +    ResultRelInfo *resultRelInfo;
> +    int    nused;
> +} CopyMultiInsertBuffer;
> +
> +typedef struct {
> +    HTAB *multiInsertBufferTab;
> +    CopyMultiInsertBuffer *buffer;
> +    int bufferedTuples;
> +    int bufferedBytes;
> +    int nbuffers;
> +} CopyMultiInsertInfo;

To me it seems a bit better to have this as a generic facility - we really
should use multi_insert in more places (like inserting the pg_attribute
rows). But admittedly that can just be done later.


> +static inline void
> +CopyMultiInsertBuffer_RemoveBuffer(CopyMultiInsertInfo *miinfo, CopyMultiInsertBuffer *buffer)
> +{
> +    Oid        relid = buffer->relid;
> +    int        i = 0;
> +
> +    while (buffer->slots[i] != NULL)
> +        ExecClearTuple(buffer->slots[i++]);
> +    pfree(buffer->slots);
> +
> +    hash_search(miinfo->multiInsertBufferTab, (void *) &relid, HASH_REMOVE,
> +                NULL);
> +    miinfo->nbuffers--;
> +}

Hm, this leaves dangling slots, right?



> +static inline void
> +CopyMultiInsertInfo_SetCurrentBuffer(CopyMultiInsertInfo *miinfo,
> +                                     ResultRelInfo *rri)
> +{
> +    CopyMultiInsertBuffer *buffer;
> +    Oid        relid = RelationGetRelid(rri->ri_RelationDesc);
> +    bool    found;
> +
> +    Assert(miinfo->multiInsertBufferTab != NULL);
> +
> +    buffer = hash_search(miinfo->multiInsertBufferTab, &relid, HASH_ENTER, &found);
> +
> +    if (!found)
> +    {
> +        buffer->relid = relid;
> +        buffer->slots = palloc0(MAX_BUFFERED_TUPLES * sizeof(TupleTableSlot *));
> +        buffer->resultRelInfo = rri;
> +        buffer->nused = 0;
> +        miinfo->nbuffers++;
> +    }
> +
> +    miinfo->buffer = buffer;
> +}

It still seems wrong to me to just perform a second hashtable search
here, given that we've already done the partition dispatch.



>          if (proute)
>              insertMethod = CIM_MULTI_CONDITIONAL;
>          else
>              insertMethod = CIM_MULTI;

Hm, why do we still have CIM_MULTI and CIM_MULTI_CONDITIONAL?


>                  /*
> -                 * Tests have shown that using multi-inserts when the
> -                 * partition changes on every tuple slightly decreases the
> -                 * performance, however, there are benefits even when only
> -                 * some batches have just 2 tuples, so let's enable
> -                 * multi-inserts even when the average is quite low.
> +                 * Enable multi-inserts when the partition has BEFORE/INSTEAD
> +                 * OF triggers, or if the partition is a foreign partition.
>                   */
>                  leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
> -                    avgTuplesPerPartChange >= 1.3 &&
> -                    !has_before_insert_row_trig &&
> -                    !has_instead_insert_row_trig &&
> -                    resultRelInfo->ri_FdwRoutine == NULL;
> +                                            !has_before_insert_row_trig &&
> +                                            !has_instead_insert_row_trig &&
> +                                            resultRelInfo->ri_FdwRoutine == NULL;

Not for now / this commit, but I kinda feel like we should determine
whether it's safe to multi-insert at a more centralized location.

Greetings,

Andres Freund



Re: COPY FROM WHEN condition

От
David Rowley
Дата:
On Tue, 2 Apr 2019 at 05:19, Andres Freund <andres@anarazel.de> wrote:
>
> On 2019-04-02 04:23:20 +1300, David Rowley wrote:
> > On Mon, 1 Apr 2019 at 08:05, Andres Freund <andres@anarazel.de> wrote:
> > > I'll work on pushing all the other pending tableam patches today -
> > > leaving COPY the last non pluggable part. You'd written in a private
> > > email that you might try to work on this on Monday, so I think I'll give
> > > this a shot on Tuesday if you've not gotten around till then? I'd like
> > > to push this sooner than the exact end of the freeze...
> >
> > I worked on this, but I've not got the patch finished yet.
>
> Thanks! I'm not quite clear whether you planning to continue working on
> this, or whether this is a handoff? Either is fine with me, just trying
> to avoid unnecessary work / delay.

I can, if you've not. I was hoping to gauge if you thought the
approach was worth pursuing.

> > Of course, it is possible that some of the bugs account for some of
> > the improved time...  but the rows did seem to be in the table
> > afterwards.
>
> The speedup likely seems to be from having much larger batches, right?
> Unless we insert into enough partitions that batching doesn't ever
> encounter two tuples, we'll now efficiently use multi_insert even if
> there's no consecutive tuples.

I imagine that's part of it, but I was surprised that the test that
inserts into the same partition was also improved fairly
significantly.

> > +static inline void
> > +CopyMultiInsertBuffer_RemoveBuffer(CopyMultiInsertInfo *miinfo, CopyMultiInsertBuffer *buffer)
> > +{
> > +     Oid             relid = buffer->relid;
> > +     int             i = 0;
> > +
> > +     while (buffer->slots[i] != NULL)
> > +             ExecClearTuple(buffer->slots[i++]);
> > +     pfree(buffer->slots);
> > +
> > +     hash_search(miinfo->multiInsertBufferTab, (void *) &relid, HASH_REMOVE,
> > +                             NULL);
> > +     miinfo->nbuffers--;
> > +}
>
> Hm, this leaves dangling slots, right?

Probably.  I didn't quite finish figuring out how a slot should have
all its memory released.

> It still seems wrong to me to just perform a second hashtable search
> here, givent that we've already done the partition dispatch.

The reason I thought this was a good idea is that if we use the
ResultRelInfo to buffer the tuples then there's no end to how many
tuple slots can exist as the code in copy.c has no control over how
many ResultRelInfos are created.

> >               if (proute)
> >                       insertMethod = CIM_MULTI_CONDITIONAL;
> >               else
> >                       insertMethod = CIM_MULTI;
>
> Hm, why do we still have CIM_MULTI and CIM_MULTI_CONDITIONAL?

We don't know what partitions will be foreign or have triggers that
don't allow us to multi-insert. Multi-inserts are still conditional
based on that.

> Not for now / this commit, but I kinda feel like we should determine
> whether it's safe to multi-insert at a more centralized location.

Yeah, I was thinking that might be nice but teaching those
Multi-insert functions about that means they'd need to handle
non-multi inserts too. That might only be a problem that highlights
that the functions I added are not named very well.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: COPY FROM WHEN condition

От
Andres Freund
Дата:
Hi,

On 2019-04-02 13:41:57 +1300, David Rowley wrote:
> On Tue, 2 Apr 2019 at 05:19, Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2019-04-02 04:23:20 +1300, David Rowley wrote:
> > > On Mon, 1 Apr 2019 at 08:05, Andres Freund <andres@anarazel.de> wrote:
> > > > I'll work on pushing all the other pending tableam patches today -
> > > > leaving COPY the last non pluggable part. You'd written in a private
> > > > email that you might try to work on this on Monday, so I think I'll give
> > > > this a shot on Tuesday if you've not gotten around till then? I'd like
> > > > to push this sooner than the exact end of the freeze...
> > >
> > > I worked on this, but I've not got the patch finished yet.
> >
> > Thanks! I'm not quite clear whether you planning to continue working on
> > this, or whether this is a handoff? Either is fine with me, just trying
> > to avoid unnecessary work / delay.
> 
> I can, if you've not. I was hoping to gauge if you thought the
> approach was worth pursuing.

I think it's worth pursuing, with the caveats below. I'm going to focus
on docs for the not-very-long rest of today, but I definitely could work on
this afterwards. But I also would welcome any help. Let me know...



> > > +static inline void
> > > +CopyMultiInsertBuffer_RemoveBuffer(CopyMultiInsertInfo *miinfo, CopyMultiInsertBuffer *buffer)
> > > +{
> > > +     Oid             relid = buffer->relid;
> > > +     int             i = 0;
> > > +
> > > +     while (buffer->slots[i] != NULL)
> > > +             ExecClearTuple(buffer->slots[i++]);
> > > +     pfree(buffer->slots);
> > > +
> > > +     hash_search(miinfo->multiInsertBufferTab, (void *) &relid, HASH_REMOVE,
> > > +                             NULL);
> > > +     miinfo->nbuffers--;
> > > +}
> >
> > Hm, this leaves dangling slots, right?
> 
> Probably.  I didn't quite finish figuring out how a slot should have
> all its memory released.

ExecDropSingleTupleTableSlot() ought to do the job (if not added to a
tuptable/estate - which we shouldn't as currently one-by-one removal
from therein is expensive).
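
I.e. roughly (a sketch against your v1's buffer layout, not a finished fix):

    int     i;

    for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
        ExecDropSingleTupleTableSlot(buffer->slots[i]);
    pfree(buffer->slots);

    hash_search(miinfo->multiInsertBufferTab, (void *) &relid, HASH_REMOVE,
                NULL);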


> > It still seems wrong to me to just perform a second hashtable search
> > here, givent that we've already done the partition dispatch.
> 
> The reason I thought this was a good idea is that if we use the
> ResultRelInfo to buffer the tuples then there's no end to how many
> tuple slots can exist as the code in copy.c has no control over how
> many ResultRelInfos are created.

To me those aren't contradictory - we're going to have a ResultRelInfo
for each partition either way, but there's nothing preventing copy.c
from cleaning up subsidiary data in it.  What I was thinking is that
we'd just keep track of a list of ResultRelInfos with bulk insert slots,
and occasionally clean them up. That way we avoid the secondary lookup,
while also managing the amount of slots.


Greetings,

Andres Freund



Re: COPY FROM WHEN condition

От
David Rowley
Дата:
On Tue, 2 Apr 2019 at 13:59, Andres Freund <andres@anarazel.de> wrote:
>
> On 2019-04-02 13:41:57 +1300, David Rowley wrote:
> > On Tue, 2 Apr 2019 at 05:19, Andres Freund <andres@anarazel.de> wrote:
> > > Thanks! I'm not quite clear whether you planning to continue working on
> > > this, or whether this is a handoff? Either is fine with me, just trying
> > > to avoid unnecessary work / delay.
> >
> > I can, if you've not. I was hoping to gauge if you thought the
> > approach was worth pursuing.
>
> I think it's worth pursuing, with the caveats below. I'm going to focus
> on docs the not-very-long rest of today, but I definitely could work on
> this afterwards. But I also would welcome any help. Let me know...

I'm looking now. I'll post something when I get it into some better
shape than it is now.

> > > It still seems wrong to me to just perform a second hashtable search
> > > here, givent that we've already done the partition dispatch.
> >
> > The reason I thought this was a good idea is that if we use the
> > ResultRelInfo to buffer the tuples then there's no end to how many
> > tuple slots can exist as the code in copy.c has no control over how
> > many ResultRelInfos are created.
>
> To me those aren't contradictory - we're going to have a ResultRelInfo
> for each partition either way, but there's nothing preventing copy.c
> from cleaning up subsidiary data in it.  What I was thinking is that
> we'd just keep track of a list of ResultRelInfos with bulk insert slots,
> and occasionally clean them up. That way we avoid the secondary lookup,
> while also managing the amount of slots.

The problem that I see with that is you can't just add to that list
when the partition changes. You must check if the ResultRelInfo is
already in the list or not since we could change partitions and change
back again. For a list with just a few elements checking
list_member_ptr should be pretty cheap, but I somewhat arbitrarily chose
to try to keep just the last 16 partitions' worth of buffers. I don't
think checking list_member_ptr in a 16-element list is likely to be
faster than a hash table lookup, do you?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: COPY FROM WHEN condition

От
Andres Freund
Дата:
Hi,

On 2019-04-02 14:06:52 +1300, David Rowley wrote:
> On Tue, 2 Apr 2019 at 13:59, Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2019-04-02 13:41:57 +1300, David Rowley wrote:
> > > On Tue, 2 Apr 2019 at 05:19, Andres Freund <andres@anarazel.de> wrote:
> > > > Thanks! I'm not quite clear whether you planning to continue working on
> > > > this, or whether this is a handoff? Either is fine with me, just trying
> > > > to avoid unnecessary work / delay.
> > >
> > > I can, if you've not. I was hoping to gauge if you thought the
> > > approach was worth pursuing.
> >
> > I think it's worth pursuing, with the caveats below. I'm going to focus
> > on docs the not-very-long rest of today, but I definitely could work on
> > this afterwards. But I also would welcome any help. Let me know...
> 
> I'm looking now. I'll post something when I get it into some better
> shape than it us now.

Cool.


> > > > It still seems wrong to me to just perform a second hashtable search
> > > > here, givent that we've already done the partition dispatch.
> > >
> > > The reason I thought this was a good idea is that if we use the
> > > ResultRelInfo to buffer the tuples then there's no end to how many
> > > tuple slots can exist as the code in copy.c has no control over how
> > > many ResultRelInfos are created.
> >
> > To me those aren't contradictory - we're going to have a ResultRelInfo
> > for each partition either way, but there's nothing preventing copy.c
> > from cleaning up subsidiary data in it.  What I was thinking is that
> > we'd just keep track of a list of ResultRelInfos with bulk insert slots,
> > and occasionally clean them up. That way we avoid the secondary lookup,
> > while also managing the amount of slots.
> 
> The problem that I see with that is you can't just add to that list
> when the partition changes. You must check if the ResultRelInfo is
> already in the list or not since we could change partitions and change
> back again. For a list with just a few elements checking
> list_member_ptr should be pretty cheap, but I randomly did choose that
> we try to keep just the last 16 partitions worth of buffers. I don't
> think checking list_member_ptr in a 16 element list is likely to be
> faster than a hash table lookup, do you?

Why do we need that list membership check? If you append the
ResultRelInfo to the list when creating the ResultRelInfo's slots array,
you don't need to touch the list after a partition lookup - you know
it's a member if the ResultRelInfo has a slot array.  Then you only need
to iterate the list when you want to drop slots to avoid using too much
memory - and that's also a sequential scan of the hash table in your
approach, right?
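
Sketch of what I mean (the field and helper names here are illustrative,
not existing ones):

    if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
    {
        /* first time we've seen this partition: create slots and remember it */
        resultRelInfo->ri_CopyMultiInsertBuffer = new_buffer_for(resultRelInfo); /* made-up helper */
        rels_with_buffers = lappend(rels_with_buffers, resultRelInfo);
    }
    /* else: already tracked, nothing to check */

so the list only needs to be walked when flushing or dropping slots.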

Greetings,

Andres Freund



Re: COPY FROM WHEN condition

От
David Rowley
Дата:
On Tue, 2 Apr 2019 at 14:11, Andres Freund <andres@anarazel.de> wrote:
> Why do we need that list membership check? If you append the
> ResultRelInfo to the list when creating the ResultRelInfo's slots array,
> you don't need to touch the list after a partition lookup - you know
> it's a member if the ResultRelInfo has a slot array.  Then you only need
> to iterate the list when you want to drop slots to avoid using too much
> memory - and that's also a sequential scan of the hash table in your
> approach, right?

Okay, you're right, we could just determine if we've got a new
ResultRelInfo by the lack of any bulk insert slots being set in it.
However, I've ended up not doing it that way as the patch requires
more than just an array of TupleTableSlots to be stored in the
ResultRelInfo, it'll pretty much need all of what I have in
CopyMultiInsertBuffer, which includes line numbers for error
reporting, a BulkInsertState and a counter to track how many of the
slots are used.  I had thoughts that we could just tag the whole
CopyMultiInsertBuffer in ResultRelInfo, but that requires moving it
somewhere a bit more generic. Another thought would be that we have
something like "void *extra;" in ResultRelInfo that we can just borrow
for copy.c.  It may be interesting to try this out to see if it saves
much in the way of performance.

I've got the patch into a working state now and wrote a bunch of
comments. I've not really done a thorough read back of the code again.
I normally wait until the light is coming from a different angle
before doing that, but there does not seem to be much time to wait for
that in this case, so here's v2.  Performance seems to be about the
same as what I reported yesterday.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: COPY FROM WHEN condition

От
Andres Freund
Дата:
On 2019-04-03 06:41:49 +1300, David Rowley wrote:
> However, I've ended up not doing it that way as the patch requires
> more than just an array of TupleTableSlots to be stored in the
> ResultRelInfo, it'll pretty much need all of what I have in
> CopyMultiInsertBuffer, which includes line numbers for error
> reporting, a BulkInsertState and a counter to track how many of the
> slots are used.  I had thoughts that we could just tag the whole
> CopyMultiInsertBuffer in ResultRelInfo, but that requires moving it
> somewhere a bit more generic. Another thought would be that we have
> something like "void *extra;" in ResultRelInfo that we can just borrow
> for copy.c.  It may be interesting to try this out to see if it saves
> much in the way of performance.

Hm, we could just forward declare struct CopyMultiInsertBuffer in a
header, and only define it in copy.c. That doesn't sound insane to me.
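
I.e. something like (sketch only; where exactly it'd live is up for debate):

    /* in the header that defines ResultRelInfo: */
    struct CopyMultiInsertBuffer;       /* opaque; full definition stays in copy.c */

    /* ... as a field inside ResultRelInfo ... */
    struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;

    /* in copy.c: */
    struct CopyMultiInsertBuffer
    {
        TupleTableSlot *slots[MAX_BUFFERED_TUPLES];
        uint64          linenos[MAX_BUFFERED_TUPLES];
        int             nused;
    };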


> I've got the patch into a working state now and wrote a bunch of
> comments. I've not really done a thorough read back of the code again.
> I normally wait until the light is coming from a different angle
> before doing that, but there does not seem to be much time to wait for
> that in this case, so here's v2.  Performance seems to be about the
> same as what I reported yesterday.

Cool.


> +/*
> + * Multi-Insert handling code for COPY FROM.  Inserts are performed in
> + * batches of up to MAX_BUFFERED_TUPLES.

This should probably be moved to the top of the file.


> + * When COPY FROM is used on a partitioned table, we attempt to maintain
> + * only MAX_PARTITION_BUFFERS buffers at once.  Buffers that are completely
> + * unused in each batch are removed so that we end up not keeping buffers for
> + * partitions that we might not insert into again.
> + */
> +#define MAX_BUFFERED_TUPLES        1000
> +#define MAX_BUFFERED_BYTES        65535
> +#define MAX_PARTITION_BUFFERS    15

> +/*
> + * CopyMultiInsertBuffer
> + *        Stores multi-insert data related to a single relation in CopyFrom.
> + */
> +typedef struct

Please don't create anonymous structs - they can't be forward declared,
and some debugging tools handle them worse than named structs. And it
makes naming CopyMultiInsertBuffer in the comment less necessary.


> +{
> +    Oid            relid;            /* Relation ID this insert data is for */
> +    TupleTableSlot **slots;        /* Array of MAX_BUFFERED_TUPLES to store
> +                                 * tuples */
> +    uint64       *linenos;        /* Line # of tuple in copy stream */

It could make sense to allocate those two together, or even as part of
the CopyMultiInsertBuffer itself, to reduce allocator overhead.
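
E.g. (sketch) making both fixed-size members of the struct, so a single
palloc covers the buffer, its slot pointers and its line numbers:

    typedef struct CopyMultiInsertBuffer
    {
        Oid             relid;
        ResultRelInfo  *resultRelInfo;
        BulkInsertState bistate;
        int             nused;
        uint64          linenos[MAX_BUFFERED_TUPLES];  /* line # per buffered tuple */
        TupleTableSlot *slots[MAX_BUFFERED_TUPLES];    /* buffered tuples */
    } CopyMultiInsertBuffer;

    buffer = palloc0(sizeof(CopyMultiInsertBuffer));   /* one allocation, not three */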


> +    else
> +    {
> +        CopyMultiInsertBuffer *buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
> +
> +        /* Non-partitioned table.  Just setup the buffer for the table. */
> +        buffer->relid = RelationGetRelid(rri->ri_RelationDesc);
> +        buffer->slots = palloc0(MAX_BUFFERED_TUPLES * sizeof(TupleTableSlot *));
> +        buffer->linenos = palloc(MAX_BUFFERED_TUPLES * sizeof(uint64));
> +        buffer->resultRelInfo = rri;
> +        buffer->bistate = GetBulkInsertState();
> +        buffer->nused = 0;
> +        miinfo->multiInsertBufferTab = NULL;
> +        miinfo->buffer = buffer;
> +        miinfo->nbuffers = 1;
> +    }

Can this be moved into a helper function?


> +    /*
> +     * heap_multi_insert leaks memory, so switch to short-lived memory context
> +     * before calling it.
> +     */

s/heap_multi_insert/table_multi_insert/



> +    /*
> +     * If there are any indexes, update them for all the inserted tuples, and
> +     * run AFTER ROW INSERT triggers.
> +     */
> +    if (resultRelInfo->ri_NumIndices > 0)
> +    {
> +        for (i = 0; i < nBufferedTuples; i++)
> +        {
> +            List       *recheckIndexes;
> +
> +            cstate->cur_lineno = buffer->linenos[i];
> +            recheckIndexes =
> +                ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
> +                                      NIL);
> +            ExecARInsertTriggers(estate, resultRelInfo,
> +                                 buffer->slots[i],
> +                                 recheckIndexes, cstate->transition_capture);
> +            list_free(recheckIndexes);
> +        }
> +    }
> +
> +    /*
> +     * There's no indexes, but see if we need to run AFTER ROW INSERT triggers
> +     * anyway.
> +     */
> +    else if (resultRelInfo->ri_TrigDesc != NULL &&
> +             (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
> +              resultRelInfo->ri_TrigDesc->trig_insert_new_table))
> +    {
> +        for (i = 0; i < nBufferedTuples; i++)
> +        {
> +            cstate->cur_lineno = buffer->linenos[i];
> +            ExecARInsertTriggers(estate, resultRelInfo,
> +                                 bufferedSlots[i],
> +                                 NIL, cstate->transition_capture);
> +        }
> +    }

> +    for (i = 0; i < nBufferedTuples; i++)
> +        ExecClearTuple(bufferedSlots[i]);

I wonder about combining these loops somehow. But it's probably ok.


> +/*
> + * CopyMultiInsertBuffer_Cleanup
> + *        Drop used slots and free member for this buffer.  The buffer
> + *        must be flushed before cleanup.
> + */
> +static inline void
> +CopyMultiInsertBuffer_Cleanup(CopyMultiInsertBuffer *buffer)
> +{
> +    int            i;
> +
> +    ReleaseBulkInsertStatePin(buffer->bistate);

Shouldn't this FreeBulkInsertState() rather than
ReleaseBulkInsertStatePin()?


> +
> +/*
> + * CopyMultiInsertBuffer_RemoveBuffer
> + *        Remove a buffer from being tracked by miinfo
> + */
> +static inline void
> +CopyMultiInsertBuffer_RemoveBuffer(CopyMultiInsertInfo *miinfo,
> +                                   CopyMultiInsertBuffer *buffer)
> +{
> +    Oid            relid = buffer->relid;
> +
> +    CopyMultiInsertBuffer_Cleanup(buffer);
> +
> +    hash_search(miinfo->multiInsertBufferTab, (void *) &relid, HASH_REMOVE,
> +                NULL);
> +    miinfo->nbuffers--;

Aren't we leaking the CopyMultiInsertBuffer itself here?

> +}
> +
> +/*
> + * CopyMultiInsertInfo_Flush
> + *        Write out all stored tuples in all buffers out to the tables.
> + *
> + * To save us from ending up with buffers for 1000s of partitions we remove
> + * buffers belonging to partitions that we've seen no tuples for in this batch
> + */
> +static inline void
> +CopyMultiInsertInfo_Flush(CopyMultiInsertInfo *miinfo, CopyState cstate,
> +                          EState *estate, CommandId mycid, int ti_options)
> +{
> +    CopyMultiInsertBuffer *buffer;
> +
> +    /*
> +     * If just storing buffers for a non-partitioned table, then just flush
> +     * that buffer.
> +     */
> +    if (miinfo->multiInsertBufferTab == NULL)
> +    {
> +        buffer = miinfo->buffer;
> +
> +        CopyMultiInsertInfo_FlushSingleBuffer(buffer, cstate, estate, mycid,
> +                                              ti_options);
> +    }
> +    else
> +    {
> +        HASH_SEQ_STATUS status;
> +
> +        /*
> +         * Otherwise make a pass over the hash table and flush all buffers
> +         * that have any tuples stored in them.
> +         */
> +        hash_seq_init(&status, miinfo->multiInsertBufferTab);
> +
> +        while ((buffer = (CopyMultiInsertBuffer *) hash_seq_search(&status)) != NULL)
> +        {
> +            if (buffer->nused > 0)
> +            {
> +                /* Flush the buffer if it was used */
> +                CopyMultiInsertInfo_FlushSingleBuffer(buffer, cstate, estate,
> +                                                      mycid, ti_options);
> +            }
> +            else
> +            {
> +                /*
> +                 * Otherwise just remove it.  If we saw no tuples for it this
> +                 * batch, then likely its best to make way for buffers for
> +                 * other partitions.
> +                 */
> +                CopyMultiInsertBuffer_RemoveBuffer(miinfo, buffer);
> +            }
> +        }
> +    }
> +
> +    miinfo->bufferedTuples = 0;
> +    miinfo->bufferedBytes = 0;
> +}
> +
> +/*
> + * CopyMultiInsertInfo_Cleanup
> + *        Cleanup allocated buffers and free memory
> + */
> +static inline void
> +CopyMultiInsertInfo_Cleanup(CopyMultiInsertInfo *miinfo)
> +{
> +    if (miinfo->multiInsertBufferTab == NULL)
> +        CopyMultiInsertBuffer_Cleanup(miinfo->buffer);
> +    else
> +    {
> +        HASH_SEQ_STATUS status;
> +        CopyMultiInsertBuffer *buffer;
> +
> +        hash_seq_init(&status, miinfo->multiInsertBufferTab);
> +
> +        while ((buffer = (CopyMultiInsertBuffer *) hash_seq_search(&status)) != NULL)
> +        {
> +            Assert(buffer->nused == 0);
> +            CopyMultiInsertBuffer_Cleanup(buffer);
> +        }
> +
> +        hash_destroy(miinfo->multiInsertBufferTab);
> +    }
> +}
> +
> +/*
> + * CopyMultiInsertInfo_NextFreeSlot
> + *        Get the next TupleTableSlot that the next tuple should be stored in.
> + *
> + * Callers must ensure that the buffer is not full.
> + */
> +static inline TupleTableSlot *
> +CopyMultiInsertInfo_NextFreeSlot(CopyMultiInsertInfo *miinfo,
> +                                 ResultRelInfo *rri)
> +{
> +    CopyMultiInsertBuffer *buffer = miinfo->buffer;
> +    int            nused = buffer->nused;
> +
> +    Assert(nused < MAX_BUFFERED_TUPLES);
> +
> +    if (buffer->slots[nused] == NULL)
> +        buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
> +    return buffer->slots[nused];
> +}
> +
> +/*
> + * CopyMultiInsertInfo_Store
> + *        Consume the previously reserved TupleTableSlot that was reserved by
> + *        CopyMultiInsertInfo_NextFreeSlot.
> + */
> +static inline void
> +CopyMultiInsertInfo_Store(CopyMultiInsertInfo *miinfo, TupleTableSlot *slot,
> +                          int tuplen, uint64 lineno)
> +{
> +    CopyMultiInsertBuffer *buffer = miinfo->buffer;
> +
> +    Assert(slot == buffer->slots[buffer->nused]);
> +
> +    /* Store the line number so we can properly report any errors later */
> +    buffer->linenos[buffer->nused] = lineno;
> +
> +    /* Record this slot as being used */
> +    buffer->nused++;
> +
> +    /* Update how many tuples are stored and their size */
> +    miinfo->bufferedTuples++;
> +    miinfo->bufferedBytes += tuplen;
> +}
> +
>  /*
>   * Copy FROM file to relation.
>   */
>  uint64
>  CopyFrom(CopyState cstate)
>  {
> -    HeapTuple    tuple;
> -    TupleDesc    tupDesc;
> -    Datum       *values;
> -    bool       *nulls;
>      ResultRelInfo *resultRelInfo;
>      ResultRelInfo *target_resultRelInfo;
>      ResultRelInfo *prevResultRelInfo = NULL;
>      EState       *estate = CreateExecutorState(); /* for ExecConstraints() */
>      ModifyTableState *mtstate;
>      ExprContext *econtext;
> -    TupleTableSlot *myslot;
> +    TupleTableSlot *singleslot = NULL;
>      MemoryContext oldcontext = CurrentMemoryContext;
> -    MemoryContext batchcontext;
>  
>      PartitionTupleRouting *proute = NULL;
>      ErrorContextCallback errcallback;
>      CommandId    mycid = GetCurrentCommandId(true);
>      int            ti_options = 0; /* start with default table_insert options */
> -    BulkInsertState bistate;
> +    BulkInsertState bistate = NULL;
>      CopyInsertMethod insertMethod;
> +    CopyMultiInsertInfo multiInsertInfo;
>      uint64        processed = 0;
> -    int            nBufferedTuples = 0;
>      bool        has_before_insert_row_trig;
>      bool        has_instead_insert_row_trig;
>      bool        leafpart_use_multi_insert = false;
>  
> -#define MAX_BUFFERED_TUPLES 1000
> -#define RECHECK_MULTI_INSERT_THRESHOLD 1000
> -    HeapTuple  *bufferedTuples = NULL;    /* initialize to silence warning */
> -    Size        bufferedTuplesSize = 0;
> -    uint64        firstBufferedLineNo = 0;
> -    uint64        lastPartitionSampleLineNo = 0;
> -    uint64        nPartitionChanges = 0;
> -    double        avgTuplesPerPartChange = 0;
> -
>      Assert(cstate->rel);
>  
> +    memset(&multiInsertInfo, 0, sizeof(CopyMultiInsertInfo));
> +
>      /*
>       * The target must be a plain, foreign, or partitioned relation, or have
>       * an INSTEAD OF INSERT row trigger.  (Currently, such triggers are only
> @@ -2382,8 +2767,6 @@ CopyFrom(CopyState cstate)
>                              RelationGetRelationName(cstate->rel))));
>      }
>  
> -    tupDesc = RelationGetDescr(cstate->rel);
> -
>      /*----------
>       * Check to see if we can avoid writing WAL
>       *
> @@ -2467,8 +2850,8 @@ CopyFrom(CopyState cstate)
>          if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>          {
>              ereport(ERROR,
> -                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> -                    errmsg("cannot perform FREEZE on a partitioned table")));
> +                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                     errmsg("cannot perform FREEZE on a partitioned table")));
>          }
>  
>          /*
> @@ -2518,10 +2901,6 @@ CopyFrom(CopyState cstate)
>  
>      ExecInitRangeTable(estate, cstate->range_table);
>  
> -    /* Set up a tuple slot too */
> -    myslot = ExecInitExtraTupleSlot(estate, tupDesc,
> -                                    &TTSOpsHeapTuple);
> -
>      /*
>       * Set up a ModifyTableState so we can let FDW(s) init themselves for
>       * foreign-table result relation(s).
> @@ -2565,10 +2944,11 @@ CopyFrom(CopyState cstate)
>                                                  &mtstate->ps);
>  
>      /*
> -     * It's more efficient to prepare a bunch of tuples for insertion, and
> -     * insert them in one heap_multi_insert() call, than call heap_insert()
> -     * separately for every tuple. However, there are a number of reasons why
> -     * we might not be able to do this.  These are explained below.
> +     * It's generally more efficient to prepare a bunch of tuples for
> +     * insertion, and insert them in one table_multi_insert() call, than call
> +     * table_insert() separately for every tuple. However, there are a number
> +     * of reasons why we might not be able to do this.  These are explained
> +     * below.
>       */
>      if (resultRelInfo->ri_TrigDesc != NULL &&
>          (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
> @@ -2589,8 +2969,8 @@ CopyFrom(CopyState cstate)
>           * For partitioned tables we can't support multi-inserts when there
>           * are any statement level insert triggers. It might be possible to
>           * allow partitioned tables with such triggers in the future, but for
> -         * now, CopyFromInsertBatch expects that any before row insert and
> -         * statement level insert triggers are on the same relation.
> +         * now, CopyMultiInsertInfo_Flush expects that any before row insert
> +         * and statement level insert triggers are on the same relation.
>           */
>          insertMethod = CIM_SINGLE;
>      }
> @@ -2622,8 +3002,7 @@ CopyFrom(CopyState cstate)
>      {
>          /*
>           * For partitioned tables, we may still be able to perform bulk
> -         * inserts for sets of consecutive tuples which belong to the same
> -         * partition.  However, the possibility of this depends on which types
> +         * inserts.  However, the possibility of this depends on which types
>           * of triggers exist on the partition.  We must disable bulk inserts
>           * if the partition is a foreign table or it has any before row insert
>           * or insert instead triggers (same as we checked above for the parent
> @@ -2632,18 +3011,27 @@ CopyFrom(CopyState cstate)
>           * have the intermediate insert method of CIM_MULTI_CONDITIONAL to
>           * flag that we must later determine if we can use bulk-inserts for
>           * the partition being inserted into.
> -         *
> -         * Normally, when performing bulk inserts we just flush the insert
> -         * buffer whenever it becomes full, but for the partitioned table
> -         * case, we flush it whenever the current tuple does not belong to the
> -         * same partition as the previous tuple.
>           */
>          if (proute)
>              insertMethod = CIM_MULTI_CONDITIONAL;
>          else
>              insertMethod = CIM_MULTI;
>  
> -        bufferedTuples = palloc(MAX_BUFFERED_TUPLES * sizeof(HeapTuple));
> +        CopyMultiInsertInfo_Init(&multiInsertInfo, resultRelInfo,
> +                                 proute != NULL);
> +    }
> +
> +    /*
> +     * If not using batch mode (which allocates slots as needed) set up a
> +     * tuple slot too. When inserting into a partitioned table, we also need
> +     * one, even if we might batch insert, to read the tuple in the root
> +     * partition's form.
> +     */
> +    if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
> +    {
> +        singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
> +                                       &estate->es_tupleTable);
> +        bistate = GetBulkInsertState();
>      }
>  
>      has_before_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
> @@ -2660,10 +3048,6 @@ CopyFrom(CopyState cstate)
>       */
>      ExecBSInsertTriggers(estate, resultRelInfo);
>  
> -    values = (Datum *) palloc(tupDesc->natts * sizeof(Datum));
> -    nulls = (bool *) palloc(tupDesc->natts * sizeof(bool));
> -
> -    bistate = GetBulkInsertState();
>      econtext = GetPerTupleExprContext(estate);
>  
>      /* Set up callback to identify error line number */
> @@ -2672,17 +3056,9 @@ CopyFrom(CopyState cstate)
>      errcallback.previous = error_context_stack;
>      error_context_stack = &errcallback;
>  
> -    /*
> -     * Set up memory context for batches. For cases without batching we could
> -     * use the per-tuple context, but it's simpler to just use it every time.
> -     */
> -    batchcontext = AllocSetContextCreate(CurrentMemoryContext,
> -                                         "batch context",
> -                                         ALLOCSET_DEFAULT_SIZES);
> -
>      for (;;)
>      {
> -        TupleTableSlot *slot;
> +        TupleTableSlot *myslot;
>          bool        skip_tuple;
>  
>          CHECK_FOR_INTERRUPTS();
> @@ -2693,20 +3069,33 @@ CopyFrom(CopyState cstate)
>           */
>          ResetPerTupleExprContext(estate);
>  
> +        if (insertMethod == CIM_SINGLE || proute)
> +        {
> +            myslot = singleslot;
> +            Assert(myslot != NULL);
> +        }
> +        else
> +        {
> +            Assert(resultRelInfo == target_resultRelInfo);
> +            Assert(insertMethod == CIM_MULTI);
> +
> +            myslot = CopyMultiInsertInfo_NextFreeSlot(&multiInsertInfo,
> +                                                      resultRelInfo);
> +        }
> +
>          /*
>           * Switch to per-tuple context before calling NextCopyFrom, which does
>           * evaluate default expressions etc. and requires per-tuple context.
>           */
>          MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
>  
> -        if (!NextCopyFrom(cstate, econtext, values, nulls))
> -            break;
> +        ExecClearTuple(myslot);
>  
> -        /* Switch into per-batch memory context before forming the tuple. */
> -        MemoryContextSwitchTo(batchcontext);
> +        /* Directly store the values/nulls array in the slot */
> +        if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
> +            break;
>  
> -        /* And now we can form the input tuple. */
> -        tuple = heap_form_tuple(tupDesc, values, nulls);
> +        ExecStoreVirtualTuple(myslot);
>  
>          /*
>           * Constraints might reference the tableoid column, so (re-)initialize
> @@ -2717,18 +3106,15 @@ CopyFrom(CopyState cstate)
>          /* Triggers and stuff need to be invoked in query context. */
>          MemoryContextSwitchTo(oldcontext);
>  
> -        /* Place tuple in tuple slot --- but slot shouldn't free it */
> -        slot = myslot;
> -        ExecStoreHeapTuple(tuple, slot, false);
> -
>          if (cstate->whereClause)
>          {
>              econtext->ecxt_scantuple = myslot;
> +            /* Skip items that don't match the COPY's WHERE clause */
>              if (!ExecQual(cstate->qualexpr, econtext))
>                  continue;
>          }
>  
> -        /* Determine the partition to heap_insert the tuple into */
> +        /* Determine the partition to table_insert the tuple into */
>          if (proute)
>          {
>              TupleConversionMap *map;
> @@ -2739,80 +3125,10 @@ CopyFrom(CopyState cstate)
>               * if the found partition is not suitable for INSERTs.
>               */
>              resultRelInfo = ExecFindPartition(mtstate, target_resultRelInfo,
> -                                              proute, slot, estate);
> +                                              proute, myslot, estate);
>  
>              if (prevResultRelInfo != resultRelInfo)
>              {
> -                /* Check if we can multi-insert into this partition */
> -                if (insertMethod == CIM_MULTI_CONDITIONAL)
> -                {
> -                    /*
> -                     * When performing bulk-inserts into partitioned tables we
> -                     * must insert the tuples seen so far to the heap whenever
> -                     * the partition changes.
> -                     */
> -                    if (nBufferedTuples > 0)
> -                    {
> -                        MemoryContext    oldcontext;
> -
> -                        CopyFromInsertBatch(cstate, estate, mycid, ti_options,
> -                                            prevResultRelInfo, myslot, bistate,
> -                                            nBufferedTuples, bufferedTuples,
> -                                            firstBufferedLineNo);
> -                        nBufferedTuples = 0;
> -                        bufferedTuplesSize = 0;
> -
> -                        /*
> -                         * The tuple is already allocated in the batch context, which
> -                         * we want to reset.  So to keep the tuple we copy it into the
> -                         * short-lived (per-tuple) context, reset the batch context
> -                         * and then copy it back into the per-batch one.
> -                         */
> -                        oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
> -                        tuple = heap_copytuple(tuple);
> -                        MemoryContextSwitchTo(oldcontext);
> -
> -                        /* cleanup the old batch */
> -                        MemoryContextReset(batchcontext);
> -
> -                        /* copy the tuple back to the per-batch context */
> -                        oldcontext = MemoryContextSwitchTo(batchcontext);
> -                        tuple = heap_copytuple(tuple);
> -                        MemoryContextSwitchTo(oldcontext);
> -
> -                        /*
> -                         * Also push the tuple copy to the slot (resetting the context
> -                         * invalidated the slot contents).
> -                         */
> -                        ExecStoreHeapTuple(tuple, slot, false);
> -                    }
> -
> -                    nPartitionChanges++;
> -
> -                    /*
> -                     * Here we adaptively enable multi-inserts based on the
> -                     * average number of tuples from recent multi-insert
> -                     * batches.  We recalculate the average every
> -                     * RECHECK_MULTI_INSERT_THRESHOLD tuples instead of taking
> -                     * the average over the whole copy.  This allows us to
> -                     * enable multi-inserts when we get periods in the copy
> -                     * stream that have tuples commonly belonging to the same
> -                     * partition, and disable when the partition is changing
> -                     * too often.
> -                     */
> -                    if (unlikely(lastPartitionSampleLineNo <= (cstate->cur_lineno -
> -                                                               RECHECK_MULTI_INSERT_THRESHOLD)
> -                                 && cstate->cur_lineno >= RECHECK_MULTI_INSERT_THRESHOLD))
> -                    {
> -                        avgTuplesPerPartChange =
> -                            (cstate->cur_lineno - lastPartitionSampleLineNo) /
> -                            (double) nPartitionChanges;
> -
> -                        lastPartitionSampleLineNo = cstate->cur_lineno;
> -                        nPartitionChanges = 0;
> -                    }
> -                }
> -
>                  /* Determine which triggers exist on this partition */
>                  has_before_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
>                                                resultRelInfo->ri_TrigDesc->trig_insert_before_row);
> @@ -2821,22 +3137,19 @@ CopyFrom(CopyState cstate)
>                                                 resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
>  
>                  /*
> -                 * Tests have shown that using multi-inserts when the
> -                 * partition changes on every tuple slightly decreases the
> -                 * performance, however, there are benefits even when only
> -                 * some batches have just 2 tuples, so let's enable
> -                 * multi-inserts even when the average is quite low.
> +                 * Enable multi-inserts when the partition has BEFORE/INSTEAD
> +                 * OF triggers, or if the partition is a foreign partition.
>                   */
>                  leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
> -                    avgTuplesPerPartChange >= 1.3 &&
>                      !has_before_insert_row_trig &&
>                      !has_instead_insert_row_trig &&
>                      resultRelInfo->ri_FdwRoutine == NULL;
>  
> -                /*
> -                 * We'd better make the bulk insert mechanism gets a new
> -                 * buffer when the partition being inserted into changes.
> -                 */
> +                /* Set the multi-insert buffer to use for this partition. */
> +                if (leafpart_use_multi_insert)
> +                    CopyMultiInsertInfo_SetCurrentBuffer(&multiInsertInfo,
> +                                                         resultRelInfo);
> +
>                  ReleaseBulkInsertStatePin(bistate);
>                  prevResultRelInfo = resultRelInfo;
>              }
> @@ -2879,26 +3192,48 @@ CopyFrom(CopyState cstate)
>               * rowtype.
>               */
>              map = resultRelInfo->ri_PartitionInfo->pi_RootToPartitionMap;
> -            if (map != NULL)
> +            if (insertMethod == CIM_SINGLE || !leafpart_use_multi_insert)
>              {
> -                TupleTableSlot *new_slot;
> -                MemoryContext oldcontext;
> -
> -                new_slot = resultRelInfo->ri_PartitionInfo->pi_PartitionTupleSlot;
> -                Assert(new_slot != NULL);
> -
> -                slot = execute_attr_map_slot(map->attrMap, slot, new_slot);
> +                /* non batch insert */
> +                if (map != NULL)
> +                {
> +                    TupleTableSlot *new_slot;
>  
> +                    new_slot = resultRelInfo->ri_PartitionInfo->pi_PartitionTupleSlot;
> +                    myslot = execute_attr_map_slot(map->attrMap, myslot, new_slot);
> +                }
> +            }
> +            else
> +            {
>                  /*
> -                 * Get the tuple in the per-batch context, so that it will be
> -                 * freed after each batch insert.
> +                 * Batch insert into partitioned table.
>                   */
> -                oldcontext = MemoryContextSwitchTo(batchcontext);
> -                tuple = ExecCopySlotHeapTuple(slot);
> -                MemoryContextSwitchTo(oldcontext);
> +                TupleTableSlot *nextslot;
> +
> +                /* no other path available for partitioned table */
> +                Assert(insertMethod == CIM_MULTI_CONDITIONAL);
> +
> +                nextslot = CopyMultiInsertInfo_NextFreeSlot(&multiInsertInfo,
> +                                                            resultRelInfo);
> +
> +                if (map != NULL)
> +                    myslot = execute_attr_map_slot(map->attrMap, myslot, nextslot);
> +                else
> +                {
> +                    /*
> +                     * This looks more expensive than it is (Believe me, I
> +                     * optimized it away. Twice). The input is in virtual
> +                     * form, and we'll materialize the slot below - for most
> +                     * slot types the copy performs the work materialization
> +                     * would later require anyway.
> +                     */
> +                    ExecCopySlot(nextslot, myslot);
> +                    myslot = nextslot;
> +                }
>              }
>  
> -            slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
> +            /* ensure that triggers etc see the right relation  */
> +            myslot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
>          }
>  
>          skip_tuple = false;
> @@ -2906,7 +3241,7 @@ CopyFrom(CopyState cstate)
>          /* BEFORE ROW INSERT Triggers */
>          if (has_before_insert_row_trig)
>          {
> -            if (!ExecBRInsertTriggers(estate, resultRelInfo, slot))
> +            if (!ExecBRInsertTriggers(estate, resultRelInfo, myslot))
>                  skip_tuple = true;    /* "do nothing" */
>          }
>  
> @@ -2919,7 +3254,7 @@ CopyFrom(CopyState cstate)
>               */
>              if (has_instead_insert_row_trig)
>              {
> -                ExecIRInsertTriggers(estate, resultRelInfo, slot);
> +                ExecIRInsertTriggers(estate, resultRelInfo, myslot);
>              }
>              else
>              {
> @@ -2931,12 +3266,7 @@ CopyFrom(CopyState cstate)
>                   */
>                  if (resultRelInfo->ri_RelationDesc->rd_att->constr &&
>                      resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored)
> -                {
> -                    ExecComputeStoredGenerated(estate, slot);
> -                    MemoryContextSwitchTo(batchcontext);
> -                    tuple = ExecCopySlotHeapTuple(slot);
> -                    MemoryContextSwitchTo(oldcontext);
> -                }
> +                    ExecComputeStoredGenerated(estate, myslot);
>  
>                  /*
>                   * If the target is a plain table, check the constraints of
> @@ -2944,7 +3274,7 @@ CopyFrom(CopyState cstate)
>                   */
>                  if (resultRelInfo->ri_FdwRoutine == NULL &&
>                      resultRelInfo->ri_RelationDesc->rd_att->constr)
> -                    ExecConstraints(resultRelInfo, slot, estate);
> +                    ExecConstraints(resultRelInfo, myslot, estate);
>  
>                  /*
>                   * Also check the tuple against the partition constraint, if
> @@ -2954,7 +3284,7 @@ CopyFrom(CopyState cstate)
>                   */
>                  if (resultRelInfo->ri_PartitionCheck &&
>                      (proute == NULL || has_before_insert_row_trig))
> -                    ExecPartitionCheck(resultRelInfo, slot, estate, true);
> +                    ExecPartitionCheck(resultRelInfo, myslot, estate, true);
>  
>                  /*
>                   * Perform multi-inserts when enabled, or when loading a
> @@ -2963,31 +3293,21 @@ CopyFrom(CopyState cstate)
>                   */
>                  if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
>                  {
> -                    /* Add this tuple to the tuple buffer */
> -                    if (nBufferedTuples == 0)
> -                        firstBufferedLineNo = cstate->cur_lineno;
> -                    bufferedTuples[nBufferedTuples++] = tuple;
> -                    bufferedTuplesSize += tuple->t_len;
> -
>                      /*
> -                     * If the buffer filled up, flush it.  Also flush if the
> -                     * total size of all the tuples in the buffer becomes
> -                     * large, to avoid using large amounts of memory for the
> -                     * buffer when the tuples are exceptionally wide.
> +                     * The slot previously might point into the per-tuple
> +                     * context. For batching it needs to be longer lived.
>                       */
> -                    if (nBufferedTuples == MAX_BUFFERED_TUPLES ||
> -                        bufferedTuplesSize > 65535)
> -                    {
> -                        CopyFromInsertBatch(cstate, estate, mycid, ti_options,
> -                                            resultRelInfo, myslot, bistate,
> -                                            nBufferedTuples, bufferedTuples,
> -                                            firstBufferedLineNo);
> -                        nBufferedTuples = 0;
> -                        bufferedTuplesSize = 0;
> -
> -                        /* free memory occupied by tuples from the batch */
> -                        MemoryContextReset(batchcontext);
> -                    }
> +                    ExecMaterializeSlot(myslot);
> +
> +                    /* Add this tuple to the tuple buffer */
> +                    CopyMultiInsertInfo_Store(&multiInsertInfo, myslot,
> +                                              cstate->line_buf.len,
> +                                              cstate->cur_lineno);
> +
> +                    /* If the buffer filled up, flush it. */
> +                    if (CopyMultiInsertInfo_IsFull(&multiInsertInfo))
> +                        CopyMultiInsertInfo_Flush(&multiInsertInfo, cstate,
> +                                                  estate, mycid, ti_options);
>                  }
>                  else
>                  {
> @@ -2996,12 +3316,12 @@ CopyFrom(CopyState cstate)
>                      /* OK, store the tuple */
>                      if (resultRelInfo->ri_FdwRoutine != NULL)
>                      {
> -                        slot = resultRelInfo->ri_FdwRoutine->ExecForeignInsert(estate,
> -                                                                               resultRelInfo,
> -                                                                               slot,
> -                                                                               NULL);
> +                        myslot = resultRelInfo->ri_FdwRoutine->ExecForeignInsert(estate,
> +                                                                                 resultRelInfo,
> +                                                                                 myslot,
> +                                                                                 NULL);
>  
> -                        if (slot == NULL)    /* "do nothing" */
> +                        if (myslot == NULL) /* "do nothing" */
>                              continue;    /* next tuple please */
>  
>                          /*
> @@ -3009,27 +3329,26 @@ CopyFrom(CopyState cstate)
>                           * column, so (re-)initialize tts_tableOid before
>                           * evaluating them.
>                           */
> -                        slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
> +                        myslot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
>                      }
>                      else
>                      {
> -                        tuple = ExecFetchSlotHeapTuple(slot, true, NULL);
> -                        heap_insert(resultRelInfo->ri_RelationDesc, tuple,
> -                                    mycid, ti_options, bistate);
> -                        ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
> -                        slot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
> +                        /* OK, store the tuple and create index entries for it */
> +                        table_insert(resultRelInfo->ri_RelationDesc, myslot,
> +                                     mycid, ti_options, bistate);
>                      }
>  
> +
>                      /* And create index entries for it */
>                      if (resultRelInfo->ri_NumIndices > 0)
> -                        recheckIndexes = ExecInsertIndexTuples(slot,
> +                        recheckIndexes = ExecInsertIndexTuples(myslot,
>                                                                 estate,
>                                                                 false,
>                                                                 NULL,
>                                                                 NIL);
>  
>                      /* AFTER ROW INSERT Triggers */
> -                    ExecARInsertTriggers(estate, resultRelInfo, slot,
> +                    ExecARInsertTriggers(estate, resultRelInfo, myslot,
>                                           recheckIndexes, cstate->transition_capture);
>  
>                      list_free(recheckIndexes);
> @@ -3045,32 +3364,25 @@ CopyFrom(CopyState cstate)
>          }
>      }
>  
> -    /* Flush any remaining buffered tuples */
> -    if (nBufferedTuples > 0)
> +    if (insertMethod != CIM_SINGLE)
>      {
> -        if (insertMethod == CIM_MULTI_CONDITIONAL)
> -        {
> -            CopyFromInsertBatch(cstate, estate, mycid, ti_options,
> -                                prevResultRelInfo, myslot, bistate,
> -                                nBufferedTuples, bufferedTuples,
> -                                firstBufferedLineNo);
> -        }
> -        else
> -            CopyFromInsertBatch(cstate, estate, mycid, ti_options,
> -                                resultRelInfo, myslot, bistate,
> -                                nBufferedTuples, bufferedTuples,
> -                                firstBufferedLineNo);
> +        /* Flush any remaining buffered tuples */
> +        if (!CopyMultiInsertInfo_IsEmpty(&multiInsertInfo))
> +            CopyMultiInsertInfo_Flush(&multiInsertInfo, cstate, estate, mycid,
> +                                      ti_options);
> +
> +        /* Tear down the multi-insert data */
> +        CopyMultiInsertInfo_Cleanup(&multiInsertInfo);
>      }
>  
>      /* Done, clean up */
>      error_context_stack = errcallback.previous;
>  
> -    FreeBulkInsertState(bistate);
> +    if (bistate != NULL)
> +        ReleaseBulkInsertStatePin(bistate);
>  
>      MemoryContextSwitchTo(oldcontext);
>  
> -    MemoryContextDelete(batchcontext);
> -
>      /*
>       * In the old protocol, tell pqcomm that we can process normal protocol
>       * messages again.
> @@ -3084,9 +3396,6 @@ CopyFrom(CopyState cstate)
>      /* Handle queued AFTER triggers */
>      AfterTriggerEndQuery(estate);
>  
> -    pfree(values);
> -    pfree(nulls);
> -
>      ExecResetTupleTable(estate->es_tupleTable, false);
>  
>      /* Allow the FDW to shut down */
> @@ -3111,88 +3420,6 @@ CopyFrom(CopyState cstate)
>      return processed;
>  }
>  
> -/*
> - * A subroutine of CopyFrom, to write the current batch of buffered heap
> - * tuples to the heap. Also updates indexes and runs AFTER ROW INSERT
> - * triggers.
> - */
> -static void
> -CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
> -                    int ti_options, ResultRelInfo *resultRelInfo,
> -                    TupleTableSlot *myslot, BulkInsertState bistate,
> -                    int nBufferedTuples, HeapTuple *bufferedTuples,
> -                    uint64 firstBufferedLineNo)
> -{
> -    MemoryContext oldcontext;
> -    int            i;
> -    uint64        save_cur_lineno;
> -    bool        line_buf_valid = cstate->line_buf_valid;
> -
> -    /*
> -     * Print error context information correctly, if one of the operations
> -     * below fail.
> -     */
> -    cstate->line_buf_valid = false;
> -    save_cur_lineno = cstate->cur_lineno;
> -
> -    /*
> -     * heap_multi_insert leaks memory, so switch to short-lived memory context
> -     * before calling it.
> -     */
> -    oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
> -    heap_multi_insert(resultRelInfo->ri_RelationDesc,
> -                      bufferedTuples,
> -                      nBufferedTuples,
> -                      mycid,
> -                      ti_options,
> -                      bistate);
> -    MemoryContextSwitchTo(oldcontext);
> -
> -    /*
> -     * If there are any indexes, update them for all the inserted tuples, and
> -     * run AFTER ROW INSERT triggers.
> -     */
> -    if (resultRelInfo->ri_NumIndices > 0)
> -    {
> -        for (i = 0; i < nBufferedTuples; i++)
> -        {
> -            List       *recheckIndexes;
> -
> -            cstate->cur_lineno = firstBufferedLineNo + i;
> -            ExecStoreHeapTuple(bufferedTuples[i], myslot, false);
> -            recheckIndexes =
> -                ExecInsertIndexTuples(myslot,
> -                                      estate, false, NULL, NIL);
> -            ExecARInsertTriggers(estate, resultRelInfo,
> -                                 myslot,
> -                                 recheckIndexes, cstate->transition_capture);
> -            list_free(recheckIndexes);
> -        }
> -    }
> -
> -    /*
> -     * There's no indexes, but see if we need to run AFTER ROW INSERT triggers
> -     * anyway.
> -     */
> -    else if (resultRelInfo->ri_TrigDesc != NULL &&
> -             (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
> -              resultRelInfo->ri_TrigDesc->trig_insert_new_table))
> -    {
> -        for (i = 0; i < nBufferedTuples; i++)
> -        {
> -            cstate->cur_lineno = firstBufferedLineNo + i;
> -            ExecStoreHeapTuple(bufferedTuples[i], myslot, false);
> -            ExecARInsertTriggers(estate, resultRelInfo,
> -                                 myslot,
> -                                 NIL, cstate->transition_capture);
> -        }
> -    }
> -
> -    /* reset cur_lineno and line_buf_valid to what they were */
> -    cstate->line_buf_valid = line_buf_valid;
> -    cstate->cur_lineno = save_cur_lineno;
> -}
> -
>  /*
>   * Setup to read tuples from a file for COPY FROM.
>   *
> @@ -4990,11 +5217,8 @@ copy_dest_receive(TupleTableSlot *slot, DestReceiver *self)
>      DR_copy    *myState = (DR_copy *) self;
>      CopyState    cstate = myState->cstate;
>  
> -    /* Make sure the tuple is fully deconstructed */
> -    slot_getallattrs(slot);
> -
> -    /* And send the data */
> -    CopyOneRowTo(cstate, slot->tts_values, slot->tts_isnull);
> +    /* Send the data */
> +    CopyOneRowTo(cstate, slot);
>      myState->processed++;
>  
>      return true;
> diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
> index 4c077755d5..ed0e2de144 100644
> --- a/src/include/access/heapam.h
> +++ b/src/include/access/heapam.h
> @@ -36,6 +36,7 @@
>  #define HEAP_INSERT_SPECULATIVE 0x0010
>  
>  typedef struct BulkInsertStateData *BulkInsertState;
> +struct TupleTableSlot;
>  
>  #define MaxLockTupleMode    LockTupleExclusive
>  
> @@ -143,7 +144,7 @@ extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
>  
>  extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
>              int options, BulkInsertState bistate);
> -extern void heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
> +extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots, int ntuples,
>                    CommandId cid, int options, BulkInsertState bistate);
>  extern TM_Result heap_delete(Relation relation, ItemPointer tid,
>              CommandId cid, Snapshot crosscheck, bool wait,
> diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
> index 4efe178ed1..c2fdedc551 100644
> --- a/src/include/access/tableam.h
> +++ b/src/include/access/tableam.h
> @@ -328,6 +328,9 @@ typedef struct TableAmRoutine
>       * ------------------------------------------------------------------------
>       */
>  
> +    void        (*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
> +                                 CommandId cid, int options, struct BulkInsertStateData *bistate);
> +
>      /* see table_insert() for reference about parameters */
>      void        (*tuple_insert) (Relation rel, TupleTableSlot *slot,
>                                   CommandId cid, int options,
> @@ -1157,6 +1160,17 @@ table_update(Relation rel, ItemPointer otid, TupleTableSlot *slot,
>                                           lockmode, update_indexes);
>  }
>  
> +/*
> + *    table_multi_insert    - insert multiple tuple into a table
> + */
> +static inline void
> +table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
> +                   CommandId cid, int options, struct BulkInsertStateData *bistate)
> +{
> +    rel->rd_tableam->multi_insert(rel, slots, nslots,
> +                                  cid, options, bistate);
> +}
> +
>  /*
>   * Lock a tuple in the specified mode.
>   *

Greetings,

Andres Freund




Re: COPY FROM WHEN condition

From
Andres Freund
Date:
On 2019-04-03 06:41:49 +1300, David Rowley wrote:
> However, I've ended up not doing it that way as the patch requires
> more than just an array of TupleTableSlots to be stored in the
> ResultRelInfo, it'll pretty much need all of what I have in
> CopyMultiInsertBuffer, which includes line numbers for error
> reporting, a BulkInsertState and a counter to track how many of the
> slots are used.  I had thoughts that we could just tag the whole
> CopyMultiInsertBuffer in ResultRelInfo, but that requires moving it
> somewhere a bit more generic. Another thought would be that we have
> something like "void *extra;" in ResultRelInfo that we can just borrow
> for copy.c.  It may be interesting to try this out to see if it saves
> much in the way of performance.

Hm, we could just forward declare struct CopyMultiInsertBuffer in a
header, and only define it in copy.c. That doesn't sound insane to me.
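
I.e. roughly something like the following (just a sketch - the member
list is the one from the patch above, and the ResultRelInfo field name
is only illustrative):

    /* in a header, e.g. nodes/execnodes.h */
    struct CopyMultiInsertBuffer;    /* opaque outside copy.c */

    typedef struct ResultRelInfo
    {
        /* ... existing members ... */

        /* set up by COPY FROM when multi-inserting, otherwise NULL */
        struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
    } ResultRelInfo;

    /* in copy.c, keeping the definition private */
    struct CopyMultiInsertBuffer
    {
        TupleTableSlot **slots;     /* array of MAX_BUFFERED_TUPLES slots */
        uint64         *linenos;    /* line # of tuple in copy stream */
        ResultRelInfo  *resultRelInfo;
        BulkInsertState bistate;    /* BulkInsertState for this rel */
        int             nused;      /* number of 'slots' containing tuples */
    };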


> I've got the patch into a working state now and wrote a bunch of
> comments. I've not really done a thorough read back of the code again.
> I normally wait until the light is coming from a different angle
> before doing that, but there does not seem to be much time to wait for
> that in this case, so here's v2.  Performance seems to be about the
> same as what I reported yesterday.

Cool.


> +/*
> + * Multi-Insert handling code for COPY FROM.  Inserts are performed in
> + * batches of up to MAX_BUFFERED_TUPLES.

This should probably be moved to the top of the file.


> + * When COPY FROM is used on a partitioned table, we attempt to maintain
> + * only MAX_PARTITION_BUFFERS buffers at once.  Buffers that are completely
> + * unused in each batch are removed so that we end up not keeping buffers for
> + * partitions that we might not insert into again.
> + */
> +#define MAX_BUFFERED_TUPLES        1000
> +#define MAX_BUFFERED_BYTES        65535
> +#define MAX_PARTITION_BUFFERS    15

> +/*
> + * CopyMultiInsertBuffer
> + *        Stores multi-insert data related to a single relation in CopyFrom.
> + */
> +typedef struct

Please don't create anonymous structs - they can't be forward declared,
and some debugging tools handle them worse than named structs. A named
struct also makes spelling out CopyMultiInsertBuffer in the comment less
necessary.


> +{
> +    Oid            relid;            /* Relation ID this insert data is for */
> +    TupleTableSlot **slots;        /* Array of MAX_BUFFERED_TUPLES to store
> +                                 * tuples */
> +    uint64       *linenos;        /* Line # of tuple in copy stream */

It could make sense to allocate those two together, or even as part of
the CopyMultiInsertBuffer itself, to reduce allocator overhead.


> +    else
> +    {
> +        CopyMultiInsertBuffer *buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
> +
> +        /* Non-partitioned table.  Just setup the buffer for the table. */
> +        buffer->relid = RelationGetRelid(rri->ri_RelationDesc);
> +        buffer->slots = palloc0(MAX_BUFFERED_TUPLES * sizeof(TupleTableSlot *));
> +        buffer->linenos = palloc(MAX_BUFFERED_TUPLES * sizeof(uint64));
> +        buffer->resultRelInfo = rri;
> +        buffer->bistate = GetBulkInsertState();
> +        buffer->nused = 0;
> +        miinfo->multiInsertBufferTab = NULL;
> +        miinfo->buffer = buffer;
> +        miinfo->nbuffers = 1;
> +    }

Can this be moved into a helper function?


> +    /*
> +     * heap_multi_insert leaks memory, so switch to short-lived memory context
> +     * before calling it.
> +     */

s/heap_multi_insert/table_multi_insert/



> +    /*
> +     * If there are any indexes, update them for all the inserted tuples, and
> +     * run AFTER ROW INSERT triggers.
> +     */
> +    if (resultRelInfo->ri_NumIndices > 0)
> +    {
> +        for (i = 0; i < nBufferedTuples; i++)
> +        {
> +            List       *recheckIndexes;
> +
> +            cstate->cur_lineno = buffer->linenos[i];
> +            recheckIndexes =
> +                ExecInsertIndexTuples(buffer->slots[i], estate, false, NULL,
> +                                      NIL);
> +            ExecARInsertTriggers(estate, resultRelInfo,
> +                                 buffer->slots[i],
> +                                 recheckIndexes, cstate->transition_capture);
> +            list_free(recheckIndexes);
> +        }
> +    }
> +
> +    /*
> +     * There's no indexes, but see if we need to run AFTER ROW INSERT triggers
> +     * anyway.
> +     */
> +    else if (resultRelInfo->ri_TrigDesc != NULL &&
> +             (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
> +              resultRelInfo->ri_TrigDesc->trig_insert_new_table))
> +    {
> +        for (i = 0; i < nBufferedTuples; i++)
> +        {
> +            cstate->cur_lineno = buffer->linenos[i];
> +            ExecARInsertTriggers(estate, resultRelInfo,
> +                                 bufferedSlots[i],
> +                                 NIL, cstate->transition_capture);
> +        }
> +    }

> +    for (i = 0; i < nBufferedTuples; i++)
> +        ExecClearTuple(bufferedSlots[i]);

I wonder about combining these loops somehow. But it's probably ok.
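
E.g. a single pass over the buffered slots, something like this (untested
sketch, just re-arranging the calls from the hunks above; the
per-relation checks simply move inside the loop):

    for (i = 0; i < nBufferedTuples; i++)
    {
        TupleTableSlot *slot = bufferedSlots[i];

        cstate->cur_lineno = buffer->linenos[i];

        if (resultRelInfo->ri_NumIndices > 0)
        {
            List       *recheckIndexes;

            recheckIndexes = ExecInsertIndexTuples(slot, estate, false,
                                                   NULL, NIL);
            ExecARInsertTriggers(estate, resultRelInfo, slot,
                                 recheckIndexes, cstate->transition_capture);
            list_free(recheckIndexes);
        }
        else if (resultRelInfo->ri_TrigDesc != NULL &&
                 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
                  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
            ExecARInsertTriggers(estate, resultRelInfo, slot,
                                 NIL, cstate->transition_capture);

        ExecClearTuple(slot);
    }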


> +/*
> + * CopyMultiInsertBuffer_Cleanup
> + *        Drop used slots and free member for this buffer.  The buffer
> + *        must be flushed before cleanup.
> + */
> +static inline void
> +CopyMultiInsertBuffer_Cleanup(CopyMultiInsertBuffer *buffer)
> +{
> +    int            i;
> +
> +    ReleaseBulkInsertStatePin(buffer->bistate);

Shouldn't this FreeBulkInsertState() rather than
ReleaseBulkInsertStatePin()?


> +
> +/*
> + * CopyMultiInsertBuffer_RemoveBuffer
> + *        Remove a buffer from being tracked by miinfo
> + */
> +static inline void
> +CopyMultiInsertBuffer_RemoveBuffer(CopyMultiInsertInfo *miinfo,
> +                                   CopyMultiInsertBuffer *buffer)
> +{
> +    Oid            relid = buffer->relid;
> +
> +    CopyMultiInsertBuffer_Cleanup(buffer);
> +
> +    hash_search(miinfo->multiInsertBufferTab, (void *) &relid, HASH_REMOVE,
> +                NULL);
> +    miinfo->nbuffers--;

Aren't we leaking the CopyMultiInsertBuffer itself here?



> +/*
> + * CopyMultiInsertInfo_Flush
> + *        Write out all stored tuples in all buffers out to the tables.
> + *
> + * To save us from ending up with buffers for 1000s of partitions we remove
> + * buffers belonging to partitions that we've seen no tuples for in this batch

That seems a little naive (imagine you have like 5 partitions, and we
constantly cycle through 2-3 of them per batch).  It's probably OK for
this version.   I guess I'd only do that cleanup if we're actually
flushing due to the number of partitions.



>          if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>          {
>              ereport(ERROR,
> -                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> -                    errmsg("cannot perform FREEZE on a partitioned table")));
> +                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                     errmsg("cannot perform FREEZE on a partitioned table")));
>          }

accidental hunk?

Greetings,

Andres Freund



Re: COPY FROM WHEN condition

From
Andres Freund
Date:
Gah, ignore this mail - I've somehow accidentally sent it early. I'm not
quite sure how, but I miskeyed and emacs sent it off early, without going
through mutt...



Re: COPY FROM WHEN condition

From
David Rowley
Date:
(Fixed all of what you mentioned)

On Wed, 3 Apr 2019 at 06:57, Andres Freund <andres@anarazel.de> wrote:
> > +/*
> > + * CopyMultiInsertInfo_Flush
> > + *           Write out all stored tuples in all buffers out to the tables.
> > + *
> > + * To save us from ending up with buffers for 1000s of partitions we remove
> > + * buffers belonging to partitions that we've seen no tuples for in this batch
>
> That seems a little naive (imagine you have like 5 partitions, and we
> constantly cycle through 2-3 of them per batch).  It's probably OK for
> this version.   I guess I'd only do that cleanup if we're actually
> flushing due to the number of partitions.

hmm good point.  It seems like being smarter there would be a good thing.

I've ended up getting rid of the hash table in favour of the List that
you mentioned and storing the buffer in ResultRelInfo. I also changed
the logic that removes buffers once we reach the limit.  Instead of
getting rid of buffers that were not used on this run, I've changed it
so it just gets rid of the buffers starting with the oldest one first,
but stops once the number of buffers is at the maximum again.  This
can mean that we end up with MAX_BUFFERED_TUPLES buffers instead of
MAX_PARTITION_BUFFERS if there is only 1 tuple per buffer.  My current
thinking is that this does not matter since only 1 slot will be
allocated per buffer.  We'll remove all of the excess buffers during
the flush and keep just MAX_PARTITION_BUFFERS of the newest buffers.
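
The trimming itself is pretty simple, roughly along these lines (sketch -
the list field name is illustrative; buffers get appended in creation
order, so the head of the list is always the oldest):

    /* after flushing, drop the oldest buffers until we're at the limit */
    while (list_length(miinfo->multiInsertBuffers) > MAX_PARTITION_BUFFERS)
    {
        CopyMultiInsertBuffer *buffer =
            (CopyMultiInsertBuffer *) linitial(miinfo->multiInsertBuffers);

        CopyMultiInsertBuffer_Cleanup(buffer);
        miinfo->multiInsertBuffers =
            list_delete_first(miinfo->multiInsertBuffers);
    }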

Also, after changing CopyMultiInsertBuffer to use fixed sized arrays
instead of allocating them with another palloc the performance has
improved a bit more.

Using the perl files mentioned in [1]

Master + Patched:
# copy listp from program $$perl ~/bench_same.pl$$ delimiter '|';
COPY 35651564
Time: 9106.776 ms (00:09.107)
# truncate table listp;
TRUNCATE TABLE
# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 10154.196 ms (00:10.154)


Master only:
# copy listp from program $$perl ~/bench_same.pl$$ delimiter '|';
COPY 35651564
Time: 22200.535 ms (00:22.201)
# truncate table listp;
TRUNCATE TABLE
# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 18592.107 ms (00:18.592)

> >               if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> >               {
> >                       ereport(ERROR,
> > -                             (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > -                                     errmsg("cannot perform FREEZE on a partitioned table")));
> > +                                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +                                      errmsg("cannot perform FREEZE on a partitioned table")));
> >               }
>
> accidental hunk?

It was left over from a pgindent run. Now removed.

[1] https://www.postgresql.org/message-id/CAKJS1f98Fa+QRTGKwqbtz0M=Cy1EHYR8Q-W08cpA78tOy4euKQ@mail.gmail.com

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: COPY FROM WHEN condition

From
Andres Freund
Date:
On 2019-04-03 18:44:32 +1300, David Rowley wrote:
> (Fixed all of what you mentioned)
> 
> On Wed, 3 Apr 2019 at 06:57, Andres Freund <andres@anarazel.de> wrote:
> > > +/*
> > > + * CopyMultiInsertInfo_Flush
> > > + *           Write out all stored tuples in all buffers out to the tables.
> > > + *
> > > + * To save us from ending up with buffers for 1000s of partitions we remove
> > > + * buffers belonging to partitions that we've seen no tuples for in this batch
> >
> > That seems a little naive (imagine you have like 5 partitions, and we
> > constantly cycle through 2-3 of them per batch).  It's probably OK for
> > this version.   I guess I'd only do that cleanup if we're actually
> > flushing due to the number of partitions.
> 
> hmm good point.  It seems like being smarter there would be a good thing.
> 
> I've ended up getting rid of the hash table in favour of the List that
> you mentioned and storing the buffer in ResultRelInfo.


Cool.

> I also changed
> the logic that removes buffers once we reach the limit.  Instead of
> getting rid of buffers that were not used on this run, I've changed it
> so it just gets rid of the buffers starting with the oldest one first,
> but stops once the number of buffers is at the maximum again.  This
> can mean that we end up with MAX_BUFFERED_TUPLES buffers instead of
> MAX_PARTITION_BUFFERS if there is only 1 tuple per buffer.  My current
> thinking is that this does not matter since only 1 slot will be
> allocated per buffer.  We'll remove all of the excess buffers during
> the flush and keep just MAX_PARTITION_BUFFERS of the newest buffers.

Yea, that makes sense to me.


> Also, after changing CopyMultiInsertBuffer to use fixed sized arrays
> instead of allocating them with another palloc the performance has
> improved a bit more.
> 
> Using the perl files mentioned in [1]
> 
> Master + Patched:
> # copy listp from program $$perl ~/bench_same.pl$$ delimiter '|';
> COPY 35651564
> Time: 9106.776 ms (00:09.107)
> # truncate table listp;
> TRUNCATE TABLE
> # copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 10154.196 ms (00:10.154)
> 
> 
> Master only:
> # copy listp from program $$perl ~/bench_same.pl$$ delimiter '|';
> COPY 35651564
> Time: 22200.535 ms (00:22.201)
> # truncate table listp;
> TRUNCATE TABLE
> # copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> COPY 35651564
> Time: 18592.107 ms (00:18.592)

Awesome. I'm probably too tired to give you meaningful feedback
tonight. But I'll do a quick readthrough.



> +/* Class the buffer full if there are >= this many bytes of tuples stored */
> +#define MAX_BUFFERED_BYTES        65535

Random aside: This seems pretty small (but should be changed separately).


> +/* Trim the list of buffers back down to this number after flushing */
> +#define MAX_PARTITION_BUFFERS    32
> +
> +/* Stores multi-insert data related to a single relation in CopyFrom. */
> +typedef struct CopyMultiInsertBuffer
> +{
> +    TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
> +    ResultRelInfo *resultRelInfo;    /* ResultRelInfo for 'relid' */
> +    BulkInsertState bistate;    /* BulkInsertState for this rel */
> +    int            nused;            /* number of 'slots' containing tuples */
> +    uint64        linenos[MAX_BUFFERED_TUPLES];    /* Line # of tuple in copy
> +                                                 * stream */
> +} CopyMultiInsertBuffer;

I don't think it's needed now, but you'd probably achieve a bit better
locality by storing slots + linenos in a single array of (slot,lineno).
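
I.e. something along these lines (sketch only - CopyMultiInsertEntry is a
made-up name):

    typedef struct CopyMultiInsertEntry
    {
        TupleTableSlot *slot;
        uint64          lineno;     /* line # of tuple in copy stream */
    } CopyMultiInsertEntry;

    typedef struct CopyMultiInsertBuffer
    {
        /* slot and line number of each buffered tuple kept adjacent */
        CopyMultiInsertEntry entries[MAX_BUFFERED_TUPLES];
        ResultRelInfo  *resultRelInfo;
        BulkInsertState bistate;
        int             nused;
    } CopyMultiInsertBuffer;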

> +/*
> + * CopyMultiInsertBuffer_Init
> + *        Allocate memory and initialize a new CopyMultiInsertBuffer for this
> + *        ResultRelInfo.
> + */
> +static CopyMultiInsertBuffer *
> +CopyMultiInsertBuffer_Init(ResultRelInfo *rri)
> +{
> +    CopyMultiInsertBuffer *buffer;
> +
> +    buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
> +    memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);

Is there a reason to not just directly palloc0?


> +/*
> + * CopyMultiInsertBuffer_Cleanup
> + *        Drop used slots and free member for this buffer.  The buffer
> + *        must be flushed before cleanup.
> + */
> +static inline void
> +CopyMultiInsertBuffer_Cleanup(CopyMultiInsertBuffer *buffer)
> +{
> +    int            i;
> +
> +    /* Ensure buffer was flushed */
> +    Assert(buffer->nused == 0);
> +
> +    /* Remove back-link to ourself */
> +    buffer->resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
> +
> +    ReleaseBulkInsertStatePin(buffer->bistate);

Hm, afaict this still leaks the bistate itself?


Greetings,

Andres Freund



Re: COPY FROM WHEN condition

From
David Rowley
Date:
On Wed, 3 Apr 2019 at 18:56, Andres Freund <andres@anarazel.de> wrote:
> > +/* Class the buffer full if there are >= this many bytes of tuples stored */
> > +#define MAX_BUFFERED_BYTES           65535
>
> Random aside: This seems pretty small (but should be changed separately).

Yeah, my fingers were hovering over this. I was close to making it
1MB, but thought I'd better not.

> > +typedef struct CopyMultiInsertBuffer
> > +{
> > +     TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
> > +     ResultRelInfo *resultRelInfo;   /* ResultRelInfo for 'relid' */
> > +     BulkInsertState bistate;        /* BulkInsertState for this rel */
> > +     int                     nused;                  /* number of 'slots' containing tuples */
> > +     uint64          linenos[MAX_BUFFERED_TUPLES];   /* Line # of tuple in copy
> > +                                                                                              * stream */
> > +} CopyMultiInsertBuffer;
>
> I don't think it's needed now, but you'd probably achieve a bit better
> locality by storing slots + linenos in a single array of (slot,lineno).

You're right, but right now I only zero the slots array and leave the
linenos uninitialised. Merging them into one would double the area of
memory to zero.  I'm not sure if that's worse or not, but I do know that if
the number of partitions exceeds MAX_PARTITION_BUFFERS and we change
partitions quite a lot during the copy, then we could end up
initialising buffers quite often. I wanted to keep that as cheap as
possible.

> > +/*
> > + * CopyMultiInsertBuffer_Init
> > + *           Allocate memory and initialize a new CopyMultiInsertBuffer for this
> > + *           ResultRelInfo.
> > + */
> > +static CopyMultiInsertBuffer *
> > +CopyMultiInsertBuffer_Init(ResultRelInfo *rri)
> > +{
> > +     CopyMultiInsertBuffer *buffer;
> > +
> > +     buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
> > +     memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
>
> Is there a reason to not just directly palloc0?

Yeah, I'm too cheap to zero the entire thing, per what I mentioned
above. linenos does not really need anything meaningful put there
during init. It'll get set when we consume the slots.

> > +
> > +     /* Remove back-link to ourself */
> > +     buffer->resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
> > +
> > +     ReleaseBulkInsertStatePin(buffer->bistate);
>
> Hm, afaict this still leaks the bistate itself?

Oops, I forgot about that one. v4 attached.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: COPY FROM WHEN condition

From
Andres Freund
Date:
Hi,

On 2019-04-03 20:00:09 +1300, David Rowley wrote:
> Oops, I forgot about that one. v4 attached.

I'm pretty happy with this. I'm doing some minor changes (e.g. don't
like the function comment formatting that much, the tableam callback
needs docs, stuff like that), and then I'm going to push it tomorrow.

I'm planning to attribute it to you, me, and Haribabu Kommi.


Oh, btw, is there a reason you're memset(0)'ing multiInsertInfo? Seems
unnecessary, given that in all cases we're using it we're going to do
CopyMultiInsertInfo_Init(). And IME valgrind and the compiler are more
helpful when you don't just default initialize, because then they can
tell when you forgot to initialize a field (say like
CopyMultiInsertInfo_Init not initializing nbuffers).

Greetings,

Andres Freund



Re: COPY FROM WHEN condition

From
David Rowley
Date:
On Thu, 4 Apr 2019 at 18:20, Andres Freund <andres@anarazel.de> wrote:
> I'm pretty happy with this. I'm doing some minor changes (e.g. don't
> like the function comment formatting that much, the tableam callback
> needs docs, stuff like that), and then I'm going to push it tomorrow.
>
> I'm planning to attribute it to you, me, and Haribabu Kommi.

Sounds good.

>
> Oh, btw, is there a reason you're memset(0)'ing multiInsertInfo? Seems
> unnecessary, given that in all cases we're using it we're going to do
> CopyMultiInsertInfo_Init(). And IME valgrind and the compiler are more
> helpful when you don't just default initialize, because then they can
> tell when you forgot to initialize a field (say like
> CopyMultiInsertInfo_Init not initializing nbuffers).

At some point when I was hacking at it, I was getting a warning about
it being possibly uninitialised. I don't recall exactly how the code
was then, but I just tried removing it and I don't get any warning
now. So probably fine to remove.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: COPY FROM WHEN condition

From
Andres Freund
Date:
On 2019-04-03 22:20:00 -0700, Andres Freund wrote:
> Hi,
> 
> On 2019-04-03 20:00:09 +1300, David Rowley wrote:
> > Oops, I forgot about that one. v4 attached.
> 
> I'm pretty happy with this. I'm doing some minor changes (e.g. don't
> like the function comment formatting that much, the tableam callback
> needs docs, stuff like that), and then I'm going to push it tomorrow.

After another read, I also think I'm going to rename the functions a
bit. CopyMultiInsertInfo_SetupBuffer, with its mix of camel-case and
underscores, just doesn't seem to mix well with copy.c and also just
generally other postgres code. I feel we already have too many different
naming conventions...

Greetings,

Andres Freund



Re: COPY FROM WHEN condition

From
Andres Freund
Date:
Hi,

On 2019-04-04 12:04:28 -0700, Andres Freund wrote:
> On 2019-04-03 22:20:00 -0700, Andres Freund wrote:
> > On 2019-04-03 20:00:09 +1300, David Rowley wrote:
> > > Oops, I forgot about that one. v4 attached.
> > 
> > I'm pretty happy with this. I'm doing some minor changes (e.g. don't
> > like the function comment formatting that much, the tableam callback
> > needs docs, stuff like that), and then I'm going to push it tomorrow.
> 
> After another read, I also think I'm going to rename the functions a
> bit. CopyMultiInsertInfo_SetupBuffer, with its mix of camel-case and
> underscores, just doesn't seem to mix well with copy.c and also just
> generally other postgres code. I feel we already have too many different
> naming conventions...

I've pushed this now. Besides those naming changes, I had to re-add the
zero initialization (I did end up seeing compiler warnings when compiling
with optimizations).  Also some comment fixes.

I added one more flush location, when inserting into a partition that
cannot use batching - there might e.g. be triggers looking at rows in
other partitions or such.
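
In terms of the helpers as named in the patches in this thread, that
amounts to roughly the following (sketch; the committed function names
and signatures are spelled a bit differently after the renaming):

    /*
     * This partition can't use batching, so flush anything buffered so
     * far first - e.g. triggers on other partitions need to be able to
     * see those rows.
     */
    if (insertMethod == CIM_MULTI_CONDITIONAL &&
        !CopyMultiInsertInfo_IsEmpty(&multiInsertInfo))
        CopyMultiInsertInfo_Flush(&multiInsertInfo, cstate, estate,
                                  mycid, ti_options);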

I found some pretty minor slowdown for COPYing narrow rows into an
unlogged unpartitioned table. That got better by combining the loops in
CopyMultiInsertBufferFlush() (previously there were stalls due to the
indirect function calls for ExecClearTuple()). I think there's still a
tiny slowdown left in that scenario, but given that it's unlogged and
very narrow rows, I think that's ok.

Greetings,

Andres Freund



Re: COPY FROM WHEN condition

From
David Rowley
Date:
On Fri, 5 Apr 2019 at 12:32, Andres Freund <andres@anarazel.de> wrote:
> I've pushed this now. Besides those naming changes, I'd to re-add the
> zero initialization (I did end up seing compiler warnings when compiling
> with optimizations).  Also some comment fixes.

Thanks for pushing.

> I added one more flush location, when inserting into a partition that
> cannot use batching - there might e.g. be triggers looking at rows in
> other partitions or such.

Good catch.

I read through the final commit to see what had changed and I noticed
that I had forgotten to remove nbuffers from CopyMultiInsertInfo when
changing from a hash table to a List.

Patch to fix is attached.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: COPY FROM WHEN condition

From
Andres Freund
Date:
Hi,

On 2019-04-05 12:59:06 +1300, David Rowley wrote:
> I read through the final commit to see what had changed and I noticed
> that I had forgotten to remove nbuffers from CopyMultiInsertInfo when
> changing from a hash table to a List.
> 
> Patch to fix is attached.

Pushed, thanks.