Discussion: ModifyTable overheads in generic plans


ModifyTable overheads in generic plans

From
Amit Langote
Date:
Hi,

I would like to discuss a refactoring patch that builds on top of the
patches at [1] to address $subject.  To get an idea of what
eliminating these overheads looks like, take a look at the following
benchmarking results.

Note 1: I've forced the use of a generic plan by setting plan_cache_mode
to 'force_generic_plan'.

Note 2: The individual TPS figures as measured are quite noisy; I
just want to show the rough trend with an increasing number of
partitions.

pgbench -i -s 10 --partitions={0, 10, 100, 1000}
pgbench -T120 -f test.sql -M prepared

test.sql:
\set aid random(1, 1000000)
update pgbench_accounts set abalance = abalance + 1 where aid = :aid;

Without any of the patches:

0       tps = 13045.485121 (excluding connections establishing)
10      tps = 9358.157433 (excluding connections establishing)
100     tps = 1878.274500 (excluding connections establishing)
1000    tps = 84.684695 (excluding connections establishing)

The slowdown as the partition count increases can be explained by the
fact that UPDATE and DELETE can't currently use runtime partition
pruning.  So, even if any given transaction is only updating a single
tuple in a single partition, the plans for *all* partitions are
initialized, along with their ResultRelInfos.  That is, a lot of
useless work is done in InitPlan() and ExecInitModifyTable().

With the patches at [1] (latest 0001+0002 posted there), whereby the
generic plan for UPDATE can now perform runtime pruning, the numbers
improve slightly:

0       tps = 12743.487196 (excluding connections establishing)
10      tps = 12644.240748 (excluding connections establishing)
100     tps = 4158.123345 (excluding connections establishing)
1000    tps = 391.248067 (excluding connections establishing)

So even though the runtime pruning enabled by those patches ensures
that the useless plans are left untouched by the executor, the
ResultRelInfos are still built on the assumption that *all* result
relations will be processed.  With the attached patches
(0001+0002+0003) that I want to discuss in this thread, the numbers
improve further:

0       tps = 13419.283168 (excluding connections establishing)
10      tps = 12588.016095 (excluding connections establishing)
100     tps = 8560.824225 (excluding connections establishing)
1000    tps = 1926.553901 (excluding connections establishing)

0001 and 0002 are preparatory patches.  0003 teaches nodeModifyTable.c
to build the ResultRelInfo for a given result relation lazily, that is,
only when the plan producing tuples to be updated/deleted actually
produces one that belongs to that relation.  So, if a transaction only
updates one tuple, then only one ResultRelInfo is built.  For larger
partition counts, that saves a significant amount of work.
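To illustrate the idea, lazy creation amounts to a build-on-first-access
pattern.  Here is a minimal sketch in plain C; the struct, names, and sizes
below are invented for illustration and are not the actual executor data
structures:

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_RESULT_RELS 1000

/* Stand-in for PostgreSQL's ResultRelInfo; fields are illustrative only. */
typedef struct ResultRelInfoSketch
{
    int         ri_RangeTableIndex;
    /* ... triggers, constraints, junk filter, etc. would live here ... */
} ResultRelInfoSketch;

static ResultRelInfoSketch *result_rels[NUM_RESULT_RELS];
static int  ninitialized = 0;

/*
 * Return the entry for the given ordinal position, building it on first
 * access.  If a statement only ever touches one partition, only one
 * entry is ever built, no matter how many relations the plan contains.
 */
static ResultRelInfoSketch *
get_result_rel(int whichrel)
{
    if (result_rels[whichrel] == NULL)
    {
        ResultRelInfoSketch *rri = malloc(sizeof(ResultRelInfoSketch));

        rri->ri_RangeTableIndex = whichrel;
        result_rels[whichrel] = rri;
        ninitialized++;
    }
    return result_rels[whichrel];
}
```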

However, there's one new loop in ExecInitModifyTable(), added by the
patches at [1], that iterates over all partitions; I haven't been able
to eliminate it so far, and I'm seeing it cause a significant
bottleneck at higher partition counts.  The loop is meant to create a
hash table that maps result relation OIDs to their offsets in the
PlannedStmt.resultRelations list.  We need this mapping because the
ResultRelInfos are accessed from the query-global array using that
offset.  One approach that David Rowley mentioned at [1] to avoid this
mapping is to make the result relation's scan node's targetlist emit
the relation's RT index or ordinal position to begin with, instead of
the table OID, but I haven't figured out a way to do that.

Having taken care of the ModifyTable overheads (except the one
mentioned in the last paragraph), a few more bottlenecks pop up at
higher partition counts.  Basically, they result from pre-execution
actions on relations contained in the plan that traverse the flat
range table in its entirety.

1. AcquireExecutorLocks(): locks *all* partitions before executing the
plan tree, even though runtime pruning would allow all but one to be
skipped

2. ExecCheckRTPerms(): checks permissions of *all* partitions before
executing the plan tree, but maybe it's okay to check only the ones
that will be accessed

Problem 1 has been discussed before, and David Rowley even developed a
patch that was discussed at [2].  The approach taken in the patch was
to delay locking of the partitions contained in a generic plan that
are potentially runtime prunable, although, as also described in the
linked thread, that approach has a race condition: a concurrent
session may invalidate the generic plan by altering a partition in the
window between when AcquireExecutorLocks() runs on the plan and when
the plan is executed.

Another solution, suggested to me by Robert Haas in an off-list
discussion, is to teach AcquireExecutorLocks() or nearby code to
perform EXTERN-parameter-based pruning before passing the plan tree to
the executor, and to lock only the partitions that survive that
pruning.  It's perhaps doable if we refactor
ExecFindInitialMatchingSubPlans() so that it does not require a
full-blown execution context.  Or maybe we could do something more
invasive and rewrite AcquireExecutorLocks() to walk the plan tree
instead of the flat range table, looking for scan nodes and nodes that
support runtime pruning, and lock the appropriate relations.
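A plan-tree walk of the kind described might look roughly like this in
plain C.  The node layout is invented for illustration; real Plan nodes,
and the actual locking code, differ considerably:

```c
#include <assert.h>
#include <stddef.h>

/* Invented stand-in for a Plan node; only the fields the walk needs. */
typedef struct PlanSketch
{
    int     scanrelid;           /* RT index scanned, or 0 if none */
    struct PlanSketch *lefttree;
    struct PlanSketch *righttree;
} PlanSketch;

/*
 * Recursively collect the RT indexes of scan nodes into rtis[]; return
 * the new count.  A locking routine driven by this walk would lock only
 * these relations, instead of every entry in the flat range table.
 */
static int
collect_scan_rels(const PlanSketch *plan, int *rtis, int count)
{
    if (plan == NULL)
        return count;
    if (plan->scanrelid > 0)
        rtis[count++] = plan->scanrelid;
    count = collect_scan_rels(plan->lefttree, rtis, count);
    count = collect_scan_rels(plan->righttree, rtis, count);
    return count;
}
```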

Regarding problem 2, I wonder if we shouldn't simply move the
permission check to ExecGetRangeTableRelation(), so that it is
performed the first time a given relation is accessed during
execution.

Anyway, applying David's aforementioned patch gives the following numbers:

0       tps = 12325.890487 (excluding connections establishing)
10      tps = 12146.420443 (excluding connections establishing)
100     tps = 12807.850709 (excluding connections establishing)
1000    tps = 7578.652893 (excluding connections establishing)

which suggests that it might be worthwhile to try to find a solution for this.

Finally, there's one more place that shows up in perf profiles at
higher partition counts, and that is LockReleaseAll().  David and
Tsunakawa-san had worked on a patch [3], which still applies and can
be shown to be quite beneficial when generic plans are involved.  I
couldn't get it to show a major improvement over the above numbers in
this case (for UPDATE, that is), but maybe that's because the loop in
ExecInitModifyTable() mentioned above is still in the way.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

[1] https://commitfest.postgresql.org/28/2575/
[2]
https://www.postgresql.org/message-id/flat/CAKJS1f_kfRQ3ZpjQyHC7%3DPK9vrhxiHBQFZ%2Bhc0JCwwnRKkF3hg%40mail.gmail.com
[3]
https://www.postgresql.org/message-id/flat/CAKJS1f-7T9F1xLw5PqgOApcV6YX3WYC4XJHHCpxh8hzcZsA-xA%40mail.gmail.com#c57f2f592484bca76310f4c551d4de15

Attachments

Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Fri, Jun 26, 2020 at 9:36 PM Amit Langote <amitlangote09@gmail.com> wrote:
> I would like to discuss a refactoring patch that builds on top of the
> patches at [1] to address $subject.

I've added this to the next CF: https://commitfest.postgresql.org/28/2621/

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: ModifyTable overheads in generic plans

From
David Rowley
Date:
On Sat, 27 Jun 2020 at 00:36, Amit Langote <amitlangote09@gmail.com> wrote:
> 2. ExecCheckRTPerms(): checks permissions of *all* partitions before
> executing the plan tree, but maybe it's okay to check only the ones
> that will be accessed

I don't think it needs to be quite as complex as that.
expand_single_inheritance_child will set the
RangeTblEntry.requiredPerms to 0, so we never need to check
permissions on a partition.  The overhead of permission checking when
there are many partitions is just down to the fact that
ExecCheckRTPerms() loops over the entire rangetable and calls
ExecCheckRTEPerms for each one.  ExecCheckRTEPerms() does have very
little work to do when requiredPerms is 0, but the loop itself and the
function call overhead show up when you remove the other bottlenecks.

I have a patch somewhere that just had the planner add the RTindexes
with a non-zero requiredPerms and set that in the plan so that
ExecCheckRTPerms could just look at the ones that actually needed
something checked.  There's a slight disadvantage there: for queries
to non-partitioned tables, we need to build a Bitmapset that has all
items from the rangetable.  That's likely a small overhead, but not
free, so perhaps there is a better way.
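For illustration, the bitmap-driven check could look roughly like this in
plain C, with a machine word standing in for a Bitmapset.  The names are
invented, and GCC/Clang bit builtins are assumed:

```c
#include <assert.h>

static int nchecked = 0;

/* Stand-in for ExecCheckRTEPerms(); pretend every check passes. */
static int
check_rte_perms(int rtindex)
{
    (void) rtindex;
    nchecked++;
    return 1;
}

/*
 * Visit only the set bits (RT indexes with non-zero requiredPerms)
 * instead of looping over every entry in the range table.
 */
static int
check_perms_from_bitmap(unsigned long long perm_rtes)
{
    while (perm_rtes != 0)
    {
        int rtindex = __builtin_ctzll(perm_rtes);   /* lowest set bit */

        if (!check_rte_perms(rtindex))
            return 0;
        perm_rtes &= perm_rtes - 1;                 /* clear that bit */
    }
    return 1;
}
```

With thousands of partitioned child RTEs all carrying requiredPerms = 0,
only the few set bits are ever visited.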

David



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Mon, Jun 29, 2020 at 10:39 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Sat, 27 Jun 2020 at 00:36, Amit Langote <amitlangote09@gmail.com> wrote:
> > 2. ExecCheckRTPerms(): checks permissions of *all* partitions before
> > executing the plan tree, but maybe it's okay to check only the ones
> > that will be accessed
>
> I don't think it needs to be quite as complex as that.
> expand_single_inheritance_child will set the
> RangeTblEntry.requiredPerms to 0, so we never need to check
> permissions on a partition.  The overhead of permission checking when
> there are many partitions is just down to the fact that
> ExecCheckRTPerms() loops over the entire rangetable and calls
> ExecCheckRTEPerms for each one.  ExecCheckRTEPerms() does have very
> little work to do when requiredPerms is 0, but the loop itself and the
> function call overhead show up when you remove the other bottlenecks.

I had forgotten that we set requiredPerms to 0 for the inheritance child tables.

> I have a patch somewhere that just had the planner add the RTindexes
> with a non-zero requiredPerms and set that in the plan so that
> ExecCheckRTPerms could just look at the ones that actually needed
> something checked.   There's a slight disadvantage there that for
> queries to non-partitioned tables that we need to build a Bitmapset
> that has all items from the rangetable.  That's likely a small
> overhead, but not free, so perhaps there is a better way.

I can't think of anything for this that doesn't involve having one
more list of RTEs or bitmapset of RT indexes in PlannedStmt.



--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Fri, Jun 26, 2020 at 9:36 PM Amit Langote <amitlangote09@gmail.com> wrote:
> I would like to discuss a refactoring patch that builds on top of the
> patches at [1] to address $subject.

I forgot to update a place in postgres_fdw causing one of its tests to crash.

Fixed in the attached updated version.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachments

Re: ModifyTable overheads in generic plans

From
Daniel Gustafsson
Date:
> On 1 Jul 2020, at 08:30, Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Fri, Jun 26, 2020 at 9:36 PM Amit Langote <amitlangote09@gmail.com> wrote:
>> I would like to discuss a refactoring patch that builds on top of the
>> patches at [1] to address $subject.
>
> I forgot to update a place in postgres_fdw causing one of its tests to crash.
>
> Fixed in the attached updated version.

The attached 0003 fails to apply to current HEAD, please submit another rebased
version.  Marking the entry as Waiting on Author in the meantime.

cheers ./daniel


Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
Hi Daniel,

On Wed, Jul 1, 2020 at 6:50 PM Daniel Gustafsson <daniel@yesql.se> wrote:
> > On 1 Jul 2020, at 08:30, Amit Langote <amitlangote09@gmail.com> wrote:
> >
> > On Fri, Jun 26, 2020 at 9:36 PM Amit Langote <amitlangote09@gmail.com> wrote:
> >> I would like to discuss a refactoring patch that builds on top of the
> >> patches at [1] to address $subject.
> >
> > I forgot to update a place in postgres_fdw causing one of its tests to crash.
> >
> > Fixed in the attached updated version.
>
> The attached 0003 fails to apply to current HEAD, please submit another rebased
> version.  Marking the entry as Waiting on Author in the meantime.

Thank you for the heads up.

Actually, as I noted in the first email, the patches here are to be
applied on top of patches of another thread that I chose not to post
here.  But I can see how that is inconvenient both for the CF bot and
other humans, so I'm attaching all of the patches.

Another thing I could do is decouple the patches to discuss here from
the patches of the other thread, which should be possible and might be
good to avoid back and forth between the two threads.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachments

Re: ModifyTable overheads in generic plans

From
Daniel Gustafsson
Date:
> On 1 Jul 2020, at 15:38, Amit Langote <amitlangote09@gmail.com> wrote:

> Another thing I could do is decouple the patches to discuss here from
> the patches of the other thread, which should be possible and might be
> good to avoid back and forth between the two threads.

It sounds like it would make it easier for reviewers, so if it's possible with
a reasonable effort it might be worth it.  I've moved this entry to the next CF
for now.

cheers ./daniel


Re: ModifyTable overheads in generic plans

From
Robert Haas
Date:
On Fri, Jun 26, 2020 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote:
> 0001 and 0002 are preparatory patches.

I read through these patches a bit but it's really unclear what the
point of them is. I think they need better commit messages, or better
comments, or both.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Sat, Aug 1, 2020 at 4:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 26, 2020 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > 0001 and 0002 are preparatory patches.
>
> I read through these patches a bit but it's really unclear what the
> point of them is. I think they need better commit messages, or better
> comments, or both.

Thanks for taking a look.  Sorry about the lack of good commentary,
which I have tried to address in the attached updated version. I
extracted one more part as preparatory from the earlier 0003 patch, so
there are 4 patches now.

Also as discussed with Daniel, I have changed the patches so that they
can be applied on plain HEAD instead of having to first apply the
patches at [1].  Without runtime pruning for UPDATE/DELETE proposed in
[1], optimizing ResultRelInfo creation by itself does not improve the
performance/scalability by that much, but the benefit of lazily
creating ResultRelInfos seems clear so I think maybe it's okay to
pursue this independently.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/CA%2BHiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ%40mail.gmail.com

Attachments

Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Tue, Aug 4, 2020 at 3:15 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Sat, Aug 1, 2020 at 4:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > On Fri, Jun 26, 2020 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > 0001 and 0002 are preparatory patches.
> >
> > I read through these patches a bit but it's really unclear what the
> > point of them is. I think they need better commit messages, or better
> > comments, or both.
>
> Thanks for taking a look.  Sorry about the lack of good commentary,
> which I have tried to address in the attached updated version. I
> extracted one more part as preparatory from the earlier 0003 patch, so
> there are 4 patches now.
>
> Also as discussed with Daniel, I have changed the patches so that they
> can be applied on plain HEAD instead of having to first apply the
> patches at [1].  Without runtime pruning for UPDATE/DELETE proposed in
> [1], optimizing ResultRelInfo creation by itself does not improve the
> performance/scalability by that much, but the benefit of lazily
> creating ResultRelInfos seems clear so I think maybe it's okay to
> pursue this independently.

Per cfbot's automatic patch tester, there were some issues in the 0004 patch:

nodeModifyTable.c: In function ‘ExecModifyTable’:
nodeModifyTable.c:2484:24: error: ‘junkfilter’ may be used
uninitialized in this function [-Werror=maybe-uninitialized]
              junkfilter->jf_junkAttNo,
                        ^
nodeModifyTable.c:2309:14: note: ‘junkfilter’ was declared here
  JunkFilter *junkfilter;
              ^
cc1: all warnings being treated as errors
<builtin>: recipe for target 'nodeModifyTable.o' failed
make[3]: *** [nodeModifyTable.o] Error 1

Fixed in the attached updated version

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachments

Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
Hello,

On Fri, Aug 7, 2020 at 9:26 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Tue, Aug 4, 2020 at 3:15 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Sat, Aug 1, 2020 at 4:46 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > > On Fri, Jun 26, 2020 at 8:36 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > > > 0001 and 0002 are preparatory patches.
> > >
> > > I read through these patches a bit but it's really unclear what the
> > > point of them is. I think they need better commit messages, or better
> > > comments, or both.
> >
> > Thanks for taking a look.  Sorry about the lack of good commentary,
> > which I have tried to address in the attached updated version. I
> > extracted one more part as preparatory from the earlier 0003 patch, so
> > there are 4 patches now.
> >
> > Also as discussed with Daniel, I have changed the patches so that they
> > can be applied on plain HEAD instead of having to first apply the
> > patches at [1].  Without runtime pruning for UPDATE/DELETE proposed in
> > [1], optimizing ResultRelInfo creation by itself does not improve the
> > performance/scalability by that much, but the benefit of lazily
> > creating ResultRelInfos seems clear so I think maybe it's okay to
> > pursue this independently.
>
> Per cfbot's automatic patch tester, there were some issues in the 0004 patch:
>
> nodeModifyTable.c: In function ‘ExecModifyTable’:
> nodeModifyTable.c:2484:24: error: ‘junkfilter’ may be used
> uninitialized in this function [-Werror=maybe-uninitialized]
>               junkfilter->jf_junkAttNo,
>                         ^
> nodeModifyTable.c:2309:14: note: ‘junkfilter’ was declared here
>   JunkFilter *junkfilter;
>               ^
> cc1: all warnings being treated as errors
> <builtin>: recipe for target 'nodeModifyTable.o' failed
> make[3]: *** [nodeModifyTable.o] Error 1
>
> Fixed in the attached updated version

Needed a rebase due to f481d28232.  Attached updated patches.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachments

Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
Attached updated patches based on the recent discussion at:

* Re: partition routing layering in nodeModifyTable.c *
https://www.postgresql.org/message-id/CA%2BHiwqHpmMjenQqNpMHrhg3DRhqqQfby2RCT1HWVwMin3_5vMA%40mail.gmail.com

0001 adjusts how ForeignScanState.resultRelInfo is initialized for use
by direct modify operations.

0002 refactors ResultRelInfo initialization to be done lazily, on first use.

I call these v6, because the last version posted on this thread was
v5, even though it went through a couple of iterations on the above
thread. Sorry about the confusion.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachments

Re: ModifyTable overheads in generic plans

From
Heikki Linnakangas
Date:
On 30/10/2020 08:13, Amit Langote wrote:
> /*
>  * Perform WITH CHECK OPTIONS check, if any.
>  */
> static void
> ExecProcessWithCheckOptions(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo,
>                             TupleTableSlot *slot, WCOKind wco_kind)
> {
>     ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
>     EState *estate = mtstate->ps.state;
> 
>     if (node->withCheckOptionLists == NIL)
>         return;
> 
>     /* Initialize expression state if not already done. */
>     if (resultRelInfo->ri_WithCheckOptions == NIL)
>     {
>         int        whichrel = resultRelInfo - mtstate->resultRelInfo;
>         List   *wcoList;
>         List   *wcoExprs = NIL;
>         ListCell   *ll;
> 
>         Assert(whichrel >= 0 && whichrel < mtstate->mt_nplans);
>         wcoList = (List *) list_nth(node->withCheckOptionLists, whichrel);
>         foreach(ll, wcoList)
>         {
>             WithCheckOption *wco = (WithCheckOption *) lfirst(ll);
>             ExprState  *wcoExpr = ExecInitQual((List *) wco->qual,
>                                                &mtstate->ps);
> 
>             wcoExprs = lappend(wcoExprs, wcoExpr);
>         }
> 
>         resultRelInfo->ri_WithCheckOptions = wcoList;
>         resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
>     }
> 
>     /*
>      * ExecWithCheckOptions() will skip any WCOs which are not of the kind
>      * we are looking for at this point.
>      */
>     ExecWithCheckOptions(wco_kind, resultRelInfo, slot, estate);
> }

Can we do this initialization in ExecGetResultRelation()? That would 
seem much more straightforward. Is there any advantage to delaying it 
here? And same thing with the junk filter and the RETURNING list.

(/me reads patch further) I presume that's what you referred to in the 
commit message:

>     Also, extend this lazy initialization approach to some of the
>     individual fields of ResultRelInfo such that even for the result
>     relations that are initialized, those fields are only initialized on
>     first access.  While no performance improvement is to be expected
>     there, it can lead to a simpler initialization logic of the
>     ResultRelInfo itself, because the conditions for whether a given
>     field is needed or not tends to look confusing.  One side-effect
>     of this is that any "SubPlans" referenced in the expressions of
>     those fields are also lazily initialized and hence changes the
>     output of EXPLAIN (without ANALYZE) in some regression tests.


I'm now curious what the initialization logic would look like if we 
initialized those fields in ExecGetResultRelation(). At a quick glance 
at the conditions on when those initializations are done in the patch 
now, it would seem pretty straightforward. If the target list contains 
any junk columns, initialize the junk filter, and if 
ModifyTable->returningLists is set, initialize the RETURNING list. Maybe 
I'm missing something.

- Heikki



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Mon, Nov 2, 2020 at 10:19 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 30/10/2020 08:13, Amit Langote wrote:
> > /*
> >  * Perform WITH CHECK OPTIONS check, if any.
> >  */
> > static void
> > ExecProcessWithCheckOptions(ModifyTableState *mtstate, ResultRelInfo *resultRelInfo,
> >                                                       TupleTableSlot *slot, WCOKind wco_kind)
> > {
> >       ModifyTable *node = (ModifyTable *) mtstate->ps.plan;
> >       EState *estate = mtstate->ps.state;
> >
> >       if (node->withCheckOptionLists == NIL)
> >               return;
> >
> >       /* Initialize expression state if not already done. */
> >       if (resultRelInfo->ri_WithCheckOptions == NIL)
> >       {
> >               int             whichrel = resultRelInfo - mtstate->resultRelInfo;
> >               List   *wcoList;
> >               List   *wcoExprs = NIL;
> >               ListCell   *ll;
> >
> >               Assert(whichrel >= 0 && whichrel < mtstate->mt_nplans);
> >               wcoList = (List *) list_nth(node->withCheckOptionLists, whichrel);
> >               foreach(ll, wcoList)
> >               {
> >                       WithCheckOption *wco = (WithCheckOption *) lfirst(ll);
> >                       ExprState  *wcoExpr = ExecInitQual((List *) wco->qual,
> >                                                                                          &mtstate->ps);
> >
> >                       wcoExprs = lappend(wcoExprs, wcoExpr);
> >               }
> >
> >               resultRelInfo->ri_WithCheckOptions = wcoList;
> >               resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
> >       }
> >
> >       /*
> >        * ExecWithCheckOptions() will skip any WCOs which are not of the kind
> >        * we are looking for at this point.
> >        */
> >       ExecWithCheckOptions(wco_kind, resultRelInfo, slot, estate);
> > }
>
> Can we do this initialization in ExecGetResultRelation()? That would
> seem much more straightforward. Is there any advantage to delaying it
> here? And same thing with the junk filter and the RETURNING list.
>
> (/me reads patch further) I presume that's what you referred to in the
> commit message:
>
> >     Also, extend this lazy initialization approach to some of the
> >     individual fields of ResultRelInfo such that even for the result
> >     relations that are initialized, those fields are only initialized on
> >     first access.  While no performance improvement is to be expected
> >     there, it can lead to a simpler initialization logic of the
> >     ResultRelInfo itself, because the conditions for whether a given
> >     field is needed or not tends to look confusing.  One side-effect
> >     of this is that any "SubPlans" referenced in the expressions of
> >     those fields are also lazily initialized and hence changes the
> >     output of EXPLAIN (without ANALYZE) in some regression tests.
>
>
> I'm now curious what the initialization logic would look like, if we
> initialized those fields in ExecGetResultRelation(). At a quick glance
> on the conditions on when those initializations are done in the patch
> now, it would seem pretty straightforward. If the target list contains
> any junk columns, initialize junk filter, and if
> ModifyTable->returningLists is set, initialize RETURNING list. Maybe I'm
> missing something.

Yeah, it's not that complicated to initialize those things in
ExecGetResultRelation().  In fact, ExecGetResultRelation() (or its
subroutine ExecBuildResultRelation()) housed those initializations in
the earlier versions of this patch, but I changed that after our
discussion about being lazy about initializing as much stuff as we
can.  Maybe I should revert that?

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Mon, Nov 2, 2020 at 10:53 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Mon, Nov 2, 2020 at 10:19 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > (/me reads patch further) I presume that's what you referred to in the
> > commit message:
> >
> > >     Also, extend this lazy initialization approach to some of the
> > >     individual fields of ResultRelInfo such that even for the result
> > >     relations that are initialized, those fields are only initialized on
> > >     first access.  While no performance improvement is to be expected
> > >     there, it can lead to a simpler initialization logic of the
> > >     ResultRelInfo itself, because the conditions for whether a given
> > >     field is needed or not tends to look confusing.  One side-effect
> > >     of this is that any "SubPlans" referenced in the expressions of
> > >     those fields are also lazily initialized and hence changes the
> > >     output of EXPLAIN (without ANALYZE) in some regression tests.
> >
> >
> > I'm now curious what the initialization logic would look like, if we
> > initialized those fields in ExecGetResultRelation(). At a quick glance
> > on the conditions on when those initializations are done in the patch
> > now, it would seem pretty straightforward. If the target list contains
> > any junk columns, initialize junk filter, and if
> > ModifyTable->returningLists is set, initialize RETURNING list. Maybe I'm
> > missing something.
>
> Yeah, it's not that complicated to initialize those things in
> ExecGetResultRelation().  In fact, ExecGetResultRelation() (or its
> subroutine ExecBuildResultRelation()) housed those initializations in
> the earlier versions of this patch, but I changed that after our
> discussion about being lazy about initializing as much stuff as we
> can.  Maybe I should revert that?

Please check the attached if that looks better.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachments

Re: ModifyTable overheads in generic plans

From
Heikki Linnakangas
Date:
On 03/11/2020 10:27, Amit Langote wrote:
> Please check the attached if that looks better.

Great, thanks! Yeah, I like that much better.

This makes me a bit unhappy:

> 
>         /* Also let FDWs init themselves for foreign-table result rels */
>         if (resultRelInfo->ri_FdwRoutine != NULL)
>         {
>             if (resultRelInfo->ri_usesFdwDirectModify)
>             {
>                 ForeignScanState *fscan = (ForeignScanState *) mtstate->mt_plans[i];
> 
>                 /*
>                  * For the FDW's convenience, set the ForeignScanState node's
>                  * ResultRelInfo to let the FDW know which result relation it
>                  * is going to work with.
>                  */
>                 Assert(IsA(fscan, ForeignScanState));
>                 fscan->resultRelInfo = resultRelInfo;
>                 resultRelInfo->ri_FdwRoutine->BeginDirectModify(fscan, eflags);
>             }
>             else if (resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
>             {
>                 List   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
> 
>                 resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
>                                                                  resultRelInfo,
>                                                                  fdw_private,
>                                                                  i,
>                                                                  eflags);
>             }
>         }

If you remember, I was unhappy with a similar assertion in the earlier 
patches [1]. I'm not sure what to do instead though. A few options:

A) We could change FDW API so that BeginDirectModify takes the same 
arguments as BeginForeignModify(). That avoids the assumption that it's 
a ForeignScan node, because BeginForeignModify() doesn't take 
ForeignScanState as argument. That would be consistent, which is nice. 
But I think we'd somehow still need to pass the ResultRelInfo to the 
corresponding ForeignScan, and I'm not sure how.

B) Look up the ResultRelInfo, and call BeginDirectModify(), on the first 
call to ForeignNext().

C) Accept the Assertion. And add an elog() check in the planner for that 
with a proper error message.

I'm leaning towards B), but maybe there's some better solution I didn't 
think of? Perhaps changing the API would make sense in any case, it is a 
bit weird as it is. Backwards-incompatible API changes are not nice, but 
I don't think there are many FDWs out there that implement the 
DirectModify functions. And those functions are pretty tightly coupled 
with the executor and ModifyTable node details anyway, so I don't feel 
like we can, or need to, guarantee that they stay unchanged across major 
versions.

[1] 
https://www.postgresql.org/message-id/19c23dd9-89ce-75a3-9105-5fc05a46f94a%40iki.fi

- Heikki



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Tue, Nov 3, 2020 at 9:05 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 03/11/2020 10:27, Amit Langote wrote:
> > Please check the attached if that looks better.
>
> Great, thanks! Yeah, I like that much better.
>
> This makes me a bit unhappy:
>
> >
> >               /* Also let FDWs init themselves for foreign-table result rels */
> >               if (resultRelInfo->ri_FdwRoutine != NULL)
> >               {
> >                       if (resultRelInfo->ri_usesFdwDirectModify)
> >                       {
> >                               ForeignScanState *fscan = (ForeignScanState *) mtstate->mt_plans[i];
> >
> >                               /*
> >                                * For the FDW's convenience, set the ForeignScanState node's
> >                                * ResultRelInfo to let the FDW know which result relation it
> >                                * is going to work with.
> >                                */
> >                               Assert(IsA(fscan, ForeignScanState));
> >                               fscan->resultRelInfo = resultRelInfo;
> >                               resultRelInfo->ri_FdwRoutine->BeginDirectModify(fscan, eflags);
> >                       }
> >                       else if (resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
> >                       {
> >                               List   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
> >
> >                               resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
> >                                                                                resultRelInfo,
> >                                                                                fdw_private,
> >                                                                                i,
> >                                                                                eflags);
> >                       }
> >               }
>
> If you remember, I was unhappy with a similar assertion in the earlier
> patches [1]. I'm not sure what to do instead though. A few options:
>
> A) We could change FDW API so that BeginDirectModify takes the same
> arguments as BeginForeignModify(). That avoids the assumption that it's
> a ForeignScan node, because BeginForeignModify() doesn't take
> ForeignScanState as argument. That would be consistent, which is nice.
> But I think we'd somehow still need to pass the ResultRelInfo to the
> corresponding ForeignScan, and I'm not sure how.

Maybe ForeignScan doesn't need to contain any result relation info
then?  ForeignScan.operation != CMD_SELECT is enough to tell it to
call IterateDirectModify() as it does today.

> B) Look up the ResultRelInfo, and call BeginDirectModify(), on the first
> call to ForeignNext().
>
> C) Accept the Assertion. And add an elog() check in the planner for that
> with a proper error message.
>
> I'm leaning towards B), but maybe there's some better solution I didn't
> think of?   Perhaps changing the API would make sense in any case, it is a
> bit weird as it is. Backwards-incompatible API changes are not nice, but
> I don't think there are many FDWs out there that implement the
> DirectModify functions. And those functions are pretty tightly coupled
> with the executor and ModifyTable node details anyway, so I don't feel
> like we can, or need to, guarantee that they stay unchanged across major
> versions.

B is not too bad, but I tend to prefer doing A too.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Wed, Nov 4, 2020 at 11:32 AM Amit Langote <amitlangote09@gmail.com> wrote:
> On Tue, Nov 3, 2020 at 9:05 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > A) We could change FDW API so that BeginDirectModify takes the same
> > arguments as BeginForeignModify(). That avoids the assumption that it's
> > a ForeignScan node, because BeginForeignModify() doesn't take
> > ForeignScanState as argument. That would be consistent, which is nice.
> > But I think we'd somehow still need to pass the ResultRelInfo to the
> > corresponding ForeignScan, and I'm not sure how.
>
> Maybe ForeignScan doesn't need to contain any result relation info
> then?  ForeignScan.operation != CMD_SELECT is enough to tell it to
> call IterateDirectModify() as today.
>
> > B) Look up the ResultRelInfo, and call BeginDirectModify(), on the first
> > call to ForeignNext().
> >
> > C) Accept the Assertion. And add an elog() check in the planner for that
> > with a proper error message.
> >
> > I'm leaning towards B), but maybe there's some better solution I didn't
> > think of?   Perhaps changing the API would make sense in any case, it is a
> > bit weird as it is. Backwards-incompatible API changes are not nice, but
> > I don't think there are many FDWs out there that implement the
> > DirectModify functions. And those functions are pretty tightly coupled
> > with the executor and ModifyTable node details anyway, so I don't feel
> > like we can, or need to, guarantee that they stay unchanged across major
> > versions.
>
> B is not too bad, but I tend to prefer doing A too.

How about I update the 0001 patch to implement A?

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Wed, Nov 4, 2020 at 11:32 AM Amit Langote <amitlangote09@gmail.com> wrote:
> On Tue, Nov 3, 2020 at 9:05 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > On 03/11/2020 10:27, Amit Langote wrote:
> > > Please check the attached if that looks better.
> >
> > Great, thanks! Yeah, I like that much better.
> >
> > This makes me a bit unhappy:
> >
> > >
> > >               /* Also let FDWs init themselves for foreign-table result rels */
> > >               if (resultRelInfo->ri_FdwRoutine != NULL)
> > >               {
> > >                       if (resultRelInfo->ri_usesFdwDirectModify)
> > >                       {
> > >                               ForeignScanState *fscan = (ForeignScanState *) mtstate->mt_plans[i];
> > >
> > >                               /*
> > >                                * For the FDW's convenience, set the ForeignScanState node's
> > >                                * ResultRelInfo to let the FDW know which result relation it
> > >                                * is going to work with.
> > >                                */
> > >                               Assert(IsA(fscan, ForeignScanState));
> > >                               fscan->resultRelInfo = resultRelInfo;
> > >                               resultRelInfo->ri_FdwRoutine->BeginDirectModify(fscan, eflags);
> > >                       }
> > >                       else if (resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
> > >                       {
> > >                               List   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);
> > >
> > >                               resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
> > >                                                                                resultRelInfo,
> > >                                                                                fdw_private,
> > >                                                                                i,
> > >                                                                                eflags);
> > >                       }
> > >               }
> >
> > If you remember, I was unhappy with a similar assertion in the earlier
> > patches [1]. I'm not sure what to do instead though. A few options:
> >
> > A) We could change FDW API so that BeginDirectModify takes the same
> > arguments as BeginForeignModify(). That avoids the assumption that it's
> > a ForeignScan node, because BeginForeignModify() doesn't take
> > ForeignScanState as argument. That would be consistent, which is nice.
> > But I think we'd somehow still need to pass the ResultRelInfo to the
> > corresponding ForeignScan, and I'm not sure how.
>
> Maybe ForeignScan doesn't need to contain any result relation info
> then?  ForeignScan.operation != CMD_SELECT is enough to tell it to
> call IterateDirectModify() as today.

Hmm, I misspoke.   We do still need ForeignScanState.resultRelInfo,
because the IterateDirectModify() API uses it to return the remotely
inserted/updated/deleted tuple for the RETURNING projection performed
by ExecModifyTable().

> > B) Look up the ResultRelInfo, and call BeginDirectModify(), on the first
> > call to ForeignNext().
> >
> > C) Accept the Assertion. And add an elog() check in the planner for that
> > with a proper error message.
> >
> > I'm leaning towards B), but maybe there's some better solution I didn't
> > think of?   Perhaps changing the API would make sense in any case, it is a
> > bit weird as it is. Backwards-incompatible API changes are not nice, but
> > I don't think there are many FDWs out there that implement the
> > DirectModify functions. And those functions are pretty tightly coupled
> > with the executor and ModifyTable node details anyway, so I don't feel
> > like we can, or need to, guarantee that they stay unchanged across major
> > versions.
>
> B is not too bad, but I tend to prefer doing A too.

On second thought, it seems A would amount to merely a cosmetic
adjustment of the API, nothing more.  B seems to get the job done for
me and also doesn't unnecessarily break compatibility, so I've updated
0001 to implement B.  Please give it a look.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachments

Re: ModifyTable overheads in generic plans

From
Heikki Linnakangas
Date:
On 10/11/2020 13:12, Amit Langote wrote:
> On Wed, Nov 4, 2020 at 11:32 AM Amit Langote <amitlangote09@gmail.com> wrote:
>> On Tue, Nov 3, 2020 at 9:05 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> A) We could change FDW API so that BeginDirectModify takes the same
>>> arguments as BeginForeignModify(). That avoids the assumption that it's
>>> a ForeignScan node, because BeginForeignModify() doesn't take
>>> ForeignScanState as argument. That would be consistent, which is nice.
>>> But I think we'd somehow still need to pass the ResultRelInfo to the
>>> corresponding ForeignScan, and I'm not sure how.
>>
>> Maybe ForeignScan doesn't need to contain any result relation info
>> then?  ForeignScan.operation != CMD_SELECT is enough to tell it to
>> call IterateDirectModify() as today.
> 
> Hmm, I misspoke.   We do still need ForeignScanState.resultRelInfo,
> because the IterateDirectModify() API uses it to return the remotely
> inserted/updated/deleted tuple for the RETURNING projection performed
> by ExecModifyTable().
> 
>>> B) Look up the ResultRelInfo, and call BeginDirectModify(), on the first
>>> call to ForeignNext().
>>>
>>> C) Accept the Assertion. And add an elog() check in the planner for that
>>> with a proper error message.
>>>
>>> I'm leaning towards B), but maybe there's some better solution I didn't
>>> think of?   Perhaps changing the API would make sense in any case, it is a
>>> bit weird as it is. Backwards-incompatible API changes are not nice, but
>>> I don't think there are many FDWs out there that implement the
>>> DirectModify functions. And those functions are pretty tightly coupled
>>> with the executor and ModifyTable node details anyway, so I don't feel
>>> like we can, or need to, guarantee that they stay unchanged across major
>>> versions.
>>
>> B is not too bad, but I tend to prefer doing A too.
> 
> On second thought, it seems A would amount to merely a cosmetic
> adjustment of the API, nothing more.  B seems to get the job done for
> me and also doesn't unnecessarily break compatibility, so I've updated
> 0001 to implement B.  Please give it a look.

Looks good at a quick glance. It is a small API break that 
BeginDirectModify() is now called during execution, not at executor 
startup, but I don't think that's going to break FDWs in practice. One 
could argue, though, that if we're going to change the API, we should do 
it more loudly. So changing the arguments might be a good thing.

The BeginDirectModify() and BeginForeignModify() interfaces are 
inconsistent, but that's not this patch's fault. I wonder if we could 
move the call to BeginForeignModify() also to ForeignNext(), though? And 
BeginForeignScan() too, while we're at it.

Overall, this is probably fine as it is though. I'll review more 
thoroughly tomorrow.

- Heikki



Re: ModifyTable overheads in generic plans

From
Heikki Linnakangas
Date:
On 10/11/2020 17:32, Heikki Linnakangas wrote:
> On 10/11/2020 13:12, Amit Langote wrote:
>> On second thought, it seems A would amount to merely a cosmetic
>> adjustment of the API, nothing more.  B seems to get the job done for
>> me and also doesn't unnecessarily break compatibility, so I've updated
>> 0001 to implement B.  Please give it a look.
> 
> Looks good at a quick glance. It is a small API break that
> BeginDirectModify() is now called during execution, not at executor
> startup, but I don't think that's going to break FDWs in practice. One
> could argue, though, that if we're going to change the API, we should do
> it more loudly. So changing the arguments might be a good thing.
> 
> The BeginDirectModify() and BeginForeignModify() interfaces are
> inconsistent, but that's not this patch's fault. I wonder if we could
> move the call to BeginForeignModify() also to ForeignNext(), though? And
> BeginForeignScan() too, while we're at it.

With these patches, BeginForeignModify() and BeginDirectModify() are 
both called during execution, before the first 
IterateForeignScan/IterateDirectModify call. The documentation for 
BeginForeignModify() still claims that it's run at executor startup, 
but that's not true after these patches, so it needs to be updated.

I think that's a good thing, because it means that BeginForeignModify() 
and BeginDirectModify() are called at the same stage, from the FDW's 
point of view. Even though BeginDirectModify() is called from 
ForeignNext(), and BeginForeignModify() from ExecModifyTable(), that 
difference isn't visible to the FDW; both are after executor startup but 
before the first Iterate call.

- Heikki



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
Thanks for the review.

On Wed, Nov 11, 2020 at 5:55 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 10/11/2020 17:32, Heikki Linnakangas wrote:
> > On 10/11/2020 13:12, Amit Langote wrote:
> >> On second thought, it seems A would amount to merely a cosmetic
> >> adjustment of the API, nothing more.  B seems to get the job done for
> >> me and also doesn't unnecessarily break compatibility, so I've updated
> >> 0001 to implement B.  Please give it a look.
> >
> > Looks good at a quick glance. It is a small API break that
> > BeginDirectModify() is now called during execution, not at executor
> > startup, but I don't think that's going to break FDWs in practice. One
> > could argue, though, that if we're going to change the API, we should do
> > it more loudly. So changing the arguments might be a good thing.
> >
> > The BeginDirectModify() and BeginForeignModify() interfaces are
> > inconsistent, but that's not this patch's fault. I wonder if we could
> > move the call to BeginForeignModify() also to ForeignNext(), though? And
> > BeginForeignScan() too, while we're at it.
>
> With these patches, BeginForeignModify() and BeginDirectModify() are
> both called during execution, before the first
> IterateForeignScan/IterateDirectModify call. The documentation for
> BeginForeignModify() needs to be updated, it still claims that it's run
> at executor startup, but that's not true after these patches. So that
> needs to be updated.

Good point, I've updated the patch to note that.

> I think that's a good thing, because it means that BeginForeignModify()
> and BeginDirectModify() are called at the same stage, from the FDW's
> point of view. Even though BeginDirectModify() is called from
> ForeignNext(), and BeginForeignModify() from ExecModifyTable(), that
> difference isn't visible to the FDW; both are after executor startup but
> before the first Iterate call.

Right.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachments

Re: ModifyTable overheads in generic plans

From
Heikki Linnakangas
Date:
I'm still a bit confused and unhappy about the initialization of 
ResultRelInfos and the various fields in them. We've made progress in 
the previous patches, but it's still a bit messy.

>         /*
>          * If transition tuples will be captured, initialize a map to convert
>          * child tuples into the format of the table mentioned in the query
>          * (root relation), because the transition tuple store can only store
>          * tuples in the root table format.  However for INSERT, the map is
>          * only initialized for a given partition when the partition itself is
>          * first initialized by ExecFindPartition.  This map is also
>          * needed if an UPDATE ends up having to move tuples across
>          * partitions, because in that case the child tuple to be moved first
>          * needs to be converted into the root table's format.  In that case,
>          * we use GetChildToRootMap() to create one from scratch if
>          * we didn't already create it here.
>          *
>          * Note: We cannot always initialize this map lazily, that is, use
>          * GetChildToRootMap(), because AfterTriggerSaveEvent(), which needs
>          * the map, doesn't have access to the "target" relation that is
>          * needed to create the map.
>          */
>         if (mtstate->mt_transition_capture && operation != CMD_INSERT)
>         {
>             Relation    relation = resultRelInfo->ri_RelationDesc;
>             Relation    targetRel = mtstate->rootResultRelInfo->ri_RelationDesc;
> 
>             resultRelInfo->ri_ChildToRootMap =
>                 convert_tuples_by_name(RelationGetDescr(relation),
>                                        RelationGetDescr(targetRel));
>             /* First time creating the map for this result relation. */
>             Assert(!resultRelInfo->ri_ChildToRootMapValid);
>             resultRelInfo->ri_ChildToRootMapValid = true;
>         }

The comment explains that AfterTriggerSaveEvent() cannot use 
GetChildToRootMap(), because it doesn't have access to the root target 
relation. But there is a field for that in ResultRelInfo: 
ri_PartitionRoot. However, that's only set up when we do partition routing.

How about we rename ri_PartitionRoot to e.g ri_RootTarget, and set it 
always, even for non-partition inheritance? We have that information 
available when we initialize the ResultRelInfo, so might as well.

Some code currently checks ri_PartitionRoot to determine whether a tuple 
that's been inserted has been routed. For example:

>         /*
>          * Also check the tuple against the partition constraint, if there is
>          * one; except that if we got here via tuple-routing, we don't need to
>          * if there's no BR trigger defined on the partition.
>          */
>         if (resultRelationDesc->rd_rel->relispartition &&
>             (resultRelInfo->ri_PartitionRoot == NULL ||
>              (resultRelInfo->ri_TrigDesc &&
>               resultRelInfo->ri_TrigDesc->trig_insert_before_row)))
>             ExecPartitionCheck(resultRelInfo, slot, estate, true);

So if we set ri_PartitionRoot always, we would need some other way to 
determine if the tuple at hand has actually been routed or not. But 
wouldn't that be a good thing anyway? Isn't it possible that the same 
ResultRelInfo is sometimes used for routed tuples, and sometimes for 
tuples that have been inserted/updated "directly"? 
ExecLookupUpdateResultRelByOid() sets that field lazily, so I think it 
would be possible to get here with ri_PartitionRoot either set or not, 
depending on whether an earlier cross-partition update was routed to the 
table.

The above check is just an optimization, to skip unnecessary 
ExecPartitionCheck() calls, but I think this snippet in 
ExecConstraints() needs to get this right:

>                 /*
>                  * If the tuple has been routed, it's been converted to the
>                  * partition's rowtype, which might differ from the root
>                  * table's.  We must convert it back to the root table's
>                  * rowtype so that val_desc shown error message matches the
>                  * input tuple.
>                  */
>                 if (resultRelInfo->ri_PartitionRoot)
>                 {
>                     AttrMap    *map;
> 
>                     rel = resultRelInfo->ri_PartitionRoot;
>                     tupdesc = RelationGetDescr(rel);
>                     /* a reverse map */
>                     map = build_attrmap_by_name_if_req(orig_tupdesc,
>                                                        tupdesc);
> 
>                     /*
>                      * Partition-specific slot's tupdesc can't be changed, so
>                      * allocate a new one.
>                      */
>                     if (map != NULL)
>                         slot = execute_attr_map_slot(map, slot,
>                                                      MakeTupleTableSlot(tupdesc, &TTSOpsVirtual));
>                 }

Is that an existing bug, or am I missing something?

- Heikki



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Wed, Nov 11, 2020 at 10:14 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I'm still a bit confused and unhappy about the initialization of
> ResultRelInfos and the various fields in them. We've made progress in
> the previous patches, but it's still a bit messy.
>
> >               /*
> >                * If transition tuples will be captured, initialize a map to convert
> >                * child tuples into the format of the table mentioned in the query
> >                * (root relation), because the transition tuple store can only store
> >                * tuples in the root table format.  However for INSERT, the map is
> >                * only initialized for a given partition when the partition itself is
> >                * first initialized by ExecFindPartition.  This map is also
> >                * needed if an UPDATE ends up having to move tuples across
> >                * partitions, because in that case the child tuple to be moved first
> >                * needs to be converted into the root table's format.  In that case,
> >                * we use GetChildToRootMap() to create one from scratch if
> >                * we didn't already create it here.
> >                *
> >                * Note: We cannot always initialize this map lazily, that is, use
> >                * GetChildToRootMap(), because AfterTriggerSaveEvent(), which needs
> >                * the map, doesn't have access to the "target" relation that is
> >                * needed to create the map.
> >                */
> >               if (mtstate->mt_transition_capture && operation != CMD_INSERT)
> >               {
> >                       Relation        relation = resultRelInfo->ri_RelationDesc;
> >                       Relation        targetRel = mtstate->rootResultRelInfo->ri_RelationDesc;
> >
> >                       resultRelInfo->ri_ChildToRootMap =
> >                               convert_tuples_by_name(RelationGetDescr(relation),
> >                                                                          RelationGetDescr(targetRel));
> >                       /* First time creating the map for this result relation. */
> >                       Assert(!resultRelInfo->ri_ChildToRootMapValid);
> >                       resultRelInfo->ri_ChildToRootMapValid = true;
> >               }
>
> The comment explains that AfterTriggerSaveEvent() cannot use
> GetChildToRootMap(), because it doesn't have access to the root target
> relation. But there is a field for that in ResultRelInfo:
> ri_PartitionRoot. However, that's only set up when we do partition routing.
>
> How about we rename ri_PartitionRoot to e.g ri_RootTarget, and set it
> always, even for non-partition inheritance? We have that information
> available when we initialize the ResultRelInfo, so might as well.

Yeah, I agree it's better to use ri_PartitionRoot more generally like
you describe here.

> Some code currently checks ri_PartitionRoot, to determine if a tuple
> that's been inserted, has been routed. For example:
>
> >               /*
> >                * Also check the tuple against the partition constraint, if there is
> >                * one; except that if we got here via tuple-routing, we don't need to
> >                * if there's no BR trigger defined on the partition.
> >                */
> >               if (resultRelationDesc->rd_rel->relispartition &&
> >                       (resultRelInfo->ri_PartitionRoot == NULL ||
> >                        (resultRelInfo->ri_TrigDesc &&
> >                         resultRelInfo->ri_TrigDesc->trig_insert_before_row)))
> >                       ExecPartitionCheck(resultRelInfo, slot, estate, true);
>
> So if we set ri_PartitionRoot always, we would need some other way to
> determine if the tuple at hand has actually been routed or not. But
> wouldn't that be a good thing anyway? Isn't it possible that the same
> ResultRelInfo is sometimes used for routed tuples, and sometimes for
> tuples that have been inserted/updated "directly"?
> ExecLookupUpdateResultRelByOid() sets that field lazily, so I think it
> would be possible to get here with ri_PartitionRoot either set or not,
> depending on whether an earlier cross-partition update was routed to the
> table.

Checking ri_RelationDesc != ri_PartitionRoot tells whether the result
relation is the original target relation of the query, so that should
be enough here.

> The above check is just an optimization, to skip unnecessary
> ExecPartitionCheck() calls, but I think this snippet in
> ExecConstraints() needs to get this right:
>
> >                               /*
> >                                * If the tuple has been routed, it's been converted to the
> >                                * partition's rowtype, which might differ from the root
> >                                * table's.  We must convert it back to the root table's
> >                                * rowtype so that val_desc shown error message matches the
> >                                * input tuple.
> >                                */
> >                               if (resultRelInfo->ri_PartitionRoot)
> >                               {
> >                                       AttrMap    *map;
> >
> >                                       rel = resultRelInfo->ri_PartitionRoot;
> >                                       tupdesc = RelationGetDescr(rel);
> >                                       /* a reverse map */
> >                                       map = build_attrmap_by_name_if_req(orig_tupdesc,
> >                                                                                                          tupdesc);
> >
> >                                       /*
> >                                        * Partition-specific slot's tupdesc can't be changed, so
> >                                        * allocate a new one.
> >                                        */
> >                                       if (map != NULL)
> >                                               slot = execute_attr_map_slot(map, slot,
> >                                               slot = execute_attr_map_slot(map, slot,
> >                                                                            MakeTupleTableSlot(tupdesc, &TTSOpsVirtual));
> >                               }
>
> Is that an existing bug, or am I missing?

What it's doing is converting a routed tuple in the partition's tuple
format back into the original target relation's format before showing
the tuple in the error message.  Note that we do this reverse
conversion only for tuple routing target relations, not all child
result relations, so in that sense it's a bit inconsistent.  Maybe we
don't need to be too pedantic about showing the exact same tuple as
the user inserted (that is, one matching the "root" table's column
order), so it seems okay to just remove these reverse-conversion
blocks that are repeated in a number of places that show an error
message after failing a constraint check.

Attached new 0002 which does these adjustments.  I went with
ri_RootTargetDesc to go along with ri_RelationDesc.

Also, I have updated the original 0002 (now 0003) to make
GetChildToRootMap() use ri_RootTargetDesc instead of
ModifyTableState.rootResultRelInfo.ri_RelationDesc, so that even
AfterTriggerSaveEvent() can now use that function.  This allows us to
avoid having to initialize ri_ChildToRootMap anywhere but inside
GetChildToRootMap(), with that long comment defending doing so. :-)

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachments

Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Thu, Nov 12, 2020 at 5:04 PM Amit Langote <amitlangote09@gmail.com> wrote:
> Attached new 0002 which does these adjustments.  I went with
> ri_RootTargetDesc to go along with ri_RelationDesc.
>
> Also, I have updated the original 0002 (now 0003) to make
> GetChildToRootMap() use ri_RootTargetDesc instead of
> ModifyTableState.rootResultRelInfo.ri_RelationDesc, so that even
> AfterTriggerSaveEvent() can now use that function.  This allows us to
> avoid having to initialize ri_ChildToRootMap anywhere but inside
> GetChildRootMap(), with that long comment defending doing so. :-)

These needed to be rebased due to recent copy.c upheavals.  Attached.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachments

Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Mon, Dec 7, 2020 at 3:53 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Thu, Nov 12, 2020 at 5:04 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > Attached new 0002 which does these adjustments.  I went with
> > ri_RootTargetDesc to go along with ri_RelationDesc.
> >
> > Also, I have updated the original 0002 (now 0003) to make
> > GetChildToRootMap() use ri_RootTargetDesc instead of
> > ModifyTableState.rootResultRelInfo.ri_RelationDesc, so that even
> > AfterTriggerSaveEvent() can now use that function.  This allows us to
> > avoid having to initialize ri_ChildToRootMap anywhere but inside
> > GetChildRootMap(), with that long comment defending doing so. :-)
>
> These needed to be rebased due to recent copy.c upheavals.  Attached.

Needed to be rebased again.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachments

Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Tue, Dec 22, 2020 at 5:16 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Mon, Dec 7, 2020 at 3:53 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Thu, Nov 12, 2020 at 5:04 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > Attached new 0002 which does these adjustments.  I went with
> > > ri_RootTargetDesc to go along with ri_RelationDesc.
> > >
> > > Also, I have updated the original 0002 (now 0003) to make
> > > GetChildToRootMap() use ri_RootTargetDesc instead of
> > > ModifyTableState.rootResultRelInfo.ri_RelationDesc, so that even
> > > AfterTriggerSaveEvent() can now use that function.  This allows us to
> > > avoid having to initialize ri_ChildToRootMap anywhere but inside
> > > GetChildRootMap(), with that long comment defending doing so. :-)
> >
> > These needed to be rebased due to recent copy.c upheavals.  Attached.
>
> Needed to be rebased again.

And again, this time over the recent batch insert API related patches.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachments

Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Mon, Jan 25, 2021 at 2:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Tue, Dec 22, 2020 at 5:16 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Mon, Dec 7, 2020 at 3:53 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > On Thu, Nov 12, 2020 at 5:04 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > > Attached new 0002 which does these adjustments.  I went with
> > > > ri_RootTargetDesc to go along with ri_RelationDesc.
> > > >
> > > > Also, I have updated the original 0002 (now 0003) to make
> > > > GetChildToRootMap() use ri_RootTargetDesc instead of
> > > > ModifyTableState.rootResultRelInfo.ri_RelationDesc, so that even
> > > > AfterTriggerSaveEvent() can now use that function.  This allows us to
> > > > avoid having to initialize ri_ChildToRootMap anywhere but inside
> > > > GetChildRootMap(), with that long comment defending doing so. :-)
> > >
> > > These needed to be rebased due to recent copy.c upheavals.  Attached.
> >
> > Needed to be rebased again.
>
> And again, this time over the recent batch insert API related patches.

Another rebase.

I've dropped what was patch 0001 in the previous set, because I think
it has been rendered unnecessary due to recently committed changes.
However, the rebase led to a couple of additional regression test
output changes that I think are harmless.  The changes are caused by
the fact that ri_RootResultRelInfo now gets initialized in *all* child
result relations, not just those that participate in tuple routing.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachments

Re: ModifyTable overheads in generic plans

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> [ v14-0002-Initialize-result-relation-information-lazily.patch ]

Needs YA rebase over 86dc90056.

            regards, tom lane



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Thu, Apr 1, 2021 at 3:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > [ v14-0002-Initialize-result-relation-information-lazily.patch ]
> Needs YA rebase over 86dc90056.

Done.  I will post the updated results for -Mprepared benchmarks I did
in the other thread shortly.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments

Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Thu, Apr 1, 2021 at 10:12 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Thu, Apr 1, 2021 at 3:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Amit Langote <amitlangote09@gmail.com> writes:
> > > [ v14-0002-Initialize-result-relation-information-lazily.patch ]
> > Needs YA rebase over 86dc90056.
>
> Done.  I will post the updated results for -Mprepared benchmarks I did
> in the other thread shortly.

Test details:

pgbench -n -T60 -Mprepared -f nojoin.sql

nojoin.sql:

\set a random(1, 1000000)
update test_table t set b = :a where a = :a;

* test_table has 40 columns and partitions as shown below
* plan_cache_mode = force_generic_plan
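
For anyone wanting to reproduce something similar, a schema along these lines would match the description; the exact column names, the hash-partitioning strategy on column a, and the filler-column naming are my assumptions, not taken from the original test setup:

```sql
-- Sketch of a table matching the description above (assumptions noted).
CREATE TABLE test_table (
    a int,
    b int
    -- plus filler columns (e.g. c3 int, ..., c40 int) to reach the
    -- 40 columns mentioned above
) PARTITION BY HASH (a);

-- Create the partitions; shown for 64, vary the modulus for 128/256/1024.
DO $$
BEGIN
    FOR i IN 0..63 LOOP
        EXECUTE format(
            'CREATE TABLE test_table_%s PARTITION OF test_table
             FOR VALUES WITH (MODULUS 64, REMAINDER %s)', i, i);
    END LOOP;
END $$;

INSERT INTO test_table (a, b)
    SELECT i, 0 FROM generate_series(1, 1000000) AS i;
CREATE INDEX ON test_table (a);
```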

Results:

nparts  master  patched

64      6262    17118
128     3449    12082
256     1722    7643
1024    359     2099

* tps figures shown are the median of 3 runs.

So, a drastic speedup can be seen even just from not creating
ResultRelInfos for child relations that are not updated, as the patch
does.  I haven't yet included any changes for the AcquireExecutorLocks()
and ExecCheckRTPerms() bottlenecks that still remain and cause the
drop in tps as the partition count increases.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: ModifyTable overheads in generic plans

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> On Thu, Apr 1, 2021 at 3:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Amit Langote <amitlangote09@gmail.com> writes:
> [ v14-0002-Initialize-result-relation-information-lazily.patch ]
>> Needs YA rebase over 86dc90056.

> Done.

I spent some time looking this over.  There are bits of it we can
adopt without too much trouble, but I'm afraid that 0001 (delay
FDW BeginDirectModify until the first actual update) is a nonstarter,
which makes the main idea of delaying ExecInitResultRelation unworkable.

My fear about 0001 is that it will destroy any hope of direct updates
on different remote partitions executing with consistent semantics
(i.e. compatible snapshots), because some row updates triggered by the
local query may have already happened before a given partition gets to
start its remote query.  Maybe we can work around that, but I do not
want to commit a major restructuring that assumes we can dodge this
problem when we don't yet even have a fix for cross-partition updates
that does rely on the assumption of synchronous startup.

In some desultory performance testing here, it seemed like a
significant part of the cost is ExecOpenIndices, and I don't see
a reason offhand why we could not delay/skip that.  I also concur
with delaying construction of ri_ChildToRootMap and the
partition_tuple_routing data structures, since many queries will
never need those at all.

> * PartitionTupleRouting.subplan_resultrel_htab is removed in favor
> of using ModifyTableState.mt_resultOidHash to look up an UPDATE
> result relation by OID.

Hmm, that sounds promising too, though I didn't look at the details.

Anyway, I think the way to proceed for now is to grab the low-hanging
fruit of things that clearly won't change any semantics.  But tail end
of the dev cycle is no time to be making really fundamental changes
in how FDW direct modify works.

            regards, tom lane



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Sun, Apr 4, 2021 at 10:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > On Thu, Apr 1, 2021 at 3:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Amit Langote <amitlangote09@gmail.com> writes:
> > [ v14-0002-Initialize-result-relation-information-lazily.patch ]
> >> Needs YA rebase over 86dc90056.
>
> > Done.
>
> I spent some time looking this over.

Thanks.

> There are bits of it we can
> adopt without too much trouble, but I'm afraid that 0001 (delay
> FDW BeginDirectModify until the first actual update) is a nonstarter,
> which makes the main idea of delaying ExecInitResultRelation unworkable.
>
> My fear about 0001 is that it will destroy any hope of direct updates
> on different remote partitions executing with consistent semantics
> (i.e. compatible snapshots), because some row updates triggered by the
> local query may have already happened before a given partition gets to
> start its remote query.  Maybe we can work around that, but I do not
> want to commit a major restructuring that assumes we can dodge this
> problem when we don't yet even have a fix for cross-partition updates
> that does rely on the assumption of synchronous startup.

Hmm, okay, I can understand the concern.

> In some desultory performance testing here, it seemed like a
> significant part of the cost is ExecOpenIndices, and I don't see
> a reason offhand why we could not delay/skip that.  I also concur
> with delaying construction of ri_ChildToRootMap and the
> partition_tuple_routing data structures, since many queries will
> never need those at all.

As I mentioned in [1], creating ri_projectNew can be expensive too,
especially as column count (and partition count for the generic plan
case) grows.  I think we should have a static inline
initialize-on-first-access accessor function for that field too.

Actually, I remember considering having such accessor functions (all
static inline) for ri_WithCheckOptionExprs, ri_projectReturning,
ri_onConflictArbiterIndexes, and ri_onConflict (needed by ON CONFLICT
UPDATE) as well, prompted by Heikki's comments earlier in the
discussion.  I also remember, before even writing this patch, not
liking that WCO and RETURNING expressions are initialized in their own
separate loops, rather than being part of the earlier loop that says:

    /*
     * Do additional per-result-relation initialization.
     */
    for (i = 0; i < nrels; i++)
    {

I guess ri_RowIdAttNo initialization can go into the same loop.

> > * PartitionTupleRouting.subplan_resultrel_htab is removed in favor
> > of using ModifyTableState.mt_resultOidHash to look up an UPDATE
> > result relation by OID.
>
> Hmm, that sounds promising too, though I didn't look at the details.
>
> Anyway, I think the way to proceed for now is to grab the low-hanging
> fruit of things that clearly won't change any semantics.  But tail end
> of the dev cycle is no time to be making really fundamental changes
> in how FDW direct modify works.

I agree.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/CA+HiwqHLUNhMxy46Mrb04VJpN=HUdm9bD7xdZ6f5h2o4imX79g@mail.gmail.com



Re: ModifyTable overheads in generic plans

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> On Sun, Apr 4, 2021 at 10:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In some desultory performance testing here, it seemed like a
>> significant part of the cost is ExecOpenIndices, and I don't see
>> a reason offhand why we could not delay/skip that.  I also concur
>> with delaying construction of ri_ChildToRootMap and the
>> partition_tuple_routing data structures, since many queries will
>> never need those at all.

> As I mentioned in [1], creating ri_projectNew can be expensive too,
> especially as column count (and partition count for the generic plan
> case) grows.  I think we should have a static inline
> initialize-on-first-access accessor function for that field too.

> Actually, I remember considering having such accessor functions (all
> static inline) for ri_WithCheckOptionExprs, ri_projectReturning,
> ri_onConflictArbiterIndexes, and ri_onConflict (needed by ON CONFLICT
> UPDATE) as well, prompted by Heikki's comments earlier in the
> discussion.  I also remember, before even writing this patch, not
> liking that WCO and RETURNING expressions are initialized in their own
> separate loops, rather than being part of the earlier loop that says:

Sure, we might as well try to improve the cosmetics here.

>> Anyway, I think the way to proceed for now is to grab the low-hanging
>> fruit of things that clearly won't change any semantics.  But tail end
>> of the dev cycle is no time to be making really fundamental changes
>> in how FDW direct modify works.

> I agree.

OK.  Do you want to pull out the bits of the patch that we can still
do without postponing BeginDirectModify?

Another thing we could consider, perhaps, is keeping the behavior
the same for foreign tables but postponing init of local ones.
To avoid opening the relations to figure out which kind they are,
we'd have to rely on the RTE copies of relkind, which is a bit
worrisome --- I'm not certain that those are guaranteed to be
up-to-date --- but it's probably okay since there is no way to
convert a regular table to foreign or vice versa.  Anyway, that
idea seems fairly messy so I'm inclined to just pursue the
lower-hanging fruit for now.

            regards, tom lane



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Mon, Apr 5, 2021 at 1:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > On Sun, Apr 4, 2021 at 10:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> In some desultory performance testing here, it seemed like a
> >> significant part of the cost is ExecOpenIndices, and I don't see
> >> a reason offhand why we could not delay/skip that.  I also concur
> >> with delaying construction of ri_ChildToRootMap and the
> >> partition_tuple_routing data structures, since many queries will
> >> never need those at all.
>
> > As I mentioned in [1], creating ri_projectNew can be expensive too,
> > especially as column count (and partition count for the generic plan
> > case) grows.  I think we should have a static inline
> > initialize-on-first-access accessor function for that field too.
>
> > Actually, I remember considering having such accessor functions (all
> > static inline) for ri_WithCheckOptionExprs, ri_projectReturning,
> > ri_onConflictArbiterIndexes, and ri_onConflict (needed by ON CONFLICT
> > UPDATE) as well, prompted by Heikki's comments earlier in the
> > discussion.  I also remember, before even writing this patch, not
> > liking that WCO and RETURNING expressions are initialized in their own
> > separate loops, rather than being part of the earlier loop that says:
>
> Sure, we might as well try to improve the cosmetics here.
>
> >> Anyway, I think the way to proceed for now is to grab the low-hanging
> >> fruit of things that clearly won't change any semantics.  But tail end
> >> of the dev cycle is no time to be making really fundamental changes
> >> in how FDW direct modify works.
>
> > I agree.
>
> OK.  Do you want to pull out the bits of the patch that we can still
> do without postponing BeginDirectModify?

I ended up with the attached, whereby ExecInitResultRelation() is now
performed for all relations before calling ExecInitNode() on the
subplan.  As mentioned, I moved other per-result-rel initializations
into the same loop that does ExecInitResultRelation(), while moving
code related to some initializations into initialize-on-first-access
accessor functions for the concerned fields.  I chose to do that for
ri_WithCheckOptionExprs, ri_projectReturning, and ri_projectNew.

ExecInitNode() is called on the subplan (to set
outerPlanState(mtstate) that is) after all of the per-result-rel
initializations are done.  One of the initializations is calling
BeginForeignModify() for non-direct modifications, an API to which we
currently pass mtstate.  Moving that to before setting
outerPlanState(mtstate) so as to be in the same loop as other
initializations had me worried just a little bit given a modification
I had to perform in postgresBeginForeignModify():

@@ -1879,7 +1879,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
                                    rte,
                                    resultRelInfo,
                                    mtstate->operation,
-                                   outerPlanState(mtstate)->plan,
+                                   outerPlan(mtstate->ps.plan),
                                    query,
                                    target_attrs,
                                    values_end_len,

Though I think this is harmless, because implementers of this API
shouldn't rely too strongly on outerPlanState(mtstate) being valid
when it is called, if postgres_fdw's implementation is any indication.

Another slightly ugly bit is the dependence of direct modify API on
ri_projectReturning being set even if it doesn't care for anything
else in the ResultRelInfo.  So in ExecInitModifyTable()
ri_projectReturning initialization is not skipped for
directly-modified foreign result relations.

Notes on regression test changes:

* Initializing WCO quals during execution instead of during
ExecInitNode() of the ModifyTable node causes a couple of regression
test changes in updatable_views.out that were a bit unexpected to me --
subplans that are referenced in WCO quals are no longer shown in the
plain EXPLAIN output.  Even though that's a user-visible change, maybe
we can live with that?

* ri_RootResultRelInfo in *all* child relations instead of just in
tuple-routing result relations has caused changes to inherit.out and
privileges.out.  I think that's basically down to ExecConstraints() et
al doing one thing for child relations in which ri_RootResultRelInfo
is set and another for those in which it is not.  Now it's set in
*all* child relations, so it always does the former thing.  I remember
having checked that those changes are only cosmetic when I first
encountered them.

* Moving PartitionTupleRouting initialization to be done lazily for
cross-partition update cases causes changes to update.out.  They have
to do with the fact that the violations of the actual target table's
partition constraint are now shown as such, instead of reporting them
as occurring on one of the leaf partitions.  Again, only cosmetic.

> Another thing we could consider, perhaps, is keeping the behavior
> the same for foreign tables but postponing init of local ones.
> To avoid opening the relations to figure out which kind they are,
> we'd have to rely on the RTE copies of relkind, which is a bit
> worrisome --- I'm not certain that those are guaranteed to be
> up-to-date --- but it's probably okay since there is no way to
> convert a regular table to foreign or vice versa.  Anyway, that
> idea seems fairly messy so I'm inclined to just pursue the
> lower-hanging fruit for now.

It would be nice to try that idea out, but I tend to agree with the last part.

Also, I'm fairly happy with the kind of performance improvement I see
even with the lower-hanging fruit patch for my earlier shared
benchmark that tests the performance of generic plan execution:

HEAD (especially with 86dc90056 now in):

nparts  10cols      20cols      40cols

64      6926        6394        6253
128     3758        3501        3482
256     1938        1822        1776
1024    406         371         406

Patched:

64      13147       12554       14787
128     7850        9788        9631
256     4472        5599        5638
1024    1218        1503        1309

I also tried with a version where the new tuple projections are built
in ExecInitModifyTable() as opposed to lazily:

64      10937       9969        8535
128     6586        5903        4887
256     3613        3118        2654
1024    884         749         652

This tells us that delaying initializing new tuple projection for
updates can have a sizable speedup and better scalability.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Attachments

Re: ModifyTable overheads in generic plans

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> On Mon, Apr 5, 2021 at 1:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> OK.  Do you want to pull out the bits of the patch that we can still
>> do without postponing BeginDirectModify?

> I ended up with the attached, whereby ExecInitResultRelation() is now
> performed for all relations before calling ExecInitNode() on the
> subplan.  As mentioned, I moved other per-result-rel initializations
> into the same loop that does ExecInitResultRelation(), while moving
> code related to some initializations into initialize-on-first-access
> accessor functions for the concerned fields.  I chose to do that for
> ri_WithCheckOptionExprs, ri_projectReturning, and ri_projectNew.

I pushed the parts of this that I thought were safe and productive.

The business about moving the subplan tree initialization to after
calling FDWs' BeginForeignModify functions seems to me to be a
nonstarter.  Existing FDWs are going to expect their scan initializations
to have been done first.  I'm surprised that postgres_fdw seemed to
need only a one-line fix; it could have been far worse.  The amount of
trouble that could cause is absolutely not worth it to remove one loop
over the result relations.

I also could not get excited about postponing initialization of RETURNING
or WITH CHECK OPTIONS expressions.  I grant that that can be helpful
when those features are used, but I doubt that RETURNING is used that
heavily, and WITH CHECK OPTIONS is surely next door to nonexistent
in performance-critical queries.  If the feature isn't used, the cost
of the existing code is about zero.  So I couldn't see that it was worth
the amount of code thrashing and risk of new bugs involved.  The bit you
noted about EXPLAIN missing a subplan is pretty scary in this connection;
I'm not at all sure that that's just cosmetic.

(Having said that, I'm wondering if there are bugs in these cases for
cross-partition updates that target a previously-not-used partition.
So we might have things to fix anyway.)

Anyway, looking at the test case you posted at the very top of this
thread, I was getting this with HEAD on Friday:

nparts    TPS
0    12152
10    8672
100    2753
1000    314

and after the two patches I just pushed, it looks like:

0    12105
10    9928
100    5433
1000    938

So while there's certainly work left to do, that's not bad for
some low-hanging-fruit grabbing.

            regards, tom lane



Re: ModifyTable overheads in generic plans

From
Andres Freund
Date:
Hi,

On 2021-04-06 19:24:11 -0400, Tom Lane wrote:
> I also could not get excited about postponing initialization of RETURNING
> or WITH CHECK OPTIONS expressions.  I grant that that can be helpful
> when those features are used, but I doubt that RETURNING is used that
> heavily, and WITH CHECK OPTIONS is surely next door to nonexistent
> in performance-critical queries.

FWIW, there's a number of ORMs etc that use it on every insert (there's
not really a better way to get the serial when you also want to do
pipelining).

> nparts    TPS
> 0    12152
> 10    8672
> 100    2753
> 1000    314
> 
> and after the two patches I just pushed, it looks like:
> 
> 0    12105
> 10    9928
> 100    5433
> 1000    938
> 
> So while there's certainly work left to do, that's not bad for
> some low-hanging-fruit grabbing.

Nice. 3x at the upper end is pretty good.

Greetings,

Andres Freund



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Wed, Apr 7, 2021 at 8:24 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > On Mon, Apr 5, 2021 at 1:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> OK.  Do you want to pull out the bits of the patch that we can still
> >> do without postponing BeginDirectModify?
>
> > I ended up with the attached, whereby ExecInitResultRelation() is now
> > performed for all relations before calling ExecInitNode() on the
> > subplan.  As mentioned, I moved other per-result-rel initializations
> > into the same loop that does ExecInitResultRelation(), while moving
> > code related to some initializations into initialize-on-first-access
> > accessor functions for the concerned fields.  I chose to do that for
> > ri_WithCheckOptionExprs, ri_projectReturning, and ri_projectNew.
>
> I pushed the parts of this that I thought were safe and productive.

Thank you.

+/*
+ * ExecInitInsertProjection
+ *     Do one-time initialization of projection data for INSERT tuples.
+ *
+ * INSERT queries may need a projection to filter out junk attrs in the tlist.
+ *
+ * This is "one-time" for any given result rel, but we might touch
+ * more than one result rel in the course of a partitioned INSERT.

I don't think we need this last bit for INSERT, because the result
rels for leaf partitions will never have to go through
ExecInitInsertProjection().  Leaf partitions are never directly fed
tuples that ExecModifyTable() extracts out of the subplan, because
those tuples will have gone through the root target table's projection
before being passed to tuple routing.  So, if an INSERT ever needs a
projection, only the partitioned table being inserted into will need
to have one built.

Also, I think we should update the commentary around ri_projectNew a
bit to make it clear that noplace beside ExecGet{Insert|Update}Tuple
should be touching it and the associated slots.

+ * This is "one-time" for any given result rel, but we might touch more than
+ * one result rel in the course of a partitioned UPDATE, and each one needs
+ * its own projection due to possible column order variation.

Minor quibble, but should we write it as "...in the course of an
inherited UPDATE"?

Attached patch contains these changes.

> The business about moving the subplan tree initialization to after
> calling FDWs' BeginForeignModify functions seems to me to be a
> nonstarter.  Existing FDWs are going to expect their scan initializations
> to have been done first.  I'm surprised that postgres_fdw seemed to
> need only a one-line fix; it could have been far worse.  The amount of
> trouble that could cause is absolutely not worth it to remove one loop
> over the result relations.

Okay, that sounds fair.  After all, we write this about 'mtstate' in
the description of BeginForeignModify(), which I had failed to notice:

"mtstate is the overall state of the ModifyTable plan node being
executed; global data about the plan and execution state is available
via this structure."

> I also could not get excited about postponing initialization of RETURNING
> or WITH CHECK OPTIONS expressions.  I grant that that can be helpful
> when those features are used, but I doubt that RETURNING is used that
> heavily, and WITH CHECK OPTIONS is surely next door to nonexistent
> in performance-critical queries.  If the feature isn't used, the cost
> of the existing code is about zero.  So I couldn't see that it was worth
> the amount of code thrashing and risk of new bugs involved.

Okay.

> The bit you
> noted about EXPLAIN missing a subplan is pretty scary in this connection;
> I'm not at all sure that that's just cosmetic.

Yeah, this and...

> (Having said that, I'm wondering if there are bugs in these cases for
> cross-partition updates that target a previously-not-used partition.
> So we might have things to fix anyway.)

...this would need to be looked at a bit more closely, which I'll try
to do sometime later this week.

> Anyway, looking at the test case you posted at the very top of this
> thread, I was getting this with HEAD on Friday:
>
> nparts  TPS
> 0       12152
> 10      8672
> 100     2753
> 1000    314
>
> and after the two patches I just pushed, it looks like:
>
> 0       12105
> 10      9928
> 100     5433
> 1000    938
>
> So while there's certainly work left to do, that's not bad for
> some low-hanging-fruit grabbing.

Yes, certainly.

I reran my usual benchmark and got the following numbers, this time
comparing v13.2 against the latest HEAD:

nparts  10cols      20cols      40cols

v13.2

64      3231        2747        2217
128     1528        1269        1121
256     709         652         491
1024    96          78          67

v14dev HEAD

64      14835       14360       14563
128     9469        9601        9490
256     5523        5383        5268
1024    1482        1415        1366

Clearly, we've made some very good progress here.  Thanks.



--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments

Re: ModifyTable overheads in generic plans

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> Also, I think we should update the commentary around ri_projectNew a
> bit to make it clear that noplace beside ExecGet{Insert|Update}Tuple
> should be touching it and the associated slots.

Hm.  I pushed your comment fixes in nodeModifyTable.c, but not this
change, because it seemed to be more verbose and not really an
improvement.  Why are these fields any more hands-off than any others?
Besides which, there certainly is other code touching ri_oldTupleSlot.

Anyway, I've marked the CF entry closed, because I think this is about
as far as we can get for v14.  I'm not averse to revisiting the
RETURNING and WITH CHECK OPTIONS issues later, but it looks to me like
that needs more study.

> I reran my usual benchmark and got the following numbers, this time
> comparing v13.2 against the latest HEAD:

> nparts  10cols      20cols      40cols

> v13.2
> 64      3231        2747        2217
> 128     1528        1269        1121
> 256     709         652         491
> 1024    96          78          67

> v14dev HEAD
> 64      14835       14360       14563
> 128     9469        9601        9490
> 256     5523        5383        5268
> 1024    1482        1415        1366

> Clearly, we've made some very good progress here.  Thanks.

Indeed, that's a pretty impressive comparison.

            regards, tom lane



Re: ModifyTable overheads in generic plans

From
Robert Haas
Date:
On Wed, Apr 7, 2021 at 12:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > v13.2
> > 64      3231        2747        2217
> > 128     1528        1269        1121
> > 256     709         652         491
> > 1024    96          78          67
>
> > v14dev HEAD
> > 64      14835       14360       14563
> > 128     9469        9601        9490
> > 256     5523        5383        5268
> > 1024    1482        1415        1366
>
> > Clearly, we've made some very good progress here.  Thanks.
>
> Indeed, that's a pretty impressive comparison.

+1. That looks like a big improvement.

In a vacuum, you'd hope that partitioning a table would make things
faster rather than slower, when only one partition is implicated. Or
at least that the speed would stay about the same. And, while this is
a lot better, we're clearly not there yet. So I hope that, in future
releases, we can continue to find ways to whittle down the overhead.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: ModifyTable overheads in generic plans

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Apr 7, 2021 at 12:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Indeed, that's a pretty impressive comparison.

> +1. That looks like a big improvement.

> In a vacuum, you'd hope that partitioning a table would make things
> faster rather than slower, when only one partition is implicated. Or
> at least that the speed would stay about the same. And, while this is
> a lot better, we're clearly not there yet. So I hope that, in future
> releases, we can continue to find ways to whittle down the overhead.

Note that this test case includes plan_cache_mode = force_generic_plan,
so it's deliberately kneecapping our ability to tell that "only one
partition is implicated".  I think things would often be better in
production cases.  No argument that there's not work left to do, though.

            regards, tom lane



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Thu, Apr 8, 2021 at 1:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > Also, I think we should update the commentary around ri_projectNew a
> > bit to make it clear that noplace beside ExecGet{Insert|Update}Tuple
> > should be touching it and the associated slots.
>
> Hm.  I pushed your comment fixes in nodeModifyTable.c, but not this
> change, because it seemed to be more verbose and not really an
> improvement.  Why are these fields any more hands-off than any others?
> Besides which, there certainly is other code touching ri_oldTupleSlot.

Oops, that's right.

> Anyway, I've marked the CF entry closed, because I think this is about
> as far as we can get for v14.  I'm not averse to revisiting the
> RETURNING and WITH CHECK OPTIONS issues later, but it looks to me like
> that needs more study.

Sure, I will look into that.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Thu, Apr 8, 2021 at 3:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Wed, Apr 7, 2021 at 12:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Indeed, that's a pretty impressive comparison.
>
> > +1. That looks like a big improvement.
>
> > In a vacuum, you'd hope that partitioning a table would make things
> > faster rather than slower, when only one partition is implicated. Or
> > at least that the speed would stay about the same. And, while this is
> > a lot better, we're clearly not there yet. So I hope that, in future
> > releases, we can continue to find ways to whittle down the overhead.
>
> Note that this test case includes plan_cache_mode = force_generic_plan,
> so it's deliberately kneecapping our ability to tell that "only one
> partition is implicated".

For the record, here are the numbers for plan_cache_mode = auto.
(Actually, plancache.c always goes with custom planning for
partitioned tables.)

v13.2
nparts  10cols      20cols      40cols

64      13391       12140       10958
128     13436       12297       10643
256     12564       12294       10355
1024    11450       10600       9020

v14dev HEAD

64      14925       14648       13361
128     14379       14333       13138
256     14478       14246       13316
1024    12744       12621       11579

There's 10-20% improvement in this case too for various partition
counts, which really has more to do with 86dc90056 than the work done
here.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: ModifyTable overheads in generic plans

From
David Rowley
Date:
On Thu, 8 Apr 2021 at 15:32, Amit Langote <amitlangote09@gmail.com> wrote:
> There's 10-20% improvement in this case too for various partition
> counts, which really has more to do with 86dc90056 than the work done
> here.

I'm not sure of the exact query you're running, but I imagine the
reason that it wasn't that slow with custom plans was down to
428b260f87.

So the remaining slowness of the generic plan case with large numbers
of partitions in the plan, vs custom plans with plan-time pruning, is
a) locking run-time-pruned partitions, and b) permission checks during
executor startup?

Aside from the WCO and RETURNING stuff you mentioned, I mean.

David



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Thu, Apr 8, 2021 at 1:54 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Thu, 8 Apr 2021 at 15:32, Amit Langote <amitlangote09@gmail.com> wrote:
> > There's 10-20% improvement in this case too for various partition
> > counts, which really has more to do with 86dc90056 than the work done
> > here.
>
> I'm not sure of the exact query you're running,

The query is basically this:

\set a random(1, 1000000)
update test_table set b = :a where a = :a;

> but I imagine the
> reason that it wasn't that slow with custom plans was down to
> 428b260f87.

Right, 428b260f87 is certainly why we are seeing numbers this big at
all.  However, I was saying that 86dc90056 is what makes v14 HEAD run
about 10-20% faster than *v13.2* in this benchmark.  Note that v13
still uses inheritance_planner(), which, although not as bad as it
used to be in v11, is still more expensive than the single
grouping_planner() call per query that we now get thanks to 86dc90056.

> So the remaining slowness for the generic plan case with large numbers
> of partitions in the plan vs custom plans plan-time pruning is a)
> locking run-time pruned partitions; and; b) permission checks during
> executor startup?

Actually, we didn't move ahead with building the ResultRelInfos
themselves lazily as I had proposed in the original patch, so
ExecInitModifyTable() still builds ResultRelInfos for all partitions.
We did, however, move initialization of some of their fields out of
ExecInitModifyTable() --- commits a1115fa0, c5b7ba4e --- saving a
decent amount of time spent there.  Before we can make the
ResultRelInfos themselves lazy, we need to study closely whether
lazily initializing foreign-partition updates (direct or otherwise)
still produces the right semantics, because those FDW APIs need the
ResultRelInfos passed to them.
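To illustrate the idea, here is a self-contained C sketch of building
per-partition result-relation state on first use rather than at
executor startup.  All names here (PartResultRel, ModifyState,
get_result_rel) are made up for the illustration and are not
PostgreSQL's actual executor structures:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-partition state, standing in for ResultRelInfo. */
typedef struct PartResultRel
{
    int part_index;
} PartResultRel;

typedef struct ModifyState
{
    int             nparts;
    PartResultRel **rels;       /* all NULL until a partition is touched */
    int             init_count; /* how many were actually built */
} ModifyState;

static ModifyState *
make_state(int nparts)
{
    ModifyState *ms = malloc(sizeof(ModifyState));

    ms->nparts = nparts;
    ms->rels = calloc(nparts, sizeof(PartResultRel *));
    ms->init_count = 0;
    return ms;
}

/* Build the per-partition state only when a tuple is routed to it. */
static PartResultRel *
get_result_rel(ModifyState *ms, int part)
{
    if (ms->rels[part] == NULL)
    {
        PartResultRel *r = malloc(sizeof(PartResultRel));

        r->part_index = part;
        ms->rels[part] = r;
        ms->init_count++;
    }
    return ms->rels[part];
}
```

With this shape, a generic plan over 1000 partitions that touches a
single partition per execution pays the construction cost for one
PartResultRel instead of 1000, which is the effect the thread is after.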

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: ModifyTable overheads in generic plans

From
Amit Langote
Date:
On Wed, Apr 7, 2021 at 5:18 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Wed, Apr 7, 2021 at 8:24 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I also could not get excited about postponing initialization of RETURNING
> > or WITH CHECK OPTIONS expressions.  I grant that that can be helpful
> > when those features are used, but I doubt that RETURNING is used that
> > heavily, and WITH CHECK OPTIONS is surely next door to nonexistent
> > in performance-critical queries.  If the feature isn't used, the cost
> > of the existing code is about zero.  So I couldn't see that it was worth
> > the amount of code thrashing and risk of new bugs involved.
>
> Okay.
>
> > The bit you
> > noted about EXPLAIN missing a subplan is pretty scary in this connection;
> > I'm not at all sure that that's just cosmetic.
>
> Yeah, this and...

I looked into this and can't see why this isn't just cosmetic as far
as ModifyTable is concerned.

"EXPLAIN missing a subplan" here just means that
ModifyTableState.PlanState.subPlan is not set. Besides ExplainNode(),
only ExecReScan() looks at PlanState.subPlan, and that does not seem
relevant to ModifyTable, because it doesn't support rescanning.

I don't see any such problems with creating RETURNING projections
on-demand either.

> > (Having said that, I'm wondering if there are bugs in these cases for
> > cross-partition updates that target a previously-not-used partition.
> > So we might have things to fix anyway.)
>
> ...this would need to be looked at a bit more closely, which I'll try
> to do sometime later this week.

Given the above, I can't think of any undiscovered problems related to
WCO and RETURNING expression states in the cases where cross-partition
updates target partitions that need to be initialized by
ExecInitPartitionInfo().  Here is the result for the test case in
updatable_views.sql modified to use partitioning and cross-partition
updates:

CREATE TABLE base_tbl (a int) partition by range (a);
CREATE TABLE base_tbl1 PARTITION OF base_tbl FOR VALUES FROM (1) TO (6);
CREATE TABLE base_tbl2 PARTITION OF base_tbl FOR VALUES FROM (6) TO (11);
CREATE TABLE base_tbl3 PARTITION OF base_tbl FOR VALUES FROM (11) TO (15);
CREATE TABLE ref_tbl (a int PRIMARY KEY);
INSERT INTO ref_tbl SELECT * FROM generate_series(1,10);
CREATE VIEW rw_view1 AS
  SELECT * FROM base_tbl b
  WHERE EXISTS(SELECT 1 FROM ref_tbl r WHERE r.a = b.a)
  WITH CHECK OPTION;

INSERT INTO rw_view1 VALUES (1);
INSERT 0 1

INSERT INTO rw_view1 VALUES (11);
ERROR:  new row violates check option for view "rw_view1"
DETAIL:  Failing row contains (11).

-- Both are cross-partition updates where the target relation is
-- lazily initialized in ExecInitPartitionInfo(), along with the WCO
-- qual ExprState
UPDATE rw_view1 SET a = a + 5 WHERE a = 1;
UPDATE 1

UPDATE rw_view1 SET a = a + 5 WHERE a = 6;
ERROR:  new row violates check option for view "rw_view1"
DETAIL:  Failing row contains (11).

EXPLAIN (costs off) INSERT INTO rw_view1 VALUES (5);
      QUERY PLAN
----------------------
 Insert on base_tbl b
   ->  Result
(2 rows)

EXPLAIN (costs off) UPDATE rw_view1 SET a = a + 5 WHERE a = 1;
                       QUERY PLAN
--------------------------------------------------------
 Update on base_tbl b
   Update on base_tbl1 b_1
   ->  Nested Loop
         ->  Index Scan using ref_tbl_pkey on ref_tbl r
               Index Cond: (a = 1)
         ->  Seq Scan on base_tbl1 b_1
               Filter: (a = 1)
(7 rows)

EXPLAIN (costs off) UPDATE rw_view1 SET a = a + 5 WHERE a = 6;
                       QUERY PLAN
--------------------------------------------------------
 Update on base_tbl b
   Update on base_tbl2 b_1
   ->  Nested Loop
         ->  Index Scan using ref_tbl_pkey on ref_tbl r
               Index Cond: (a = 6)
         ->  Seq Scan on base_tbl2 b_1
               Filter: (a = 6)
(7 rows)

Patch attached.  I tested the performance benefit of doing this by
modifying the update query used in earlier benchmarks to have a
RETURNING * clause, getting the following TPS numbers:

-Mprepared (plan_cache_mode=force_generic_plan)

nparts  10cols      20cols      40cols

HEAD
64      10909       9067        7171
128     6903        5624        4161
256     3748        3056        2219
1024    953         738         427

Patched
64      13817       13395       12754
128     9271        9102        8279
256     5345        5207        5083
1024    1463        1443        1389

Also, I don't see much impact of checking if (node->returningLists) in
the per-result-rel initialization loop in the common cases where
there's no RETURNING.
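As a self-contained C sketch of why that check is cheap (with made-up
names, not PostgreSQL's actual code): when there is no RETURNING list,
the per-result-rel loop pays only a pointer test per relation and never
runs the expensive projection setup:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a per-result-rel struct with a RETURNING projection slot. */
typedef struct ResultRel
{
    void *returning_proj;   /* built only if the plan has RETURNING */
} ResultRel;

static int projections_built = 0;

/* Stand-in for the comparatively expensive projection/ExprState setup. */
static void *
build_returning_projection(void)
{
    projections_built++;
    return &projections_built;  /* any non-NULL token */
}

/* Per-result-rel init loop: one cheap test guards the expensive path. */
static void
init_result_rels(ResultRel *rels, int n, int has_returning)
{
    for (int i = 0; i < n; i++)
        rels[i].returning_proj =
            has_returning ? build_returning_projection() : NULL;
}
```

In the common no-RETURNING case the loop body reduces to storing a
NULL, so the added branch is negligible next to the rest of
per-result-rel initialization.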

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments