Thread: executor relation handling
This is the continuation of the discussion at:

https://www.postgresql.org/message-id/7500.1531920772%40sss.pgh.pa.us

Actually, for more background on what I've written below, reading this email in the same discussion would help:

https://www.postgresql.org/message-id/4114.1531674142@sss.pgh.pa.us

Attached find a series of patches, the first of which tries to implement the main topic discussed in the above email, which is to eliminate execution-time locking of relations in PlannedStmt.rtable. One of the other patches removes the plan node fields that are obsoleted by eliminating execution-time locking of relations. Those fields served no purpose other than telling the executor which relations to lock, or more precisely which relations to lock before initializing the plan tree, so that we don't end up upgrading lock strength when the same relation is both a source relation and a target relation.

When working on that, I noticed that the planner fails to remove PlanRowMarks of relations that won't be scanned by a given plan, which results in the executor redundantly locking relations that the planner already deemed unnecessary to scan. The locking would be gone with one of the proposed patches, but having all those PlanRowMarks still imposes some overhead during executor initialization. For example, ExecInitLockRows or ExecInitModifyTable calling ExecFindRowMark would end up going over PlanRowMarks that won't ever be used, which grows especially worse with many partitions.

A short description of each patch follows:

0001-Don-t-lock-range-table-relations-in-the-executor.patch

This replaces all instances of heap_open(relid, <not-NoLock>) in the executor code with heap_open(relid, NoLock). To verify that an appropriate lock is already held on a relation by the time we get into the executor, this also installs an assert-build-only check that confirms that lmgr indeed holds the lock.

To remember which lock was taken when creating a given RTE_RELATION range table entry, this adds a lockmode field to RangeTblEntry. Finally, because we don't lock in the executor and hence there are no concerns about a lock-strength-upgrade hazard, InitPlan doesn't need to initialize ResultRelInfos and ExecRowMarks; that is instead done in the ExecInit* routines of the respective nodes that need those ResultRelInfos and ExecRowMarks. (This doesn't touch index relations though, as they don't have a range table entry.)

0002-Remove-useless-fields-from-planner-nodes.patch

This removes some fields from PlannedStmt whose only purpose currently is to help InitPlan do certain things that it no longer does, as mentioned above. Also removed are some fields in Append, MergeAppend, and ModifyTable whose only purpose currently is to propagate partitioned table RT indexes so that the executor could lock them. Removing them also means that the planner doesn't have to spend cycles initializing them.

0003-Prune-PlanRowMark-of-relations-that-are-pruned-from-.patch

This prevents PlanRowMarks corresponding to relations that won't be scanned by a given plan from appearing in the rowMarks field of LockRows or ModifyTable nodes. This removes significant overhead from the executor initialization of those nodes, especially for partitioned tables with many partitions.

0004-Revise-executor-range-table-relation-opening-closing.patch

This adds two arrays to EState indexed by RT index, one for RangeTblEntrys and another for Relation pointers. The former reduces the cost of fetching a RangeTblEntry by RT index. The latter is used by a newly introduced function, ExecRangeTableRelation(), which replaces heap_open for relations that are contained in the range table. If a given RTE's relation is opened multiple times, only the first call of ExecRangeTableRelation will go to the relcache.
Although improving the executor's performance is not the main goal of these patches, the fact that we're getting rid of redundant processing means there would be at least some speedup, especially with a large number of relations in the range table, such as with partitioned tables with many partitions.

Benchmark script used:

$ cat /tmp/select-lt.sql
select * from lt where b = 999;

'lt' above is a list-partitioned table with 1000 partitions, with one partition for each value in the range 1..1000.

$ for i in 1 2;
> do
> pgbench -n -Mprepared -T 60 -f /tmp/select-lt.sql
> done

master
tps = 768.172129 (excluding connections establishing)
tps = 764.180834 (excluding connections establishing)

patch 0001 (no locking in the executor)
tps = 775.060323 (excluding connections establishing)
tps = 778.772917 (excluding connections establishing)

patch 0002 (remove useless planner node fields)
tps = 782.165436 (excluding connections establishing)
tps = 759.417411 (excluding connections establishing)

patch 0003 (prune PlanRowMarks)
tps = 783.558539 (excluding connections establishing)
tps = 776.106055 (excluding connections establishing)

patch 0004 (executor range table Relation open)
tps = 778.924649 (excluding connections establishing)
tps = 769.093431 (excluding connections establishing)

The speedup is more pronounced with a benchmark that needs RowMarks, because one of the patches (0003) removes overhead around handling them.
$ cat /tmp/select-lt-for-share.sql
select * from lt where b = 999 for share;

master
tps = 94.095985 (excluding connections establishing)
tps = 93.955702 (excluding connections establishing)

patch 0001 (no locking in the executor)
tps = 199.030555 (excluding connections establishing)
tps = 197.630424 (excluding connections establishing)

patch 0002 (remove useless planner node fields)
tps = 194.384994 (excluding connections establishing)
tps = 195.821362 (excluding connections establishing)

patch 0003 (prune PlanRowMarks)
tps = 712.544029 (excluding connections establishing)
tps = 717.540052 (excluding connections establishing)

patch 0004 (executor range table Relation open)
tps = 725.189715 (excluding connections establishing)
tps = 727.344683 (excluding connections establishing)

Will add this to the next CF.

Thanks,
Amit
On 2018/08/16 17:22, Amit Langote wrote:
> 0004-Revise-executor-range-table-relation-opening-closing.patch
>
> This adds two arrays to EState indexed by RT indexes, one for
> RangeTblEntry's and another for Relation pointers. The former reduces the
> cost of fetching RangeTblEntry by RT index. The latter is used by a newly
> introduced function ExecRangeTableRelation(), which replaces heap_open for
> relations that are contained in the range table. If a given RTE's
> relation is opened multiple times, only the first call of
> ExecRangeTableRelation will go to relcache.

David Rowley recently and independently [1] proposed one of the ideas mentioned above (store RangeTblEntrys in an array in EState). As I mentioned in reply to his email, I think his implementation of the idea is better than mine, so I've merged his patch into the above patch, except for one part: instead of removing the es_range_table list altogether, I decided to keep it around so that ExecSerializePlan doesn't have to rebuild one from the array as his patch did.

Updated patches attached; 0001-0003 are the same as v1.

Thanks,
Amit

[1] Make executor's Range Table an array instead of a List
https://postgr.es/m/CAKJS1f9EypD_=xG6ACFdF=1cBjz+Z9hiHHSd-RqLjor+QyA-nw@mail.gmail.com
On 4 September 2018 at 20:53, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Updated patches attached; 0001-0003 are the same as v1.

I've looked at these. Here's my review so far:

0001:

1. The following does not seem to be true any longer:

+ /*
+ * If you change the conditions under which rel locks are acquired
+ * here, be sure to adjust ExecOpenScanRelation to match.
+ */

per:

@@ -652,28 +654,10 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
 {
 Relation rel;
 Oid reloid;
- LOCKMODE lockmode;

- /*
- * Determine the lock type we need. First, scan to see if target relation
- * is a result relation. If not, check if it's a FOR UPDATE/FOR SHARE
- * relation. In either of those cases, we got the lock already.
- */
- lockmode = AccessShareLock;
- if (ExecRelationIsTargetRelation(estate, scanrelid))
- lockmode = NoLock;
- else
- {
- /* Keep this check in sync with InitPlan! */
- ExecRowMark *erm = ExecFindRowMark(estate, scanrelid, true);
-
- if (erm != NULL && erm->relation != NULL)
- lockmode = NoLock;
- }
-

2. Should addRangeTableEntryForRelation() initialize lockmode, or maybe take it as a parameter?

3. Don't think there's a need to capitalise true and false in:

+ * Return TRUE if we acquired a new lock, FALSE if already held.

4. The comment probably should read "lock level to obtain, or 0 if no lock is required" in:

+ int lockmode; /* lock taken on the relation or 0 */

The field should likely also be LOCKMODE, not int.

5. AcquireExecutorLocks() does not need the local variable named rt_index.

0002:

6. I don't think "rootRelation" is a good name for this field. I think "root" is being confused with "target". Nothing says the target is the same as the root.

+ Index rootRelation; /* RT index of root partitioned table */

Perhaps "partitionedTarget" is a better name?

7. Should use "else" instead of "else if" in:

+ /* Top-level Plan must be LockRows or ModifyTable */
+ Assert(IsA(stmt->planTree, LockRows) ||
+ IsA(stmt->planTree, ModifyTable));
+ if (IsA(stmt->planTree, LockRows))
+ rowMarks = ((LockRows *) stmt->planTree)->rowMarks;
+ else if (IsA(stmt->planTree, ModifyTable))
+ rowMarks = ((ModifyTable *) stmt->planTree)->rowMarks;

or you'll likely get a compiler warning on non-Assert enabled builds.

0003:

8. The following code seems repeated enough to warrant a static function:

+ rowMarks = NIL;
+ foreach(lc, root->rowMarks)
+ {
+ PlanRowMark *rc = lfirst(lc);
+
+ if (root->simple_rel_array[rc->rti] != NULL &&
+ IS_DUMMY_REL(root->simple_rel_array[rc->rti]))
+ continue;
+
+ rowMarks = lappend(rowMarks, rc);
+ }

Also, why not reverse the condition and do the lappend inside the if? Save two lines.

9. The following code appears in copy.c, which is pretty much the same as the code in execMain.c:

estate->es_range_table_size = list_length(cstate->range_table);
estate->es_range_table_array = (RangeTblEntry **)
    palloc(sizeof(RangeTblEntry *) * estate->es_range_table_size);
/* Populate the range table array */
i = 0;
foreach(lc, cstate->range_table)
    estate->es_range_table_array[i++] = lfirst_node(RangeTblEntry, lc);

Would it not be better to invent a function with the signature:

void
setup_range_table_array(EState *estate, List *rangeTable)

and use it in both locations?

10. In create_estate_for_relation() I don't think you should remove the line that sets es_range_table.

@@ -199,7 +199,8 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 rte->rtekind = RTE_RELATION;
 rte->relid = RelationGetRelid(rel->localrel);
 rte->relkind = rel->localrel->rd_rel->relkind;
- estate->es_range_table = list_make1(rte);

If you're keeping es_range_table then I think it needs to always be set properly to help prevent future bugs in that area.

11. ExecRangeTableRelation should look more like:

ExecRangeTableRelation(EState *estate, Index rti)
{
    Relation rel = estate->es_relations[rti - 1];

    if (rel != NULL)
        RelationIncrementReferenceCount(rel);
    else
    {
        RangeTblEntry *rte = exec_rt_fetch(rti, estate->es_range_table_array);

        /*
         * No need to lock the relation, because upstream code
         * must hold the lock already.
         */
        rel = estate->es_relations[rti - 1] = heap_open(rte->relid, NoLock);
    }

    return rel;
}

12. I think this should read: /* Fill in RTEs. es_relations will be populated later. */

+ /* Fill the RTEs, Relations array will be filled later. */

13. I also notice that you're still cleaning up Relations with heap_close or relation_close. Did you consider not doing that and just having a function that's run at the end of execution which closes all non-NULL es_relations? That way you'd not need to perform RelationIncrementReferenceCount inside ExecRangeTableRelation.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Thank you for reviewing.

On 2018/09/10 13:36, David Rowley wrote:
> On 4 September 2018 at 20:53, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> Updated patches attached; 0001-0003 are the same as v1.
>
> I've looked at these. Here's my review so far:
>
> 0001:
>
> 1. The following does not seem to be true any longer:
>
> + /*
> + * If you change the conditions under which rel locks are acquired
> + * here, be sure to adjust ExecOpenScanRelation to match.
> + */
>
> per:
>
> @@ -652,28 +654,10 @@ ExecOpenScanRelation(EState *estate, Index
> scanrelid, int eflags)
> {
> Relation rel;
> Oid reloid;
> - LOCKMODE lockmode;
>
> - /*
> - * Determine the lock type we need. First, scan to see if target relation
> - * is a result relation. If not, check if it's a FOR UPDATE/FOR SHARE
> - * relation. In either of those cases, we got the lock already.
> - */
> - lockmode = AccessShareLock;
> - if (ExecRelationIsTargetRelation(estate, scanrelid))
> - lockmode = NoLock;
> - else
> - {
> - /* Keep this check in sync with InitPlan! */
> - ExecRowMark *erm = ExecFindRowMark(estate, scanrelid, true);
> -
> - if (erm != NULL && erm->relation != NULL)
> - lockmode = NoLock;
> - }
> -

Yeah, removed that comment in ExecBuildRowMark.

> 2. Should addRangeTableEntryForRelation() initialize lockmode, or
> maybe take it as a parameter?

Hmm, that sounds like a good idea. Looking at the various places it's called from though, it's not clear in many instances which lock was taken on the relation that's passed to it, because the lock itself seems to be taken several stack frames removed from the call site. That said, at least for the cases that we care about, that is, the cases in which the RTE being built will be passed to the executor by way of being in a PlannedStmt, it's clear which lock is taken.

So, we can add the lockmode parameter to addRangeTableEntryForRelation as you suggest and pass the actual lockmode value only in the cases we care about, mentioning in the function's header comment that callers may pass NoLock arbitrarily if it's clear the RTE won't be passed to the executor. I've modified the patch that way. Thoughts?

> 3. Don't think there's a need to capitalise true and false in:
>
> + * Return TRUE if we acquired a new lock, FALSE if already held.

OK, fixed.

> 4. The comment probably should read "lock level to obtain, or 0 if no
> lock is required" in:
>
> + int lockmode; /* lock taken on the relation or 0 */

OK.

> The field should likely also be LOCKMODE, not int.

OK.

> 5. AcquireExecutorLocks() does not need the local variable named rt_index

Good catch, removed.

> 0002:
>
> 6. I don't think "rootRelation" is a good name for this field. I think
> "root" is being confused with "target". Nothing says the target
> is the same as the root.
>
> + Index rootRelation; /* RT index of root partitioned table */
>
> Perhaps "partitionedTarget" is a better name?

I realized that we don't need a new Index field here. nominalRelation serves the purpose that rootRelation is meant for, so it seems silly to have two fields with the same value. Instead, let's have a bool partitionedTarget which is set to true if the target (whose RT index is nominalRelation) is a partitioned table.

> 7. Should use "else" instead of "else if" in:
>
> + /* Top-level Plan must be LockRows or ModifyTable */
> + Assert(IsA(stmt->planTree, LockRows) ||
> + IsA(stmt->planTree, ModifyTable));
> + if (IsA(stmt->planTree, LockRows))
> + rowMarks = ((LockRows *) stmt->planTree)->rowMarks;
> + else if (IsA(stmt->planTree, ModifyTable))
> + rowMarks = ((ModifyTable *) stmt->planTree)->rowMarks;
>
> or you'll likely get a compiler warning on non-Assert enabled builds.

Yep, fixed.

> 0003:
>
> 8. The following code seems repeated enough to warrant a static function:
>
> + rowMarks = NIL;
> + foreach(lc, root->rowMarks)
> + {
> + PlanRowMark *rc = lfirst(lc);
> +
> + if (root->simple_rel_array[rc->rti] != NULL &&
> + IS_DUMMY_REL(root->simple_rel_array[rc->rti]))
> + continue;
> +
> + rowMarks = lappend(rowMarks, rc);
> + }
>
> Also, why not reverse the condition and do the lappend inside the if?
> Save two lines.

OK, made this change and added a static function called get_unpruned_rowmarks(PlannerInfo *root).

> 9. The following code appears in copy.c, which is pretty much the same
> as the code in execMain.c:
>
> estate->es_range_table_size = list_length(cstate->range_table);
> estate->es_range_table_array = (RangeTblEntry **)
> palloc(sizeof(RangeTblEntry *) *
> estate->es_range_table_size);
> /* Populate the range table array */
> i = 0;
> foreach(lc, cstate->range_table)
> estate->es_range_table_array[i++] = lfirst_node(RangeTblEntry, lc);
>
> Would it not be better to invent a function with the signature:
>
> void
> setup_range_table_array(EState *estate, List *rangeTable)
>
> and use it in both locations?

Agreed, but I named it ExecInitRangeTable.

> 10. In create_estate_for_relation() I don't think you should remove
> the line that sets es_range_table.
>
> @@ -199,7 +199,8 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
> rte->rtekind = RTE_RELATION;
> rte->relid = RelationGetRelid(rel->localrel);
> rte->relkind = rel->localrel->rd_rel->relkind;
> - estate->es_range_table = list_make1(rte);
>
> If you're keeping es_range_table then I think it needs to always be
> set properly to help prevent future bugs in that area.

My bad, fixed.

> 11. ExecRangeTableRelation should look more like:
>
> ExecRangeTableRelation(EState *estate, Index rti)
> {
>     Relation rel = estate->es_relations[rti - 1];
>
>     if (rel != NULL)
>         RelationIncrementReferenceCount(rel);
>     else
>     {
>         RangeTblEntry *rte = exec_rt_fetch(rti, estate->es_range_table_array);
>
>         /*
>          * No need to lock the relation, because upstream code
>          * must hold the lock already.
>          */
>         rel = estate->es_relations[rti - 1] = heap_open(rte->relid, NoLock);
>     }
>
>     return rel;
> }

Much better, done.

> 12. I think this should read: /* Fill in RTEs. es_relations will be
> populated later. */
>
> + /* Fill the RTEs, Relations array will be filled later. */

I've rewritten the comment.

> 13. I also notice that you're still cleaning up Relations with
> heap_close or relation_close. Did you consider not doing that and just
> having a function that's run at the end of execution which closes all
> non-NULL es_relations? That way you'd not need to perform
> RelationIncrementReferenceCount inside ExecRangeTableRelation.

Agreed, done. I'd be slightly hesitant to remove ExecCloseScanRelation, ExecDestroyPartitionPruneState et al if they weren't just wrappers around heap_close.

Please find attached revised patches.

Thanks,
Amit
Hi Amit,

On 9/12/18 1:23 AM, Amit Langote wrote:
> Please find attached revised patches.

After applying 0004 I'm getting a crash in 'eval-plan-qual' during check-world using

export CFLAGS="-DCOPY_PARSE_PLAN_TREES -O0 -fno-omit-frame-pointer" && ./configure --enable-dtrace --with-openssl --with-gssapi --with-libxml --with-llvm --enable-debug --enable-depend --enable-tap-tests --enable-cassert

Confirmed by CFBot in [1].

[1] https://travis-ci.org/postgresql-cfbot/postgresql/builds/427530296

Best regards,
Jesper
On Wed, Sep 12, 2018 at 9:23 PM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
> Hi Amit,
>
> On 9/12/18 1:23 AM, Amit Langote wrote:
>> Please find attached revised patches.
>
> After applying 0004 I'm getting a crash in 'eval-plan-qual' during
> check-world using
>
> export CFLAGS="-DCOPY_PARSE_PLAN_TREES -O0 -fno-omit-frame-pointer" &&
> ./configure --enable-dtrace --with-openssl --with-gssapi --with-libxml
> --with-llvm --enable-debug --enable-depend --enable-tap-tests
> --enable-cassert
>
> Confirmed by CFBot in [1].
>
> [1] https://travis-ci.org/postgresql-cfbot/postgresql/builds/427530296

Thanks Jesper. Will look into it first thing tomorrow morning.

Thanks,
Amit
On 2018/09/13 0:23, Amit Langote wrote:
> On Wed, Sep 12, 2018 at 9:23 PM, Jesper Pedersen
> <jesper.pedersen@redhat.com> wrote:
>> Hi Amit,
>>
>> On 9/12/18 1:23 AM, Amit Langote wrote:
>>> Please find attached revised patches.
>>
>> After applying 0004 I'm getting a crash in 'eval-plan-qual' during
>> check-world using
>>
>> export CFLAGS="-DCOPY_PARSE_PLAN_TREES -O0 -fno-omit-frame-pointer" &&
>> ./configure --enable-dtrace --with-openssl --with-gssapi --with-libxml
>> --with-llvm --enable-debug --enable-depend --enable-tap-tests
>> --enable-cassert
>>
>> Confirmed by CFBot in [1].
>>
>> [1] https://travis-ci.org/postgresql-cfbot/postgresql/builds/427530296
>
> Thanks Jesper. Will look into it first thing tomorrow morning.

Attached updated patches.

Besides the issue that caused the eval-plan-qual isolation test to crash, I also spotted and fixed an oversight in the 0002 patch which would lead to EState.es_output_cid being set to the wrong value, causing an unexpected error during tuple locking as a result.

Thanks,
Amit
Hi Amit,

On 9/13/18 12:58 AM, Amit Langote wrote:
> Attached updated patches.
>
> Beside the issue that caused eval-plan-qual isolation test to crash, I
> also spotted and fixed an oversight in the 0002 patch which would lead to
> EState.es_output_cid being set to wrong value and causing unexpected error
> during tuple locking as result of that.

Thanks for the update.

However, the subscription TAP test (src/test/subscription/t/001_rep_changes.pl) is still failing.

CFBot has the same log

https://travis-ci.org/postgresql-cfbot/postgresql/builds/427999969

as locally.

Best regards,
Jesper
Thanks again, Jesper.

On 2018/09/13 20:27, Jesper Pedersen wrote:
> Hi Amit,
>
> On 9/13/18 12:58 AM, Amit Langote wrote:
>> Attached updated patches.
>>
>> Beside the issue that caused eval-plan-qual isolation test to crash, I
>> also spotted and fixed an oversight in the 0002 patch which would lead to
>> EState.es_output_cid being set to wrong value and causing unexpected error
>> during tuple locking as result of that.
>
> Thanks for the update.
>
> However, the subscription TAP test
> (src/test/subscription/t/001_rep_changes.pl) is still failing.
>
> CFBot has the same log
>
> https://travis-ci.org/postgresql-cfbot/postgresql/builds/427999969
>
> as locally.

My bad. I missed that the logical replication code depends on the affected executor code. Fixed patches attached.

Thanks,
Amit
I've just completed a review of the v5 patch set. I ended up just making the changes myself since Amit mentioned he was on leave for a few weeks.

Summary of changes:

1. Changed the way we verify the lock already exists in debug builds. I reverted some incorrect code added to LockRelationOid that seems to have gotten broken after being rebased on f868a8143a9. I've just added some functions that verify the lock is in the LockMethodLocalHash hash table.

2. Fixed some incorrect lock types being passed into addRangeTableEntryForRelation().

3. Added code in addRangeTableEntryForRelation() to verify we actually hold the lock that the parameter claims we do. (This found all the errors I fixed in #2.)

4. Updated various comments outdated by the patches.

5. Updated the executor README's mention that we close relations when calling the end node function. This is now handled at the end of execution.

6. Renamed nominalRelation to targetRelation. I think this fits better since we're overloading the variable.

7. Used LOCKMODE instead of int in some places.

8. Changed the warning about a relation not being locked to WARNING instead of NOTICE.

9. Renamed get_unpruned_rowmarks() to get_nondummy_rowmarks(). "Pruning" makes me think of partition pruning, but the function checks for dummy rels, which could be dummy for reasons other than partition pruning.

I've attached a diff showing the changes I made along with the full patches, which I tagged as v6.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2018/09/27 18:15, David Rowley wrote:
> I've just completed a review of the v5 patch set. I ended up just
> making the changes myself since Amit mentioned he was on leave for a
> few weeks.

Thanks David. I'm back today and will look at the updated patches tomorrow.

Regards,
Amit
Hi,

On 9/27/18 5:15 AM, David Rowley wrote:
> I've just completed a review of the v5 patch set. I ended up just
> making the changes myself since Amit mentioned he was on leave for a
> few weeks.
>
> Summary of changes:
>
> 1. Changed the way we verify the lock already exists with debug
> builds. I reverted some incorrect code added to LockRelationOid that
> seems to have gotten broken after being rebased on f868a8143a9. I've
> just added some functions that verify the lock is in the
> LockMethodLocalHash hashtable.
> 2. Fixed some incorrect lock types being passed into
> addRangeTableEntryForRelation()
> 3. Added code in addRangeTableEntryForRelation to verify we actually
> hold the lock that the parameter claims we do. (This found all the
> errors I fixed in #2)
> 4. Updated various comments outdated by the patches
> 5. Updated executor README's mention that we close relations when
> calling the end node function. This is now handled at the end of
> execution.
> 6. Renamed nominalRelation to targetRelation. I think this fits
> better since we're overloading the variable.
> 7. Use LOCKMODE instead of int in some places.
> 8. Changed warning about relation not locked to WARNING instead of NOTICE.
> 9. Renamed get_unpruned_rowmarks() to get_nondummy_rowmarks(). Pruning
> makes me think of partition pruning but the function checks for dummy
> rels. These could be dummy for reasons other than partition pruning.
>
> I've attached a diff showing the changes I made along with the full
> patches which I tagged as v6.

Thanks David and Amit -- this version passes check-world.

I'll also take a deeper look.

Best regards,
Jesper
On 2018/09/27 18:15, David Rowley wrote:
> I've just completed a review of the v5 patch set. I ended up just
> making the changes myself since Amit mentioned he was on leave for a
> few weeks.
>
> Summary of changes:
>
> 1. Changed the way we verify the lock already exists with debug
> builds. I reverted some incorrect code added to LockRelationOid that
> seems to have gotten broken after being rebased on f868a8143a9. I've
> just added some functions that verify the lock is in the
> LockMethodLocalHash hashtable.

Thanks. I guess I wasn't terribly happy with my job of rebasing on top of f868a8143a9, because that commit had made the result of LockAcquireExtended a bit ambiguous for me. I like the new CheckRelationLockedByUs() and LocalLockExists() functions that you added to the lock manager.

> 2. Fixed some incorrect lock types being passed into
> addRangeTableEntryForRelation()
>
> 3. Added code in addRangeTableEntryForRelation to verify we actually
> hold the lock that the parameter claims we do. (This found all the
> errors I fixed in #2)

I see that I'd falsely copy-pasted AccessShareLock in transformRuleStmt, which you've corrected to AccessExclusiveLock. I was not sure about the other cases where you've replaced NoLock with something else, because they won't be checked or asserted, but perhaps that's fine.

> 4. Updated various comments outdated by the patches
>
> 5. Updated executor README's mention that we close relations when
> calling the end node function. This is now handled at the end of
> execution.

Thank you. I see that you also removed some useless code that I had left lying around as a result of code movement, such as the following dead code:

@@ -2711,9 +2711,7 @@ ExecEndModifyTable(ModifyTableState *node)
 {
 int i;

- /*
- * close the result relation(s) if any, but hold locks until xact commit.
- */
+ /* Perform cleanup. */
 for (i = 0; i < node->mt_nplans; i++)
 {
 ResultRelInfo *resultRelInfo = node->resultRelInfo + i;
@@ -2728,7 +2726,6 @@ ExecEndModifyTable(ModifyTableState *node)
 /* Close indices and then the relation itself */
 ExecCloseIndices(resultRelInfo);
 heap_close(resultRelInfo->ri_RelationDesc, NoLock);
- resultRelInfo++;
 }

> 6. Renamed nominalRelation to targetRelation. I think this fits
> better since we're overloading the variable.

That makes sense to me.

> 7. Use LOCKMODE instead of int in some places.
>
> 8. Changed warning about relation not locked to WARNING instead of NOTICE.

Oops, I think I'd forgotten to do that myself.

> 9. Renamed get_unpruned_rowmarks() to get_nondummy_rowmarks(). Pruning
> makes me think of partition pruning but the function checks for dummy
> rels. These could be dummy for reasons other than partition pruning.

Makes sense too.

> I've attached a diff showing the changes I made along with the full
> patches which I tagged as v6.

Thanks a lot for working on that. I've made minor tweaks, which you'll find in the attached updated patches (a .diff file containing the changes from v6 to v7 is also attached).

Regards,
Amit
On 28 September 2018 at 20:00, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> I've made minor tweaks, which you'll find in
> the attached updated patches (a .diff file containing changes from v6 to
> v7 is also attached).

Thanks for looking over the changes.

I've looked at the v6 to v7 diff and it seems all good, apart from:

+ * The following asserts that the necessary lock on the relation

I think we maybe should switch the word "assert" for "verifies". The Assert is just checking we didn't get a NoLock, and I don't think you're using "assert" to mean the Assert() macro, so it likely should be changed to avoid confusion.

Apart from that, I see nothing wrong with the patches, so I think we should get someone else to look. I'm marking it as ready for committer.

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2018/09/28 17:21, David Rowley wrote:
> On 28 September 2018 at 20:00, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> I've made minor tweaks, which you'll find in
>> the attached updated patches (a .diff file containing changes from v6 to
>> v7 is also attached).
>
> Thanks for looking over the changes.
>
> I've looked at the v6 to v7 diff and it seems all good, apart from:
>
> + * The following asserts that the necessary lock on the relation
>
> I think we maybe should switch the word "assert" for "verifies". The
> Assert is just checking we didn't get a NoLock and I don't think
> you're using "assert" meaning the Assert() macro, so likely should be
> changed to avoid confusion.

Okay, I've revised the text in the attached updated patch.

> Apart from that, I see nothing wrong with the patches, so I think we
> should get someone else to look. I'm marking it as ready for
> committer.

Thanks for your time reviewing the patches.

Regards,
Amit
On 28 September 2018 at 20:28, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2018/09/28 17:21, David Rowley wrote:
>> I think we maybe should switch the word "assert" for "verifies". The
>> Assert is just checking we didn't get a NoLock and I don't think
>> you're using "assert" meaning the Assert() macro, so likely should be
>> changed to avoid confusion.
>
> Okay, I've revised the text in the attached updated patch.

Meh, I just noticed that the WARNING text claims "InitPlan" is the function name. I think it's best to get rid of that. It's pretty much redundant anyway if you do: \set VERBOSITY verbose

-- 
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2018/09/28 17:48, David Rowley wrote:
> On 28 September 2018 at 20:28, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> On 2018/09/28 17:21, David Rowley wrote:
>>> I think we maybe should switch the word "assert" for "verifies". The
>>> Assert is just checking we didn't get a NoLock and I don't think
>>> you're using "assert" meaning the Assert() macro, so likely should be
>>> changed to avoid confusion.
>>
>> Okay, I've revised the text in the attached updated patch.
>
> Meh, I just noticed that the WARNING text claims "InitPlan" is the
> function name. I think it's best to get rid of that. It's pretty much
> redundant anyway if you do: \set VERBOSITY verbose

Oops, good catch that one. Removed "InitPlan: " from the message in the attached.

Thanks,
Amit
Attachments
Hi,

On 9/28/18 4:58 AM, Amit Langote wrote:
>>> Okay, I've revised the text in the attached updated patch.
>>
>> Meh, I just noticed that the WARNING text claims "InitPlan" is the
>> function name. I think it's best to get rid of that. It's pretty much
>> redundant anyway if you do: \set VERBOSITY verbose
>
> Oops, good catch on that one. Removed "InitPlan: " from the message in
> the attached.

I have looked at the patch (v9) and have no further comments. I can
confirm a speedup in the SELECT FOR SHARE case.

Thanks for working on this!

Best regards,
Jesper
On 2018-Sep-28, Amit Langote wrote:
> On 2018/09/28 17:48, David Rowley wrote:
>> Meh, I just noticed that the WARNING text claims "InitPlan" is the
>> function name. I think it's best to get rid of that. It's pretty much
>> redundant anyway if you do: \set VERBOSITY verbose
>
> Oops, good catch on that one. Removed "InitPlan: " from the message in
> the attached.

Were there two cases of that? Because one still remains.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 29 September 2018 at 03:49, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2018-Sep-28, Amit Langote wrote:
>> On 2018/09/28 17:48, David Rowley wrote:
>>> Meh, I just noticed that the WARNING text claims "InitPlan" is the
>>> function name. I think it's best to get rid of that. It's pretty much
>>> redundant anyway if you do: \set VERBOSITY verbose
>>
>> Oops, good catch on that one. Removed "InitPlan: " from the message in
>> the attached.
>
> Were there two cases of that? Because one still remains.

Yeah, 0001 added it and 0004 removed it again, replacing it with the
corrected version. I've attached v10, which fixes this and aligns the
WARNING text in ExecInitRangeTable() and addRangeTableEntryForRelation().

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments
David Rowley <david.rowley@2ndquadrant.com> writes:
> I've attached v10 which fixes this and aligns the WARNING text in
> ExecInitRangeTable() and addRangeTableEntryForRelation().

I started poking at this. I thought that it would be a good cross-check
to apply just the "front half" of 0001 (i.e., creation and population of
the RTE lockmode field), and then to insert checks in each of the "back
half" places (executor, plancache, etc.) that the lockmodes they are
computing today match what is in the RTE.

Those checks fell over many times in the regression tests. There seem to
be at least four distinct problems:

1. You set up transformRuleStmt to insert AccessExclusiveLock into the
"OLD" and "NEW" RTEs for a view. This is surely wrong; we do not want to
take exclusive lock on a view just to run a query using the view. It
should (usually, anyway) just be AccessShareLock. However, because
addRangeTableEntryForRelation insists that you hold the requested lock
type *now*, just changing the parameter to AccessShareLock doesn't work.
I hacked around this for the moment by passing NoLock to
addRangeTableEntryForRelation and then changing rte->lockmode after it
returns, but man, that's ugly.

It makes me wonder whether addRangeTableEntryForRelation should be
checking the lockmode at all. I'm inclined to think we should remove the
check for the current lockmode in addRangeTableEntryForRelation, and
instead just assert that the passed lockmode is one of AccessShareLock,
RowShareLock, or RowExclusiveLock, depending on whether the RTE is meant
to represent a plain source table, a FOR UPDATE/SHARE source table, or a
target table. I don't think it's that helpful to check that the caller
got exactly the same lock, especially given the number of places where
the patch already had to cheat by using NoLock.

2. The "forUpdatePushedDown" override in AcquireRewriteLocks isn't
getting modeled in the RTE lockmode fields.
In other words, if we have something like

    CREATE VIEW vv AS SELECT * FROM tab1;
    SELECT * FROM vv FOR UPDATE OF vv;

the checks fall over, because the tab1 RTE in the view's rangetable just
has AccessShareLock, but we need RowShareLock. I fixed this by having
AcquireRewriteLocks actually modify the RTE if it is promoting the lock
level. This is kinda grotty, but we were already assuming that
AcquireRewriteLocks could scribble on the query tree, so it seems safe
enough.

3. There remain some cases where the RTE says RowExclusiveLock but the
executor calculation indicates we only need AccessShareLock. AFAICT,
this happens only when we have a DO ALSO rule that results in an added
query that merely scans the target table. The RTE used for that purpose
is just the original one, so it still has a lockmode suitable for a
target table. We could probably hack the rewriter so that it changes the
RTE lockmode back down to AccessShareLock in these cases. However, I'm
inclined to think that that is something *not* to do, and that we should
let the higher lockmode be used instead, for two reasons:

(1) Asking for both AccessShareLock and RowExclusiveLock on the same
table requires an extra trip through the shared lock manager, for little
value that I can see.

(2) If the DO ALSO rule is run before the main one, we'd be acquiring
AccessShareLock before RowExclusiveLock, resulting in a deadlock hazard
due to lock upgrade. (I think this may be a pre-existing bug, although
it could likely only manifest in corner cases such as where we're
pulling a plan tree out of plancache. In most cases the first thing we'd
acquire on a rule target table is RowExclusiveLock in the parser, before
any rule rewriting could happen.)

4. I also notice some cases (all in FDW tests) where
ExecOpenScanRelation is choosing AccessShareLock although the RTE has
RowShareLock.
These seem to all be cases where we're implementing FOR UPDATE/SHARE via
ROW_MARK_COPY, so that this:

    ExecRowMark *erm = ExecFindRowMark(estate, scanrelid, true);

    if (erm != NULL && erm->relation != NULL)
        lockmode = NoLock;

leaves lockmode as AccessShareLock because erm->relation is NULL. Again,
I think this is probably OK, and it'd be better to use RowShareLock for
consistency with other places that might open the rel. It will be a
change in externally-visible behavior, though.

Thoughts?

regards, tom lane
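The decision table Tom describes — result relations need RowExclusiveLock, row-locking marks need RowShareLock, while ROW_MARK_REFERENCE and ROW_MARK_COPY currently fall back to AccessShareLock — can be sketched as a standalone function. This is an illustrative model only, not executor source; the enum and function names here are hypothetical, and the enum ordering merely mirrors increasing lock strength.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock levels, ordered by increasing strength. */
typedef enum
{
    LOCK_ACCESS_SHARE,
    LOCK_ROW_SHARE,
    LOCK_ROW_EXCLUSIVE
} LockLevel;

/* Hypothetical row-mark kinds, loosely mirroring PlanRowMark types. */
typedef enum
{
    MARK_NONE,        /* no FOR UPDATE/SHARE mark on this relation */
    MARK_ROW_LOCKING, /* FOR [NO KEY] UPDATE / FOR [KEY] SHARE */
    MARK_REFERENCE,   /* re-fetch row by TID; no row-level lock taken */
    MARK_COPY         /* whole-row copy, e.g. for foreign tables */
} MarkKind;

/* Current behavior as described in points 3 and 4: REFERENCE/COPY marks
 * open the relation with only AccessShareLock, even when the RTE recorded
 * RowShareLock at parse time. */
static LockLevel scan_lock_level(bool is_result_rel, MarkKind mark)
{
    if (is_result_rel)
        return LOCK_ROW_EXCLUSIVE;
    if (mark == MARK_ROW_LOCKING)
        return LOCK_ROW_SHARE;
    return LOCK_ACCESS_SHARE;
}
```

The assertion added to ExecOpenScanRelation in the follow-up message encodes only the weaker invariant that this computed level never exceeds the lock recorded in the RTE, which tolerates both the point-3 and point-4 discrepancies.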
I wrote:
> I started poking at this. I thought that it would be a good cross-check
> to apply just the "front half" of 0001 (i.e., creation and population of
> the RTE lockmode field), and then to insert checks in each of the
> "back half" places (executor, plancache, etc) that the lockmodes they
> are computing today match what is in the RTE.
> Those checks fell over many times in the regression tests.

Here's a version that doesn't fall over (for me), incorporating fixes as
I suggested for points 1 and 2, and weakening the back-half assertions
enough to let points 3 and 4 succeed. I'm inclined to commit this to see
if the buildfarm can find any problems I missed.

Some cosmetic changes:

* I renamed the RTE field to rellockmode in the name of greppability.

* I rearranged the order of the arguments for
addRangeTableEntryForRelation to make it more consistent with the other
addRangeTableEntryXXX functions.

regards, tom lane

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index 4f1d365..7dfa327 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -1415,6 +1415,7 @@ recordDependencyOnSingleRelExpr(const ObjectAddress *depender, rte.rtekind = RTE_RELATION; rte.relid = relId; rte.relkind = RELKIND_RELATION; /* no need for exactness here */ + rte.rellockmode = AccessShareLock; context.rtables = list_make1(list_make1(&rte)); diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 9176f62..4ad5868 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -2549,6 +2549,7 @@ AddRelationNewConstraints(Relation rel, pstate->p_sourcetext = queryString; rte = addRangeTableEntryForRelation(pstate, rel, + AccessShareLock, NULL, false, true); diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index d06662b..2d5bc8a 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -833,20 +833,21 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt, if
(stmt->relation) { + LOCKMODE lockmode = is_from ? RowExclusiveLock : AccessShareLock; + RangeTblEntry *rte; TupleDesc tupDesc; List *attnums; ListCell *cur; - RangeTblEntry *rte; Assert(!stmt->query); /* Open and lock the relation, using the appropriate lock type. */ - rel = heap_openrv(stmt->relation, - (is_from ? RowExclusiveLock : AccessShareLock)); + rel = heap_openrv(stmt->relation, lockmode); relid = RelationGetRelid(rel); - rte = addRangeTableEntryForRelation(pstate, rel, NULL, false, false); + rte = addRangeTableEntryForRelation(pstate, rel, lockmode, + NULL, false, false); rte->requiredPerms = (is_from ? ACL_INSERT : ACL_SELECT); tupDesc = RelationGetDescr(rel); diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c index 3d82edb..d5cb62d 100644 --- a/src/backend/commands/createas.c +++ b/src/backend/commands/createas.c @@ -528,6 +528,7 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo) rte->rtekind = RTE_RELATION; rte->relid = intoRelationAddr.objectId; rte->relkind = relkind; + rte->rellockmode = RowExclusiveLock; rte->requiredPerms = ACL_INSERT; for (attnum = 1; attnum <= intoRelationDesc->rd_att->natts; attnum++) diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c index cee0ef9..2fd17b2 100644 --- a/src/backend/commands/policy.c +++ b/src/backend/commands/policy.c @@ -567,7 +567,9 @@ RemoveRoleFromObjectPolicy(Oid roleid, Oid classid, Oid policy_id) qual_expr = stringToNode(qual_value); /* Add this rel to the parsestate's rangetable, for dependencies */ - addRangeTableEntryForRelation(qual_pstate, rel, NULL, false, false); + addRangeTableEntryForRelation(qual_pstate, rel, + AccessShareLock, + NULL, false, false); qual_parse_rtable = qual_pstate->p_rtable; free_parsestate(qual_pstate); @@ -589,8 +591,9 @@ RemoveRoleFromObjectPolicy(Oid roleid, Oid classid, Oid policy_id) with_check_qual = stringToNode(with_check_value); /* Add this rel to the parsestate's rangetable, for dependencies 
*/ - addRangeTableEntryForRelation(with_check_pstate, rel, NULL, false, - false); + addRangeTableEntryForRelation(with_check_pstate, rel, + AccessShareLock, + NULL, false, false); with_check_parse_rtable = with_check_pstate->p_rtable; free_parsestate(with_check_pstate); @@ -752,11 +755,13 @@ CreatePolicy(CreatePolicyStmt *stmt) /* Add for the regular security quals */ rte = addRangeTableEntryForRelation(qual_pstate, target_table, + AccessShareLock, NULL, false, false); addRTEtoQuery(qual_pstate, rte, false, true, true); /* Add for the with-check quals */ rte = addRangeTableEntryForRelation(with_check_pstate, target_table, + AccessShareLock, NULL, false, false); addRTEtoQuery(with_check_pstate, rte, false, true, true); @@ -928,6 +933,7 @@ AlterPolicy(AlterPolicyStmt *stmt) ParseState *qual_pstate = make_parsestate(NULL); rte = addRangeTableEntryForRelation(qual_pstate, target_table, + AccessShareLock, NULL, false, false); addRTEtoQuery(qual_pstate, rte, false, true, true); @@ -950,6 +956,7 @@ AlterPolicy(AlterPolicyStmt *stmt) ParseState *with_check_pstate = make_parsestate(NULL); rte = addRangeTableEntryForRelation(with_check_pstate, target_table, + AccessShareLock, NULL, false, false); addRTEtoQuery(with_check_pstate, rte, false, true, true); @@ -1096,8 +1103,9 @@ AlterPolicy(AlterPolicyStmt *stmt) qual = stringToNode(qual_value); /* Add this rel to the parsestate's rangetable, for dependencies */ - addRangeTableEntryForRelation(qual_pstate, target_table, NULL, - false, false); + addRangeTableEntryForRelation(qual_pstate, target_table, + AccessShareLock, + NULL, false, false); qual_parse_rtable = qual_pstate->p_rtable; free_parsestate(qual_pstate); @@ -1137,8 +1145,9 @@ AlterPolicy(AlterPolicyStmt *stmt) with_check_qual = stringToNode(with_check_value); /* Add this rel to the parsestate's rangetable, for dependencies */ - addRangeTableEntryForRelation(with_check_pstate, target_table, NULL, - false, false); + addRangeTableEntryForRelation(with_check_pstate, 
target_table, + AccessShareLock, + NULL, false, false); with_check_parse_rtable = with_check_pstate->p_rtable; free_parsestate(with_check_pstate); diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 29d8353..9e60bb3 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -13632,7 +13632,8 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy) * rangetable entry. We need a ParseState for transformExpr. */ pstate = make_parsestate(NULL); - rte = addRangeTableEntryForRelation(pstate, rel, NULL, false, true); + rte = addRangeTableEntryForRelation(pstate, rel, AccessShareLock, + NULL, false, true); addRTEtoQuery(pstate, rte, true, true, true); /* take care of any partition expressions */ diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c index 36778b6..b2de1cc 100644 --- a/src/backend/commands/trigger.c +++ b/src/backend/commands/trigger.c @@ -577,10 +577,12 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString, * 'OLD' must always have varno equal to 1 and 'NEW' equal to 2. */ rte = addRangeTableEntryForRelation(pstate, rel, + AccessShareLock, makeAlias("old", NIL), false, false); addRTEtoQuery(pstate, rte, false, true, true); rte = addRangeTableEntryForRelation(pstate, rel, + AccessShareLock, makeAlias("new", NIL), false, false); addRTEtoQuery(pstate, rte, false, true, true); diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c index ffb71c0..e73f1d8 100644 --- a/src/backend/commands/view.c +++ b/src/backend/commands/view.c @@ -353,7 +353,7 @@ DefineViewRules(Oid viewOid, Query *viewParse, bool replace) * by 2... * * These extra RT entries are not actually used in the query, - * except for run-time permission checking. + * except for run-time locking and permission checking. 
*--------------------------------------------------------------- */ static Query * @@ -386,9 +386,11 @@ UpdateRangeTableOfViewParse(Oid viewOid, Query *viewParse) * OLD first, then NEW.... */ rt_entry1 = addRangeTableEntryForRelation(pstate, viewRel, + AccessShareLock, makeAlias("old", NIL), false, false); rt_entry2 = addRangeTableEntryForRelation(pstate, viewRel, + AccessShareLock, makeAlias("new", NIL), false, false); /* Must override addRangeTableEntry's default access-check flags */ diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 5443cbf..2b7cf26 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -864,6 +864,7 @@ InitPlan(QueryDesc *queryDesc, int eflags) Relation resultRelation; resultRelationOid = getrelid(resultRelationIndex, rangeTable); + Assert(rt_fetch(resultRelationIndex, rangeTable)->rellockmode == RowExclusiveLock); resultRelation = heap_open(resultRelationOid, RowExclusiveLock); InitResultRelInfo(resultRelInfo, @@ -904,6 +905,7 @@ InitPlan(QueryDesc *queryDesc, int eflags) Relation resultRelDesc; resultRelOid = getrelid(resultRelIndex, rangeTable); + Assert(rt_fetch(resultRelIndex, rangeTable)->rellockmode == RowExclusiveLock); resultRelDesc = heap_open(resultRelOid, RowExclusiveLock); InitResultRelInfo(resultRelInfo, resultRelDesc, @@ -924,8 +926,11 @@ InitPlan(QueryDesc *queryDesc, int eflags) /* We locked the roots above. 
*/ if (!list_member_int(plannedstmt->rootResultRelations, resultRelIndex)) + { + Assert(rt_fetch(resultRelIndex, rangeTable)->rellockmode == RowExclusiveLock); LockRelationOid(getrelid(resultRelIndex, rangeTable), RowExclusiveLock); + } } } } @@ -955,6 +960,7 @@ InitPlan(QueryDesc *queryDesc, int eflags) { PlanRowMark *rc = (PlanRowMark *) lfirst(l); Oid relid; + LOCKMODE rellockmode; Relation relation; ExecRowMark *erm; @@ -964,6 +970,7 @@ InitPlan(QueryDesc *queryDesc, int eflags) /* get relation's OID (will produce InvalidOid if subquery) */ relid = getrelid(rc->rti, rangeTable); + rellockmode = rt_fetch(rc->rti, rangeTable)->rellockmode; /* * If you change the conditions under which rel locks are acquired @@ -975,9 +982,13 @@ InitPlan(QueryDesc *queryDesc, int eflags) case ROW_MARK_NOKEYEXCLUSIVE: case ROW_MARK_SHARE: case ROW_MARK_KEYSHARE: + Assert(rellockmode == RowShareLock); relation = heap_open(relid, RowShareLock); break; case ROW_MARK_REFERENCE: + /* RTE might be a query target table */ + Assert(rellockmode == AccessShareLock || + rellockmode == RowExclusiveLock); relation = heap_open(relid, AccessShareLock); break; case ROW_MARK_COPY: diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c index 5b3eaec..da84d5d 100644 --- a/src/backend/executor/execUtils.c +++ b/src/backend/executor/execUtils.c @@ -657,20 +657,32 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags) /* * Determine the lock type we need. First, scan to see if target relation * is a result relation. If not, check if it's a FOR UPDATE/FOR SHARE - * relation. In either of those cases, we got the lock already. + * relation. + * + * Note: we may have already gotten the desired lock type, but for now + * don't try to optimize; this logic is going away soon anyhow. */ lockmode = AccessShareLock; if (ExecRelationIsTargetRelation(estate, scanrelid)) - lockmode = NoLock; + lockmode = RowExclusiveLock; else { /* Keep this check in sync with InitPlan! 
*/ ExecRowMark *erm = ExecFindRowMark(estate, scanrelid, true); - if (erm != NULL && erm->relation != NULL) - lockmode = NoLock; + if (erm != NULL) + { + if (erm->markType == ROW_MARK_REFERENCE || + erm->markType == ROW_MARK_COPY) + lockmode = AccessShareLock; + else + lockmode = RowShareLock; + } } + /* lockmode per above logic must not be more than we previously acquired */ + Assert(lockmode <= rt_fetch(scanrelid, estate->es_range_table)->rellockmode); + /* Open the relation and acquire lock as needed */ reloid = getrelid(scanrelid, estate->es_range_table); rel = heap_open(reloid, lockmode); diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 7c8220c..925cb8f 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -2356,6 +2356,7 @@ _copyRangeTblEntry(const RangeTblEntry *from) COPY_SCALAR_FIELD(rtekind); COPY_SCALAR_FIELD(relid); COPY_SCALAR_FIELD(relkind); + COPY_SCALAR_FIELD(rellockmode); COPY_NODE_FIELD(tablesample); COPY_NODE_FIELD(subquery); COPY_SCALAR_FIELD(security_barrier); diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index 378f2fa..3bb91c9 100644 --- a/src/backend/nodes/equalfuncs.c +++ b/src/backend/nodes/equalfuncs.c @@ -2630,6 +2630,7 @@ _equalRangeTblEntry(const RangeTblEntry *a, const RangeTblEntry *b) COMPARE_SCALAR_FIELD(rtekind); COMPARE_SCALAR_FIELD(relid); COMPARE_SCALAR_FIELD(relkind); + COMPARE_SCALAR_FIELD(rellockmode); COMPARE_NODE_FIELD(tablesample); COMPARE_NODE_FIELD(subquery); COMPARE_SCALAR_FIELD(security_barrier); diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 93f1e2c..22dbae1 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -3131,6 +3131,7 @@ _outRangeTblEntry(StringInfo str, const RangeTblEntry *node) case RTE_RELATION: WRITE_OID_FIELD(relid); WRITE_CHAR_FIELD(relkind); + WRITE_INT_FIELD(rellockmode); WRITE_NODE_FIELD(tablesample); break; case RTE_SUBQUERY: diff --git 
a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c index 519deab..ce55658 100644 --- a/src/backend/nodes/readfuncs.c +++ b/src/backend/nodes/readfuncs.c @@ -1361,6 +1361,7 @@ _readRangeTblEntry(void) case RTE_RELATION: READ_OID_FIELD(relid); READ_CHAR_FIELD(relkind); + READ_INT_FIELD(rellockmode); READ_NODE_FIELD(tablesample); break; case RTE_SUBQUERY: diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 22c010c..89625f4 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -6021,6 +6021,7 @@ plan_cluster_use_sort(Oid tableOid, Oid indexOid) rte->rtekind = RTE_RELATION; rte->relid = tableOid; rte->relkind = RELKIND_RELATION; /* Don't be too picky. */ + rte->rellockmode = AccessShareLock; rte->lateral = false; rte->inh = false; rte->inFromCl = true; @@ -6143,6 +6144,7 @@ plan_create_index_workers(Oid tableOid, Oid indexOid) rte->rtekind = RTE_RELATION; rte->relid = tableOid; rte->relkind = RELKIND_RELATION; /* Don't be too picky. */ + rte->rellockmode = AccessShareLock; rte->lateral = false; rte->inh = true; rte->inFromCl = true; diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index c020600..226927b 100644 --- a/src/backend/parser/analyze.c +++ b/src/backend/parser/analyze.c @@ -1037,6 +1037,7 @@ transformOnConflictClause(ParseState *pstate, */ exclRte = addRangeTableEntryForRelation(pstate, targetrel, + RowExclusiveLock, makeAlias("excluded", NIL), false, false); exclRte->relkind = RELKIND_COMPOSITE_TYPE; diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c index d6b93f5..660011a 100644 --- a/src/backend/parser/parse_clause.c +++ b/src/backend/parser/parse_clause.c @@ -217,6 +217,7 @@ setTargetTable(ParseState *pstate, RangeVar *relation, * Now build an RTE. 
*/ rte = addRangeTableEntryForRelation(pstate, pstate->p_target_relation, + RowExclusiveLock, relation->alias, inh, false); pstate->p_target_rangetblentry = rte; diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c index 60b8de0..da600dc 100644 --- a/src/backend/parser/parse_relation.c +++ b/src/backend/parser/parse_relation.c @@ -1208,15 +1208,22 @@ addRangeTableEntry(ParseState *pstate, rte->alias = alias; /* + * Identify the type of lock we'll need on this relation. It's not the + * query's target table (that case is handled elsewhere), so we need + * either RowShareLock if it's locked by FOR UPDATE/SHARE, or plain + * AccessShareLock otherwise. + */ + lockmode = isLockedRefname(pstate, refname) ? RowShareLock : AccessShareLock; + + /* * Get the rel's OID. This access also ensures that we have an up-to-date * relcache entry for the rel. Since this is typically the first access - * to a rel in a statement, be careful to get the right access level - * depending on whether we're doing SELECT FOR UPDATE/SHARE. + * to a rel in a statement, we must open the rel with the proper lockmode. */ - lockmode = isLockedRefname(pstate, refname) ? RowShareLock : AccessShareLock; rel = parserOpenTable(pstate, relation, lockmode); rte->relid = RelationGetRelid(rel); rte->relkind = rel->rd_rel->relkind; + rte->rellockmode = lockmode; /* * Build the list of effective column names using user-supplied aliases @@ -1262,10 +1269,20 @@ addRangeTableEntry(ParseState *pstate, * * This is just like addRangeTableEntry() except that it makes an RTE * given an already-open relation instead of a RangeVar reference. + * + * lockmode is the lock type required for query execution; it must be one + * of AccessShareLock, RowShareLock, or RowExclusiveLock depending on the + * RTE's role within the query. The caller should always hold that lock mode + * or a stronger one. 
+ * + * Note: properly, lockmode should be declared LOCKMODE not int, but that + * would require importing storage/lock.h into parse_relation.h. Since + * LOCKMODE is typedef'd as int anyway, that seems like overkill. */ RangeTblEntry * addRangeTableEntryForRelation(ParseState *pstate, Relation rel, + int lockmode, Alias *alias, bool inh, bool inFromCl) @@ -1275,10 +1292,15 @@ addRangeTableEntryForRelation(ParseState *pstate, Assert(pstate != NULL); + Assert(lockmode == AccessShareLock || + lockmode == RowShareLock || + lockmode == RowExclusiveLock); + rte->rtekind = RTE_RELATION; rte->alias = alias; rte->relid = RelationGetRelid(rel); rte->relkind = rel->rd_rel->relkind; + rte->rellockmode = lockmode; /* * Build the list of effective column names using user-supplied aliases diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c index f5c1e2a..42bf9bf 100644 --- a/src/backend/parser/parse_utilcmd.c +++ b/src/backend/parser/parse_utilcmd.c @@ -2526,7 +2526,9 @@ transformIndexStmt(Oid relid, IndexStmt *stmt, const char *queryString) * relation, but we still need to open it. */ rel = relation_open(relid, NoLock); - rte = addRangeTableEntryForRelation(pstate, rel, NULL, false, true); + rte = addRangeTableEntryForRelation(pstate, rel, + AccessShareLock, + NULL, false, true); /* no to join list, yes to namespaces */ addRTEtoQuery(pstate, rte, false, true, true); @@ -2635,9 +2637,11 @@ transformRuleStmt(RuleStmt *stmt, const char *queryString, * qualification. */ oldrte = addRangeTableEntryForRelation(pstate, rel, + AccessShareLock, makeAlias("old", NIL), false, false); newrte = addRangeTableEntryForRelation(pstate, rel, + AccessShareLock, makeAlias("new", NIL), false, false); /* Must override addRangeTableEntry's default access-check flags */ @@ -2733,9 +2737,11 @@ transformRuleStmt(RuleStmt *stmt, const char *queryString, * them in the joinlist. 
*/ oldrte = addRangeTableEntryForRelation(sub_pstate, rel, + AccessShareLock, makeAlias("old", NIL), false, false); newrte = addRangeTableEntryForRelation(sub_pstate, rel, + AccessShareLock, makeAlias("new", NIL), false, false); oldrte->requiredPerms = 0; @@ -2938,6 +2944,7 @@ transformAlterTableStmt(Oid relid, AlterTableStmt *stmt, pstate->p_sourcetext = queryString; rte = addRangeTableEntryForRelation(pstate, rel, + AccessShareLock, NULL, false, true); diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index acc6498..6e420d8 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -795,7 +795,8 @@ copy_table(Relation rel) copybuf = makeStringInfo(); pstate = make_parsestate(NULL); - addRangeTableEntryForRelation(pstate, rel, NULL, false, false); + addRangeTableEntryForRelation(pstate, rel, AccessShareLock, + NULL, false, false); attnamelist = make_copy_attnamelist(relmapentry); cstate = BeginCopyFrom(pstate, rel, NULL, false, copy_read_data, attnamelist, NIL); diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c index 42345f9..e247539 100644 --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -199,6 +199,7 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel) rte->rtekind = RTE_RELATION; rte->relid = RelationGetRelid(rel->localrel); rte->relkind = rel->localrel->rd_rel->relkind; + rte->rellockmode = AccessShareLock; estate->es_range_table = list_make1(rte); resultRelInfo = makeNode(ResultRelInfo); diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c index 327e5c3..50788fe 100644 --- a/src/backend/rewrite/rewriteHandler.c +++ b/src/backend/rewrite/rewriteHandler.c @@ -89,6 +89,10 @@ static Bitmapset *adjust_view_column_set(Bitmapset *cols, List *targetlist); * These locks will ensure that the relation schemas don't change under us * 
while we are rewriting and planning the query. * + * Caution: this may modify the querytree, therefore caller should usually + * have done a copyObject() to make a writable copy of the querytree in the + * current memory context. + * * forExecute indicates that the query is about to be executed. * If so, we'll acquire RowExclusiveLock on the query's resultRelation, * RowShareLock on any relation accessed FOR [KEY] UPDATE/SHARE, and @@ -101,13 +105,11 @@ static Bitmapset *adjust_view_column_set(Bitmapset *cols, List *targetlist); * forUpdatePushedDown indicates that a pushed-down FOR [KEY] UPDATE/SHARE * applies to the current subquery, requiring all rels to be opened with at * least RowShareLock. This should always be false at the top of the - * recursion. This flag is ignored if forExecute is false. + * recursion. This flag is ignored if forExecute is false. If it is true, + * though, we adjust RTE rellockmode fields to reflect the higher lock level. * * A secondary purpose of this routine is to fix up JOIN RTE references to - * dropped columns (see details below). Because the RTEs are modified in - * place, it is generally appropriate for the caller of this routine to have - * first done a copyObject() to make a writable copy of the querytree in the - * current memory context. + * dropped columns (see details below). Such RTEs are modified in-place. 
* * This processing can, and for efficiency's sake should, be skipped when the * querytree has just been built by the parser: parse analysis already got @@ -170,12 +172,22 @@ AcquireRewriteLocks(Query *parsetree, lockmode = AccessShareLock; else if (rt_index == parsetree->resultRelation) lockmode = RowExclusiveLock; - else if (forUpdatePushedDown || - get_parse_rowmark(parsetree, rt_index) != NULL) + else if (forUpdatePushedDown) + { + lockmode = RowShareLock; + /* Upgrade RTE's lock mode to reflect pushed-down lock */ + if (rte->rellockmode == AccessShareLock) + rte->rellockmode = RowShareLock; + } + else if (get_parse_rowmark(parsetree, rt_index) != NULL) lockmode = RowShareLock; else lockmode = AccessShareLock; + Assert(!forExecute || lockmode == rte->rellockmode || + (lockmode == AccessShareLock && + rte->rellockmode == RowExclusiveLock)); + rel = heap_open(rte->relid, lockmode); /* @@ -1593,6 +1605,7 @@ ApplyRetrieveRule(Query *parsetree, /* Clear fields that should not be set in a subquery RTE */ rte->relid = InvalidOid; rte->relkind = 0; + rte->rellockmode = 0; rte->tablesample = NULL; rte->inh = false; /* must not be set for a subquery */ @@ -2920,8 +2933,14 @@ rewriteTargetView(Query *parsetree, Relation view) * very much like what the planner would do to "pull up" the view into the * outer query. Perhaps someday we should refactor things enough so that * we can share code with the planner.) + * + * Be sure to set rellockmode to the correct thing for the target table. + * Since we copied the whole viewquery above, we can just scribble on + * base_rte instead of copying it. 
*/ - new_rte = (RangeTblEntry *) base_rte; + new_rte = base_rte; + new_rte->rellockmode = RowExclusiveLock; + parsetree->rtable = lappend(parsetree->rtable, new_rte); new_rt_index = list_length(parsetree->rtable); @@ -3101,8 +3120,8 @@ rewriteTargetView(Query *parsetree, Relation view) new_exclRte = addRangeTableEntryForRelation(make_parsestate(NULL), base_rel, - makeAlias("excluded", - NIL), + RowExclusiveLock, + makeAlias("excluded", NIL), false, false); new_exclRte->relkind = RELKIND_COMPOSITE_TYPE; new_exclRte->requiredPerms = 0; diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c index fc034ce..049b204 100644 --- a/src/backend/utils/adt/ri_triggers.c +++ b/src/backend/utils/adt/ri_triggers.c @@ -1878,12 +1878,14 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel) pkrte->rtekind = RTE_RELATION; pkrte->relid = RelationGetRelid(pk_rel); pkrte->relkind = pk_rel->rd_rel->relkind; + pkrte->rellockmode = AccessShareLock; pkrte->requiredPerms = ACL_SELECT; fkrte = makeNode(RangeTblEntry); fkrte->rtekind = RTE_RELATION; fkrte->relid = RelationGetRelid(fk_rel); fkrte->relkind = fk_rel->rd_rel->relkind; + fkrte->rellockmode = AccessShareLock; fkrte->requiredPerms = ACL_SELECT; for (i = 0; i < riinfo->nkeys; i++) diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index eecd64e..f6693ea 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -1000,6 +1000,7 @@ pg_get_triggerdef_worker(Oid trigid, bool pretty) oldrte->rtekind = RTE_RELATION; oldrte->relid = trigrec->tgrelid; oldrte->relkind = relkind; + oldrte->rellockmode = AccessShareLock; oldrte->alias = makeAlias("old", NIL); oldrte->eref = oldrte->alias; oldrte->lateral = false; @@ -1010,6 +1011,7 @@ pg_get_triggerdef_worker(Oid trigid, bool pretty) newrte->rtekind = RTE_RELATION; newrte->relid = trigrec->tgrelid; newrte->relkind = relkind; + newrte->rellockmode = AccessShareLock; newrte->alias = 
makeAlias("new", NIL); newrte->eref = newrte->alias; newrte->lateral = false; @@ -3206,6 +3208,7 @@ deparse_context_for(const char *aliasname, Oid relid) rte->rtekind = RTE_RELATION; rte->relid = relid; rte->relkind = RELKIND_RELATION; /* no need for exactness here */ + rte->rellockmode = AccessShareLock; rte->alias = makeAlias(aliasname, NIL); rte->eref = rte->alias; rte->lateral = false; diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c index 7271b58..15df189 100644 --- a/src/backend/utils/cache/plancache.c +++ b/src/backend/utils/cache/plancache.c @@ -1539,6 +1539,8 @@ AcquireExecutorLocks(List *stmt_list, bool acquire) else lockmode = AccessShareLock; + Assert(lockmode == rte->rellockmode); + if (acquire) LockRelationOid(rte->relid, lockmode); else @@ -1609,6 +1611,8 @@ ScanQueryForLocks(Query *parsetree, bool acquire) lockmode = RowShareLock; else lockmode = AccessShareLock; + Assert(lockmode == rte->rellockmode || + (lockmode == AccessShareLock && rte->rellockmode == RowExclusiveLock)); if (acquire) LockRelationOid(rte->relid, lockmode); else diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h index 07e0576..1e064d4 100644 --- a/src/include/catalog/catversion.h +++ b/src/include/catalog/catversion.h @@ -53,6 +53,6 @@ */ /* yyyymmddN */ -#define CATALOG_VERSION_NO 201809241 +#define CATALOG_VERSION_NO 201809291 #endif diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 62209a8..3056819 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -973,9 +973,21 @@ typedef struct RangeTblEntry * that the tuple format of the tuplestore is the same as the referenced * relation. This allows plans referencing AFTER trigger transition * tables to be invalidated if the underlying table is altered. + * + * rellockmode is really LOCKMODE, but it's declared int to avoid having + * to include lock-related headers here. 
It must be RowExclusiveLock if + * the RTE is an INSERT/UPDATE/DELETE target, else RowShareLock if the RTE + * is a SELECT FOR UPDATE/FOR SHARE source, else AccessShareLock. + * + * Note: in some cases, rule expansion may result in RTEs that are marked + * with RowExclusiveLock even though they are not the target of the + * current query; this happens if a DO ALSO rule simply scans the original + * target table. We leave such RTEs with their original lockmode so as to + * avoid getting an additional, lesser lock. */ Oid relid; /* OID of the relation */ char relkind; /* relation kind (see pg_class.relkind) */ + int rellockmode; /* lock level that query requires on the rel */ struct TableSampleClause *tablesample; /* sampling info, or NULL */ /* diff --git a/src/include/parser/parse_relation.h b/src/include/parser/parse_relation.h index b9792ac..687a7d7 100644 --- a/src/include/parser/parse_relation.h +++ b/src/include/parser/parse_relation.h @@ -69,6 +69,7 @@ extern RangeTblEntry *addRangeTableEntry(ParseState *pstate, bool inFromCl); extern RangeTblEntry *addRangeTableEntryForRelation(ParseState *pstate, Relation rel, + int lockmode, Alias *alias, bool inh, bool inFromCl); diff --git a/src/test/modules/test_rls_hooks/test_rls_hooks.c b/src/test/modules/test_rls_hooks/test_rls_hooks.c index cab67a6..d492697 100644 --- a/src/test/modules/test_rls_hooks/test_rls_hooks.c +++ b/src/test/modules/test_rls_hooks/test_rls_hooks.c @@ -80,8 +80,8 @@ test_rls_hooks_permissive(CmdType cmdtype, Relation relation) qual_pstate = make_parsestate(NULL); - rte = addRangeTableEntryForRelation(qual_pstate, relation, NULL, false, - false); + rte = addRangeTableEntryForRelation(qual_pstate, relation, AccessShareLock, + NULL, false, false); addRTEtoQuery(qual_pstate, rte, false, true, true); role = ObjectIdGetDatum(ACL_ID_PUBLIC); @@ -143,8 +143,8 @@ test_rls_hooks_restrictive(CmdType cmdtype, Relation relation) qual_pstate = make_parsestate(NULL); - rte = 
addRangeTableEntryForRelation(qual_pstate, relation, NULL, false, - false); + rte = addRangeTableEntryForRelation(qual_pstate, relation, AccessShareLock, + NULL, false, false); addRTEtoQuery(qual_pstate, rte, false, true, true); role = ObjectIdGetDatum(ACL_ID_PUBLIC);
I wrote: > 1. You set up transformRuleStmt to insert AccessExclusiveLock into > the "OLD" and "NEW" RTEs for a view. This is surely wrong; we do > not want to take exclusive lock on a view just to run a query using > the view. It should (usually, anyway) just be AccessShareLock. > However, because addRangeTableEntryForRelation insists that you > hold the requested lock type *now*, just changing the parameter > to AccessShareLock doesn't work. > I hacked around this for the moment by passing NoLock to > addRangeTableEntryForRelation and then changing rte->lockmode > after it returns, but man that's ugly. It makes me wonder whether > addRangeTableEntryForRelation should be checking the lockmode at all. It occurred to me that it'd be reasonable to insist that the caller holds a lock *at least as strong* as the one being recorded in the RTE, and that there's also been discussions about verifying that some lock is held when something like heap_open(foo, NoLock) is attempted. So I dusted off the part of 0001 that did that, producing the attached delta patch. Unfortunately, I can't commit this, because it exposes at least two pre-existing bugs :-(. So we'll need to fix those first, which seems like it should be a separate thread. I'm just parking this here for the moment. I think that the call sites should ultimately look like Assert(CheckRelationLockedByMe(...)); but for hunting down the places where the assertion currently fails, it's more convenient if it's just an elog(WARNING). 
			regards, tom lane

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3395445..f8a55ee 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1137,6 +1137,11 @@ relation_open(Oid relationId, LOCKMODE lockmode)
 	if (!RelationIsValid(r))
 		elog(ERROR, "could not open relation with OID %u", relationId);
 
+	if (lockmode == NoLock &&
+		!CheckRelationLockedByMe(r, AccessShareLock, true))
+		elog(WARNING, "relation_open: no lock held on %s",
+			 RelationGetRelationName(r));
+
 	/* Make note that we've accessed a temporary relation */
 	if (RelationUsesLocalBuffers(r))
 		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPREL;
@@ -1183,6 +1188,11 @@ try_relation_open(Oid relationId, LOCKMODE lockmode)
 	if (!RelationIsValid(r))
 		elog(ERROR, "could not open relation with OID %u", relationId);
 
+	if (lockmode == NoLock &&
+		!CheckRelationLockedByMe(r, AccessShareLock, true))
+		elog(WARNING, "try_relation_open: no lock held on %s",
+			 RelationGetRelationName(r));
+
 	/* Make note that we've accessed a temporary relation */
 	if (RelationUsesLocalBuffers(r))
 		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPREL;
@@ -8084,6 +8094,10 @@ ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_changed, bool *
 	idx_rel = RelationIdGetRelation(replidindex);
 
+	if (!CheckRelationLockedByMe(idx_rel, AccessShareLock, true))
+		elog(WARNING, "ExtractReplicaIdentity: no lock held on %s",
+			 RelationGetRelationName(idx_rel));
+
 	/* deform tuple, so we have fast access to columns */
 	heap_deform_tuple(tp, desc, values, nulls);
diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index da600dc..aaf5544 100644
--- a/src/backend/parser/parse_relation.c
+++ b/src/backend/parser/parse_relation.c
@@ -28,6 +28,7 @@
 #include "parser/parse_enr.h"
 #include "parser/parse_relation.h"
 #include "parser/parse_type.h"
+#include "storage/lmgr.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
@@ -1272,7 +1273,7 @@ addRangeTableEntry(ParseState *pstate,
  *
  * lockmode is the lock type required for query execution; it must be one
  * of AccessShareLock, RowShareLock, or RowExclusiveLock depending on the
- * RTE's role within the query.  The caller should always hold that lock mode
+ * RTE's role within the query.  The caller must hold that lock mode
  * or a stronger one.
  *
  * Note: properly, lockmode should be declared LOCKMODE not int, but that
@@ -1295,6 +1296,9 @@ addRangeTableEntryForRelation(ParseState *pstate,
 	Assert(lockmode == AccessShareLock ||
 		   lockmode == RowShareLock ||
 		   lockmode == RowExclusiveLock);
+	if (!CheckRelationLockedByMe(rel, lockmode, true))
+		elog(WARNING, "addRangeTableEntryForRelation: no lock held on %s",
+			 RelationGetRelationName(rel));
 
 	rte->rtekind = RTE_RELATION;
 	rte->alias = alias;
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index dc0a439..517ff8e 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -288,6 +288,51 @@ UnlockRelation(Relation relation, LOCKMODE lockmode)
 }
 
 /*
+ * CheckRelationLockedByMe
+ *
+ * Returns true if current transaction holds a lock on 'relation' of mode
+ * 'lockmode'.  If 'orstronger' is true, a stronger lockmode is also OK.
+ * ("Stronger" is defined as "numerically higher", which is a bit
+ * semantically dubious but is OK for the purposes we use this for.)
+ */
+bool
+CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode, bool orstronger)
+{
+	LOCKTAG		tag;
+
+	SET_LOCKTAG_RELATION(tag,
+						 relation->rd_lockInfo.lockRelId.dbId,
+						 relation->rd_lockInfo.lockRelId.relId);
+
+	if (LockHeldByMe(&tag, lockmode))
+		return true;
+
+	if (orstronger)
+	{
+		LOCKMODE	slockmode;
+
+		for (slockmode = lockmode + 1;
+			 slockmode <= AccessExclusiveLock;
+			 slockmode++)
+		{
+			if (LockHeldByMe(&tag, slockmode))
+			{
+#ifdef NOT_USED
+				/* Sometimes this might be useful for debugging purposes */
+				elog(WARNING, "lock mode %s substituted for %s on relation %s",
+					 GetLockmodeName(tag.locktag_lockmethodid, slockmode),
+					 GetLockmodeName(tag.locktag_lockmethodid, lockmode),
+					 RelationGetRelationName(relation));
+#endif
+				return true;
+			}
+		}
+	}
+
+	return false;
+}
+
+/*
  * LockHasWaitersRelation
  *
  * This is a function to check whether someone else is waiting for a
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 831ae62..10f6f60 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -564,6 +564,30 @@ DoLockModesConflict(LOCKMODE mode1, LOCKMODE mode2)
 }
 
 /*
+ * LockHeldByMe -- test whether lock 'locktag' is held with mode 'lockmode'
+ *		by the current transaction
+ */
+bool
+LockHeldByMe(const LOCKTAG *locktag, LOCKMODE lockmode)
+{
+	LOCALLOCKTAG localtag;
+	LOCALLOCK  *locallock;
+
+	/*
+	 * See if there is a LOCALLOCK entry for this lock and lockmode
+	 */
+	MemSet(&localtag, 0, sizeof(localtag)); /* must clear padding */
+	localtag.lock = *locktag;
+	localtag.mode = lockmode;
+
+	locallock = (LOCALLOCK *) hash_search(LockMethodLocalHash,
+										  (void *) &localtag,
+										  HASH_FIND, NULL);
+
+	return (locallock && locallock->nLocks > 0);
+}
+
+/*
  * LockHasWaiters -- look up 'locktag' and check if releasing this
  *		lock would wake up other processes waiting for it.
  */
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index a217de9..e5356b7 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -45,6 +45,8 @@ extern void UnlockRelationOid(Oid relid, LOCKMODE lockmode);
 extern void LockRelation(Relation relation, LOCKMODE lockmode);
 extern bool ConditionalLockRelation(Relation relation, LOCKMODE lockmode);
 extern void UnlockRelation(Relation relation, LOCKMODE lockmode);
+extern bool CheckRelationLockedByMe(Relation relation, LOCKMODE lockmode,
+					bool orstronger);
 extern bool LockHasWaitersRelation(Relation relation, LOCKMODE lockmode);
 
 extern void LockRelationIdForSession(LockRelId *relid, LOCKMODE lockmode);
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index ff4df7f..a37fda7 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -540,6 +540,7 @@ extern void LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks);
 extern void LockReleaseSession(LOCKMETHODID lockmethodid);
 extern void LockReleaseCurrentOwner(LOCALLOCK **locallocks, int nlocks);
 extern void LockReassignCurrentOwner(LOCALLOCK **locallocks, int nlocks);
+extern bool LockHeldByMe(const LOCKTAG *locktag, LOCKMODE lockmode);
 extern bool LockHasWaiters(const LOCKTAG *locktag, LOCKMODE lockmode,
 			   bool sessionLock);
 extern VirtualTransactionId *GetLockConflicts(const LOCKTAG *locktag,
On 1 October 2018 at 06:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It occurred to me that it'd be reasonable to insist that the caller
> holds a lock *at least as strong* as the one being recorded in the RTE,
> and that there's also been discussions about verifying that some lock
> is held when something like heap_open(foo, NoLock) is attempted.
> So I dusted off the part of 0001 that did that, producing the
> attached delta patch.

My imagination struggles to think of a case, but perhaps one day in the
future we might have a lock manager that coordinates locks on multiple
nodes. If so, is there not a risk that one day we might have a lock level
greater than AccessExclusiveLock, meaning the following would get broken:

+		for (slockmode = lockmode + 1;
+			 slockmode <= AccessExclusiveLock;
+			 slockmode++)

For index strategies we do:

#define BTGreaterStrategyNumber		5
#define BTMaxStrategyNumber			5

So would it not be better to add the following to lockdefs.h?

#define MaxLockLevel				8

then use that to terminate the loop.

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
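The suggested change can be sketched outside the server. The following is a hypothetical, self-contained mock-up: the lock-mode numbering is copied from lockdefs.h, `MaxLockLevel` is David's proposed addition, and `held_modes`/`lock_held` stand in for the local lock table and LockHeldByMe().

```c
#include <stdbool.h>

/* Lock mode numbering as in src/include/storage/lockdefs.h */
#define NoLock						0
#define AccessShareLock				1
#define RowShareLock				2
#define RowExclusiveLock			3
#define ShareUpdateExclusiveLock	4
#define ShareLock					5
#define ShareRowExclusiveLock		6
#define ExclusiveLock				7
#define AccessExclusiveLock			8
#define MaxLockLevel				8	/* proposed addition */

typedef int LOCKMODE;

/* Stand-in for LockHeldByMe(): a bitmask of modes held on one relation */
static unsigned held_modes;

static bool
lock_held(LOCKMODE lockmode)
{
	return (held_modes & (1u << lockmode)) != 0;
}

/*
 * Sketch of CheckRelationLockedByMe()'s "orstronger" scan, with the loop
 * terminated by MaxLockLevel instead of the literal AccessExclusiveLock.
 */
static bool
check_locked(LOCKMODE lockmode, bool orstronger)
{
	if (lock_held(lockmode))
		return true;

	if (orstronger)
	{
		LOCKMODE	slockmode;

		for (slockmode = lockmode + 1;
			 slockmode <= MaxLockLevel;
			 slockmode++)
		{
			if (lock_held(slockmode))
				return true;
		}
	}

	return false;
}
```

With this shape, adding a hypothetical mode above AccessExclusiveLock only requires bumping MaxLockLevel, and the scan keeps working.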
David Rowley <david.rowley@2ndquadrant.com> writes: > On 1 October 2018 at 06:18, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> + for (slockmode = lockmode + 1; >> + slockmode <= AccessExclusiveLock; >> + slockmode++) > So would it not be better to add the following to lockdefs.h? > #define MaxLockLevel 8 > then use that to terminate the loop. Good idea, will do. regards, tom lane
On 2018/09/30 5:04, Tom Lane wrote: > David Rowley <david.rowley@2ndquadrant.com> writes: >> I've attached v10 which fixes this and aligns the WARNING text in >> ExecInitRangeTable() and addRangeTableEntryForRelation(). > > I started poking at this. Thanks a lot for looking at this. > I thought that it would be a good cross-check > to apply just the "front half" of 0001 (i.e., creation and population of > the RTE lockmode field), and then to insert checks in each of the > "back half" places (executor, plancache, etc) that the lockmodes they > are computing today match what is in the RTE. > > Those checks fell over many times in the regression tests. > > There seem to be at least four distinct problems: > > 1. You set up transformRuleStmt to insert AccessExclusiveLock into > the "OLD" and "NEW" RTEs for a view. This is surely wrong; we do > not want to take exclusive lock on a view just to run a query using > the view. It should (usually, anyway) just be AccessShareLock. > > However, because addRangeTableEntryForRelation insists that you > hold the requested lock type *now*, just changing the parameter > to AccessShareLock doesn't work. > > I hacked around this for the moment by passing NoLock to > addRangeTableEntryForRelation and then changing rte->lockmode > after it returns, but man that's ugly. It makes me wonder whether > addRangeTableEntryForRelation should be checking the lockmode at all. > > I'm inclined to think we should remove the check for current lockmode > in addRangeTableEntryForRelation, and instead just assert that the > passed lockmode must be one of AccessShareLock, RowShareLock, or > RowExclusiveLock depending on whether the RTE is meant to represent > a plain source table, a FOR UPDATE/SHARE source table, or a target > table. I don't think it's that helpful to be checking that the > caller got exactly the same lock, especially given the number of > places where the patch had to cheat already by using NoLock. 
Yeah, addRangeTableEntryForRelation should not insist on holding the "exact" lock, but rather *at least* as strong as the passed in lock mode. I see that the patch you posted downthread does that. In the original patch, the lock mode used for handling the CREATE RULE command was being conflated with the lock mode to require on NEW and OLD RTEs that will be added to the rule's query. > 2. The "forUpdatePushedDown" override in AcquireRewriteLocks isn't > getting modeled in the RTE lockmode fields. In other words, if we > have something like > > CREATE VIEW vv AS SELECT * FROM tab1; > SELECT * FROM vv FOR UPDATE OF vv; > > the checks fall over, because the tab1 RTE in the view's rangetable > just has AccessShareLock, but we need RowShareLock. I fixed this > by having AcquireRewriteLocks actually modify the RTE if it is > promoting the lock level. This is kinda grotty, but we were already > assuming that AcquireRewriteLocks could scribble on the query tree, > so it seems safe enough. Thanks for fixing that. > 3. There remain some cases where the RTE says RowExclusiveLock but > the executor calculation indicates we only need AccessShareLock. > AFAICT, this happens only when we have a DO ALSO rule that results > in an added query that merely scans the target table. I've seen something like that happen for ON CONFLICT's excluded pseudo-relation RTE too, because the executor deems only the RTE fetched with result relation RT index to require RowExclusiveLock. > The RTE used > for that purpose is just the original one, so it still has a lockmode > suitable for a target table. > > We could probably hack the rewriter so that it changes the RTE lockmode > back down to AccessShareLock in these cases. Is the problematic RTE one coming from an action query? If so, isn't it distinct from the original RTE and its rellockmode independently determined based on whether the action is select or not? I'm not sure if I understand why we'd need to change its lock mode. 
> However, I'm inclined to > think that that is something *not* to do, and that we should let the > higher lockmode be used instead, for two reasons: > > (1) Asking for both AccessShareLock and RowExclusiveLock on the same > table requires an extra trip through the shared lock manager, for little > value that I can see. > > (2) If the DO ALSO rule is run before the main one, we'd be acquiring > AccessShareLock before RowExclusiveLock, resulting in deadlock hazard > due to lock upgrade. (I think this may be a pre-existing bug, although > it could likely only manifest in corner cases such as where we're pulling > a plan tree out of plancache. In most cases the first thing we'd acquire > on a rule target table is RowExclusiveLock in the parser, before any > rule rewriting could happen.) > > 4. I also notice some cases (all in FDW tests) where ExecOpenScanRelation > is choosing AccessShareLock although the RTE has RowShareLock. These seem > to all be cases where we're implementing FOR UPDATE/SHARE via > ROW_MARK_COPY, so that this: > > ExecRowMark *erm = ExecFindRowMark(estate, scanrelid, true); > > if (erm != NULL && erm->relation != NULL) > lockmode = NoLock; > > leaves lockmode as AccessShareLock because erm->relation is NULL. > Again, I think this is probably OK, and it'd be better to use > RowShareLock for consistency with other places that might open > the rel. It will be a change in externally-visible behavior though. For this and the other cases (AcquireRewriteLocks, AcquireExecutorLocks, etc.), I wonder whether we couldn't just *not* recalculate the lock mode based on inspecting the query tree to cross-check with rellockmode? Why not just use rellockmode for locking? Maybe, okay to keep doing that in debug builds though. Also, are the discrepancies like this to be considered bugs of the existing logic? Thanks, Amit
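What using rellockmode directly might look like can be sketched in a self-contained way. The stub struct and helper names below are hypothetical, not the real PostgreSQL types; the assert mirrors the cross-check the patch currently installs in AcquireExecutorLocks/ScanQueryForLocks, including the DO ALSO exception Tom describes.

```c
#include <assert.h>
#include <stdbool.h>

#define AccessShareLock		1
#define RowShareLock		2
#define RowExclusiveLock	3

/* Hypothetical stand-in for the parts of RangeTblEntry that matter here */
typedef struct RangeTblEntryStub
{
	int			relid;			/* pretend OID */
	int			rellockmode;	/* recorded at parse time */
	bool		is_result_rel;
	bool		has_rowmark;
} RangeTblEntryStub;

/* What AcquireExecutorLocks computes today by re-inspecting the query */
static int
recomputed_lockmode(const RangeTblEntryStub *rte)
{
	if (rte->is_result_rel)
		return RowExclusiveLock;
	if (rte->has_rowmark)
		return RowShareLock;
	return AccessShareLock;
}

/* Proposed endpoint: lock with rellockmode, cross-check in debug builds */
static int
lockmode_to_take(const RangeTblEntryStub *rte)
{
	assert(rte->rellockmode == recomputed_lockmode(rte) ||
		   /* DO ALSO rules can leave a target-table lockmode on a scan */
		   (recomputed_lockmode(rte) == AccessShareLock &&
			rte->rellockmode == RowExclusiveLock));
	return rte->rellockmode;
}
```

The point of the sketch is that the recomputation survives only inside the assert; the lock actually taken comes straight from the RTE.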
On 2018/10/01 2:18, Tom Lane wrote: > I wrote: >> 1. You set up transformRuleStmt to insert AccessExclusiveLock into >> the "OLD" and "NEW" RTEs for a view. This is surely wrong; we do >> not want to take exclusive lock on a view just to run a query using >> the view. It should (usually, anyway) just be AccessShareLock. >> However, because addRangeTableEntryForRelation insists that you >> hold the requested lock type *now*, just changing the parameter >> to AccessShareLock doesn't work. >> I hacked around this for the moment by passing NoLock to >> addRangeTableEntryForRelation and then changing rte->lockmode >> after it returns, but man that's ugly. It makes me wonder whether >> addRangeTableEntryForRelation should be checking the lockmode at all. > > It occurred to me that it'd be reasonable to insist that the caller > holds a lock *at least as strong* as the one being recorded in the RTE, > and that there's also been discussions about verifying that some lock > is held when something like heap_open(foo, NoLock) is attempted. > So I dusted off the part of 0001 that did that, producing the > attached delta patch. > > Unfortunately, I can't commit this, because it exposes at least two > pre-existing bugs :-(. So we'll need to fix those first, which seems > like it should be a separate thread. I'm just parking this here for > the moment. > > I think that the call sites should ultimately look like > > Assert(CheckRelationLockedByMe(...)); > > but for hunting down the places where the assertion currently fails, > it's more convenient if it's just an elog(WARNING). Should this check that we're not in a parallel worker process? Thanks, Amit
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: > On 2018/09/30 5:04, Tom Lane wrote: >> 3. There remain some cases where the RTE says RowExclusiveLock but >> the executor calculation indicates we only need AccessShareLock. >> AFAICT, this happens only when we have a DO ALSO rule that results >> in an added query that merely scans the target table. > I've seen something like that happen for ON CONFLICT's excluded > pseudo-relation RTE too, because the executor deems only the RTE fetched > with result relation RT index to require RowExclusiveLock. OK, I had not carefully inspected every case. > For this and the other cases (AcquireRewriteLocks, AcquireExecutorLocks, > etc.), I wonder whether we couldn't just *not* recalculate the lock mode > based on inspecting the query tree to cross-check with rellockmode? Why > not just use rellockmode for locking? Right, that's exactly where we want to end up. This intermediate state of the patch is just an attempt to verify that we understand when and how relying on rellockmode will change the behavior. > Also, are the discrepancies like this to be > considered bugs of the existing logic? Mmm ... hard to say. In the DO ALSO case, it's possible that query execution would take AccessShareLock and then RowExclusiveLock, which at least in principle creates a lock-upgrade deadlock hazard. So I think standardizing on taking the rellockmode will be an improvement, but I don't know that I'd call the existing behavior a bug; I certainly wouldn't risk trying to back-patch a change for it. In the ROW_MARK_COPY case, it's conceivable that with the existing code query re-execution would take only AccessShareLock on a FOR UPDATE target table, where the parser had originally taken RowShareLock. Again, it seems like making the behavior more consistent is an improvement, but not something we'd try to back-patch. regards, tom lane
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: > On 2018/10/01 2:18, Tom Lane wrote: >> I think that the call sites should ultimately look like >> Assert(CheckRelationLockedByMe(...)); >> but for hunting down the places where the assertion currently fails, >> it's more convenient if it's just an elog(WARNING). > Should this check that we're not in a parallel worker process? Hmm. I've not seen any failures in the parallel parts of the regular regression tests, but maybe I'd better do a force_parallel_mode run before committing. In general, I'm not on board with the idea that parallel workers don't need to get their own locks, so I don't really want to exclude parallel workers from this check. But if it's not safe for that today, fixing it is beyond the scope of this particular patch. regards, tom lane
On 1 October 2018 at 19:39, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > For this and the other cases (AcquireRewriteLocks, AcquireExecutorLocks, > etc.), I wonder whether we couldn't just *not* recalculate the lock mode > based on inspecting the query tree to cross-check with rellockmode? Why > not just use rellockmode for locking? Maybe, okay to keep doing that in > debug builds though. Also, are the discrepancies like this to be > considered bugs of the existing logic? I got the impression Tom was just leaving that in for a while to let the buildfarm verify the new code is getting the same lock level as the old code. Of course, I might be wrong. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
David Rowley <david.rowley@2ndquadrant.com> writes: > On 1 October 2018 at 19:39, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> For this and the other cases (AcquireRewriteLocks, AcquireExecutorLocks, >> etc.), I wonder whether we couldn't just *not* recalculate the lock mode >> based on inspecting the query tree to cross-check with rellockmode? Why >> not just use rellockmode for locking? Maybe, okay to keep doing that in >> debug builds though. Also, are the discrepancies like this to be >> considered bugs of the existing logic? > I got the impression Tom was just leaving that in for a while to let > the buildfarm verify the new code is getting the same lock level as > the old code. Of course, I might be wrong. Yeah, exactly. My plan is to next switch to taking the locks based on rellockmode, and then if that doesn't show any problems to switch the executor to just Assert that there's already a suitable lock, and then lastly to proceed with ripping out the no-longer-needed logic that supports the downstream calculations of lockmode. So a bit more granular than what Amit submitted, but we'll get to the same place in the end, with more confidence that we didn't break anything. regards, tom lane
I wrote: > Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: >> Should this check that we're not in a parallel worker process? > Hmm. I've not seen any failures in the parallel parts of the regular > regression tests, but maybe I'd better do a force_parallel_mode > run before committing. > In general, I'm not on board with the idea that parallel workers don't > need to get their own locks, so I don't really want to exclude parallel > workers from this check. But if it's not safe for that today, fixing it > is beyond the scope of this particular patch. So the place where that came out in the wash is the commit I just made (9a3cebeaa) to change the executor from taking table locks to asserting that somebody else took them already. To make that work, I had to make both ExecOpenScanRelation and relation_open skip checking for lock-held if IsParallelWorker(). This makes me entirely uncomfortable with the idea that parallel workers can be allowed to not take any locks of their own. There is no basis for arguing that we have field proof that that's safe, because *up to now, parallel workers in fact did take their own locks*. And it seems unsafe on its face, because there's nothing that really guarantees that the parent process won't go away while children are still running. (elog(FATAL) seems like a counterexample, for instance.) I think that we ought to adjust parallel query to insist that children do take locks, and then revert the IsParallelWorker() exceptions I made here. I plan to leave that point in abeyance till we've got the rest of these changes in place, though. The easiest way to do it will doubtless change once we've centralized the executor's table-opening logic, so trying to code it right now seems like a waste of effort. regards, tom lane
On 2018/10/04 5:16, Tom Lane wrote:
> I wrote:
>> Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
>>> Should this check that we're not in a parallel worker process?
>
>> Hmm. I've not seen any failures in the parallel parts of the regular
>> regression tests, but maybe I'd better do a force_parallel_mode
>> run before committing.
>> In general, I'm not on board with the idea that parallel workers don't
>> need to get their own locks, so I don't really want to exclude parallel
>> workers from this check. But if it's not safe for that today, fixing it
>> is beyond the scope of this particular patch.
>
> So the place where that came out in the wash is the commit I just made
> (9a3cebeaa) to change the executor from taking table locks to asserting
> that somebody else took them already.

Thanks for getting that done.

> To make that work, I had to make
> both ExecOpenScanRelation and relation_open skip checking for lock-held
> if IsParallelWorker().

Yeah, I had to do that too when rebasing the remaining patches.

> This makes me entirely uncomfortable with the idea that parallel workers
> can be allowed to not take any locks of their own. There is no basis
> for arguing that we have field proof that that's safe, because *up to
> now, parallel workers in fact did take their own locks*. And it seems
> unsafe on its face, because there's nothing that really guarantees that
> the parent process won't go away while children are still running.
> (elog(FATAL) seems like a counterexample, for instance.)
>
> I think that we ought to adjust parallel query to insist that children
> do take locks, and then revert the IsParallelWorker() exceptions I made
> here.

Maybe I'm missing something here, but isn't the necessary adjustment just
that the relations are opened with locks if inside a parallel worker?

> I plan to leave that point in abeyance till we've got the rest
> of these changes in place, though.
> The easiest way to do it will doubtless change once we've centralized
> the executor's table-opening logic, so trying to code it right now
> seems like a waste of effort.

Okay. I've rebased the remaining patches. I broke down one of the
patches into 2 and re-ordered the patches as follows:

0001: introduces a function that opens range table relations and
maintains them in an array indexed by RT index

0002: introduces a new field in EState that's an array of RangeTblEntry
pointers and revises macros used in the executor that access RTEs to
return them from the array (David Rowley co-authored this one)

0003: moves result relation and ExecRowMark initialization out of
InitPlan and into ExecInit* routines of respective nodes

0004: removes useless fields from certain planner nodes whose only
purpose has been to assist the executor lock relations in proper order

0005: teaches planner to remove PlanRowMarks corresponding to dummy
relations

Thanks,
Amit
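As a rough illustration of what 0001 does, here is a hypothetical, self-contained mock-up of the open-once-per-RTE behavior ("if a given RTE's relation is opened multiple times, only the first call goes to relcache"). All names below are stubs, not the real executor types; the actual ExecRangeTableRelation also deals with locking, which is omitted here.

```c
#include <stddef.h>

#define MAX_RTABLE 8

typedef struct RelationStub { int relid; } RelationStub;

static RelationStub relcache[MAX_RTABLE];	/* pretend relcache */
static int	relcache_accesses;				/* counts "heap_open" calls */

/* Stand-in for heap_open(relid, NoLock) */
static RelationStub *
stub_heap_open(int relid)
{
	relcache_accesses++;
	relcache[relid].relid = relid;
	return &relcache[relid];
}

/* EState fragment: Relation pointers indexed by range-table index */
typedef struct EStateStub
{
	RelationStub *es_relations[MAX_RTABLE]; /* NULL until first use */
} EStateStub;

/* Sketch of ExecRangeTableRelation(): open on first call, reuse after */
static RelationStub *
exec_range_table_relation(EStateStub *estate, int rti)
{
	if (estate->es_relations[rti] == NULL)
		estate->es_relations[rti] = stub_heap_open(rti);
	return estate->es_relations[rti];
}
```

Plan nodes that share an RTE then share one Relation pointer for the whole plan tree, instead of each paying for a relcache lookup.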
Attachments
- v12-0001-Revise-executor-range-table-relation-locking-and.patch
- v12-0002-Introduce-an-array-of-RangeTblEntry-pointers-in-.patch
- v12-0003-Move-result-rel-and-ExecRowMark-initilization-to.patch
- v12-0004-Remove-useless-fields-from-planner-nodes.patch
- v12-0005-Prune-PlanRowMark-of-relations-that-are-pruned-f.patch
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: > On 2018/10/04 5:16, Tom Lane wrote: >> I think that we ought to adjust parallel query to insist that children >> do take locks, and then revert the IsParallelWorker() exceptions I made >> here. > Maybe I'm missing something here, but isn't the necessary adjustment just > that the relations are opened with locks if inside a parallel worker? Yeah, that's one plausible way to fix it. I hadn't wanted to prejudge the best way before we finish the other changes, though. > I've rebased the remaining patches. I broke down one of the patches into > 2 and re-ordered the patches as follows: Thanks, will start looking at these today. regards, tom lane
Hi, On 2018-10-03 16:16:11 -0400, Tom Lane wrote: > I wrote: > > Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: > >> Should this check that we're not in a parallel worker process? > > > Hmm. I've not seen any failures in the parallel parts of the regular > > regression tests, but maybe I'd better do a force_parallel_mode > > run before committing. > > In general, I'm not on board with the idea that parallel workers don't > > need to get their own locks, so I don't really want to exclude parallel > > workers from this check. But if it's not safe for that today, fixing it > > is beyond the scope of this particular patch. > > So the place where that came out in the wash is the commit I just made > (9a3cebeaa) to change the executor from taking table locks to asserting > that somebody else took them already. To make that work, I had to make > both ExecOpenScanRelation and relation_open skip checking for lock-held > if IsParallelWorker(). > > This makes me entirely uncomfortable with the idea that parallel workers > can be allowed to not take any locks of their own. There is no basis > for arguing that we have field proof that that's safe, because *up to > now, parallel workers in fact did take their own locks*. And it seems > unsafe on its face, because there's nothing that really guarantees that > the parent process won't go away while children are still running. > (elog(FATAL) seems like a counterexample, for instance.) > I think that we ought to adjust parallel query to insist that children > do take locks, and then revert the IsParallelWorker() exceptions I made > here. I plan to leave that point in abeyance till we've got the rest > of these changes in place, though. The easiest way to do it will > doubtless change once we've centralized the executor's table-opening > logic, so trying to code it right now seems like a waste of effort. I've not really followed this thread, and just caught up to here. 
It seems entirely unacceptable to not acquire locks on workers to me.
Maybe I'm missing something, but why do/did the patches in this thread
require that / introduce that?  We didn't have that kind of concept
before, no?  The group locking stuff should rely / require that kind of
thing, no?

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> I've not really followed this thread, and just caught up to here.  It
> seems entirely unacceptable to not acquire locks on workers to me.
> Maybe I'm missing something, but why do/did the patches in this thread
> require that / introduce that?  We didn't have that kind of concept
> before, no?  The group locking stuff should rely / require that kind of
> thing, no?

I'm possibly confused, but I thought that the design of parallel query
involved an expectation that workers didn't need to get their own locks.
What we've determined so far in this thread is that workers *do* get
their own locks (or did before yesterday), but I'd been supposing that
that was accidental not intentional.

In any case, I definitely intend that they will be getting their own
locks again after the dust has settled.  Panic not.

			regards, tom lane
Hi,

On 2018-10-04 15:27:59 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I've not really followed this thread, and just caught up to here.  It
> > seems entirely unacceptable to not acquire locks on workers to me.
> > Maybe I'm missing something, but why do/did the patches in this thread
> > require that / introduce that?  We didn't have that kind of concept
> > before, no?  The group locking stuff should rely / require that kind of
> > thing, no?
>
> I'm possibly confused, but I thought that the design of parallel query
> involved an expectation that workers didn't need to get their own
> locks.

Not as far as I'm aware of - but I'm not exactly the expert there.
There's an exception that some lock classes don't conflict between the
leader and the workers - that's group locking (a1c1af2a1f60).  But the
locks still have to be acquired, and I think it's quite dangerous not
to do so.  The group locking logic is required because otherwise it'd
be trivial to get into deadlocks, and some of the restrictions around
parallel query are required to make that safe.

> What we've determined so far in this thread is that workers *do* get
> their own locks (or did before yesterday), but I'd been supposing that
> that was accidental not intentional.

I don't think it was accidental.

Greetings,

Andres Freund
On 2018-10-04 12:34:44 -0700, Andres Freund wrote:
> Hi,
>
> On 2018-10-04 15:27:59 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > I've not really followed this thread, and just caught up to here.  It
> > > seems entirely unacceptable to not acquire locks on workers to me.
> > > Maybe I'm missing something, but why do/did the patches in this thread
> > > require that / introduce that?  We didn't have that kind of concept
> > > before, no?  The group locking stuff should rely / require that kind of
> > > thing, no?
> >
> > I'm possibly confused, but I thought that the design of parallel query
> > involved an expectation that workers didn't need to get their own
> > locks.
>
> Not as far as I'm aware of - but I'm not exactly the expert
> there.  There's an exception that some lock classes don't conflict
> between the leader and the workers - that's group locking
> (a1c1af2a1f60).  But the locks still have to be acquired, and I think
> it's quite dangerous not to do so.  The group locking logic is required
> because otherwise it'd be trivial to get into deadlocks, and some of the
> restrictions around parallel query are required to make that safe.

Re-read docs + code just to make sure.  Here's the relevant readme parts:

src/backend/access/transam/README.parallel:

To prevent unprincipled deadlocks when running in parallel mode, this code
also arranges for the leader and all workers to participate in group
locking.  See src/backend/storage/lmgr/README for more details.

src/backend/storage/lmgr/README:

Group Locking
-------------

As if all of that weren't already complicated enough, PostgreSQL now supports
parallelism (see src/backend/access/transam/README.parallel), which means that
we might need to resolve deadlocks that occur between gangs of related
processes rather than individual processes.  This doesn't change the basic
deadlock detection algorithm very much, but it makes the bookkeeping more
complicated.
We choose to regard locks held by processes in the same parallel group as
non-conflicting.  This means that two processes in a parallel group can hold a
self-exclusive lock on the same relation at the same time, or one process can
acquire an AccessShareLock while the other already holds AccessExclusiveLock.
This might seem dangerous and could be in some cases (more on that below), but
if we didn't do this then parallel query would be extremely prone to
self-deadlock.  For example, a parallel query against a relation on which the
leader already had AccessExclusiveLock would hang, because the workers would
try to lock the same relation and be blocked by the leader; yet the leader
can't finish until it receives completion indications from all workers.  An
undetected deadlock results.  This is far from the only scenario where such a
problem happens.  The same thing will occur if the leader holds only
AccessShareLock, the worker seeks AccessShareLock, but between the time the
leader attempts to acquire the lock and the time the worker attempts to
acquire it, some other process queues up waiting for an AccessExclusiveLock.
In this case, too, an indefinite hang results.

...

So yes, locks are expected to be acquired in workers.

Greetings,

Andres Freund
On Thu, Oct 4, 2018 at 3:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm possibly confused, but I thought that the design of parallel query
> involved an expectation that workers didn't need to get their own locks.

You are, indeed, confused.  A heck of a lot of effort went into making
sure that the workers COULD take their own locks, and into trying to
make sure that didn't break anything.  That effort may or may not have
been entirely successful, but I'm pretty sure that having them NOT take
locks is going to be a lot worse.

> What we've determined so far in this thread is that workers *do* get
> their own locks (or did before yesterday), but I'd been supposing that
> that was accidental not intentional.

Nope, that was intentional.

> In any case, I definitely intend that they will be getting their own
> locks again after the dust has settled.  Panic not.

/me unloads metaphorical bazooka.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Oct 4, 2018 at 3:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What we've determined so far in this thread is that workers *do* get
>> their own locks (or did before yesterday), but I'd been supposing that
>> that was accidental not intentional.

> Nope, that was intentional.

Fair enough --- in which case, the patch series we're working on here
was broken to suppose that it could get away with removing that.  As I
said, I'll make sure the locking is back before I finish with this.

			regards, tom lane
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
> I've rebased the remaining patches.  I broke down one of the patches into
> 2 and re-ordered the patches as follows:

> 0001: introduces a function that opens range table relations and maintains
> them in an array indexed by RT index

> 0002: introduces a new field in EState that's an array of RangeTblEntry
> pointers and revises macros used in the executor that access RTEs to
> return them from the array (David Rowley co-authored this one)

I've pushed 0001 and 0002 with mostly cosmetic changes.  One thing I
wanted to point out explicitly, though, is that I found this bit of 0002
to be a seriously bad idea:

--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -20,6 +20,7 @@
 #include "executor/instrument.h"
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
+#include "nodes/parsenodes.h"
 #include "nodes/plannodes.h"
 #include "utils/hsearch.h"
 #include "utils/queryenvironment.h"

Please do not add #includes of fundamental headers to other fundamental
headers without clearing it with somebody.  There's little enough
structure to our header collection now.  I don't want to end up in a
situation where effectively the entire header set gets pulled into
every .c file, or worse that we have actual reference loops in the
headers.  (This is not an academic problem; somebody actually created
such a loop awhile back.  Cleaning it up, by the time we'd recognized
the problem, was really painful.)

> 0003: moves result relation and ExecRowMark initialization out of InitPlan
> and into ExecInit* routines of respective nodes

I am finding myself pretty unconvinced by this one; it seems like mostly
a random reallocation of responsibility with little advantage.
The particular thing that brought me to a screeching halt was seeing that
the patch removed ExecFindRowMark, despite the fact that that's part of
our advertised FDW API (see fdwhandler.sgml), and it didn't provide any
alternative way for an FDW to find out at runtime whether it's subject
to a row locking requirement.

I thought for a minute about just leaving the function in place, but
that wouldn't work because both nodeLockRows and nodeModifyTable are
written so that they find^H^H^Hbuild their rowmarks only after recursing
to initialize their child plan nodes; so a child node that tried to use
ExecFindRowMark during ExecInitNode would get the wrong answer.  Of
course, we could consider changing the order of operations during
initialization of those node types, but I'm not really seeing a
compelling reason why we should whack things around that much.

So I'm inclined to just omit 0003.  AFAICS this would only mean that
we couldn't drop the global PlanRowMarks list from PlannedStmt, which
does not bother me much.

> 0005: teaches planner to remove PlanRowMarks corresponding to dummy relations

I'm not entirely sold on the value of that either?

			regards, tom lane
On 2018/10/05 5:59, Tom Lane wrote:
> Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
>> I've rebased the remaining patches.  I broke down one of the patches into
>> 2 and re-ordered the patches as follows:
>
>> 0001: introduces a function that opens range table relations and maintains
>> them in an array indexed by RT index
>
>> 0002: introduces a new field in EState that's an array of RangeTblEntry
>> pointers and revises macros used in the executor that access RTEs to
>> return them from the array (David Rowley co-authored this one)
>
> I've pushed 0001 and 0002 with mostly cosmetic changes.

Thanks a lot.

> One thing I
> wanted to point out explicitly, though, is that I found this bit of 0002
> to be a seriously bad idea:
>
> --- a/src/include/nodes/execnodes.h
> +++ b/src/include/nodes/execnodes.h
> @@ -20,6 +20,7 @@
>  #include "executor/instrument.h"
>  #include "lib/pairingheap.h"
>  #include "nodes/params.h"
> +#include "nodes/parsenodes.h"
>  #include "nodes/plannodes.h"
>  #include "utils/hsearch.h"
>  #include "utils/queryenvironment.h"
>
> Please do not add #includes of fundamental headers to other fundamental
> headers without clearing it with somebody.  There's little enough
> structure to our header collection now.  I don't want to end up in a
> situation where effectively the entire header set gets pulled into
> every .c file, or worse that we have actual reference loops in the
> headers.  (This is not an academic problem; somebody actually created
> such a loop awhile back.  Cleaning it up, by the time we'd recognized
> the problem, was really painful.)

Okay, sorry about that.  I was slightly nervous that I had to do it when
doing it, but forgot to mention that explicitly in the commit message or
the email.
>> 0003: moves result relation and ExecRowMark initialization out of InitPlan
>> and into ExecInit* routines of respective nodes
>
> I am finding myself pretty unconvinced by this one; it seems like mostly
> a random reallocation of responsibility with little advantage.  The
> particular thing that brought me to a screeching halt was seeing that
> the patch removed ExecFindRowMark, despite the fact that that's part
> of our advertised FDW API (see fdwhandler.sgml), and it didn't provide
> any alternative way for an FDW to find out at runtime whether it's
> subject to a row locking requirement.
>
> I thought for a minute about just leaving the function in place, but
> that wouldn't work because both nodeLockRows and nodeModifyTable are
> written so that they find^H^H^Hbuild their rowmarks only after recursing
> to initialize their child plan nodes; so a child node that tried to use
> ExecFindRowMark during ExecInitNode would get the wrong answer.  Of
> course, we could consider changing the order of operations during
> initialization of those node types, but I'm not really seeing a compelling
> reason why we should whack things around that much.
>
> So I'm inclined to just omit 0003.  AFAICS this would only mean that
> we couldn't drop the global PlanRowMarks list from PlannedStmt, which
> does not bother me much.

To be honest, I too had begun to fail to see the point of this patch
since yesterday.  In fact, getting this one to pass make check-world
took a bit of head-scratching due to its interaction with EvalPlanQual's
EState building, so I was almost ready to drop it from the series.  But
I felt that it might still be a good idea to get rid of what was
described as special case code.  Reading your argument against it based
on how it messes up some of the assumptions regarding ExecRowMark
design, I'm willing to let go of this one as more or less a cosmetic
improvement.
>> 0005: teaches planner to remove PlanRowMarks corresponding to dummy relations
>
> I'm not entirely sold on the value of that either?

If you look at the first email on this thread, you can see that trimming
the PlanRowMarks list down to just the ones needed during execution
results in a non-trivial improvement in SELECT FOR SHARE on partitioned
tables with a large number of partitions:

<quote>
Speedup is more pronounced with a benchmark that needs RowMarks, because
one of the patches (0003) removes overhead around handling them.

$ cat /tmp/select-lt-for-share.sql
select * from lt where b = 999 for share;

master

tps = 94.095985 (excluding connections establishing)
tps = 93.955702 (excluding connections establishing)

<snip>

patch 0003 (prune PlanRowMarks)

tps = 712.544029 (excluding connections establishing)
tps = 717.540052 (excluding connections establishing)
</quote>

But on reflection, this seems more like adding special case code to the
planner just for this, rather than solving the more general problem of
initializing *any* partition information after pruning, being discussed
over at the "speeding up planning with partitions" thread [1].  So, I
don't mind dropping this one too.

So, that leaves us with only the patch that revises the plan node fields
in light of having relieved the executor of doing any locking of its own
in *some* cases.  That patch's premise is that we don't need the fields
that the patch removes, because they're only referenced for locking
purposes.  But if those plan nodes support parallel execution, and hence
will be passed to parallel workers that will need to lock the tables
contained in them, then they had better contain all the information
needed for locking *every* affected table, so we had better not remove
it.  The plan nodes in question are Append, MergeAppend, and ModifyTable,
of which only the first two support parallel execution, but maybe we
should just leave all of them alone for now.
IOW, I'm not sure about that patch either.  Thoughts?

Thanks,
Amit

[1] https://www.postgresql.org/message-id/9d7c5112-cb99-6a47-d3be-cf1ee6862a1d@lab.ntt.co.jp
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
> On 2018/10/05 5:59, Tom Lane wrote:
>> So I'm inclined to just omit 0003.  AFAICS this would only mean that
>> we couldn't drop the global PlanRowMarks list from PlannedStmt, which
>> does not bother me much.

> To be honest, I too had begun to fail to see the point of this patch
> since yesterday.  In fact, getting this one to pass make check-world
> took a bit of head-scratching due to its interaction with EvalPlanQual's
> EState building, so I was almost ready to drop it from the series.  But
> I felt that it might still be a good idea to get rid of what was
> described as special case code.  Reading your argument against it based
> on how it messes up some of the assumptions regarding ExecRowMark
> design, I'm willing to let go of this one as more or less a cosmetic
> improvement.

OK.  We do need to fix the comments in InitPlan that talk about acquiring
locks and how that makes us need to do things in a particular order,
but I'll go take care of that.

> So, that leaves us with only the patch that revises the plan node fields
> in light of having relieved the executor of doing any locking of its own
> in *some* cases.  That patch's premise is that we don't need the fields
> that the patch removes, because they're only referenced for locking
> purposes.  But if those plan nodes support parallel execution, and hence
> will be passed to parallel workers that will need to lock the tables
> contained in them, then they had better contain all the information
> needed for locking *every* affected table, so we had better not remove
> it.

No, I think this is unduly pessimistic.  We need to make sure that a
parallel worker has lock on tables it's actually touching, but I don't
see why that should imply a requirement to hold lock on parent tables
it never touches.
The reasons why we need locks on tables not physically accessed by the
query are (a) to ensure that we've blocked, or received sinval messages
for, any DDL related to views or partition parent tables, in case that
would invalidate the plan; (b) to allow firing triggers safely, in the
case of partition parent tables.  Neither of these issues apply to a
parallel worker -- the plan is already frozen before it can ever start,
and it isn't going to be firing any triggers either.

In particular, I think it's fine to get rid of
ExecLockNonLeafAppendTables.  In the parent, that's clearly useless code
now: we have already locked *every* RTE_RELATION entry in the rangetable,
either when the parser/rewriter/planner added the RTE to the list to
begin with, or during AcquireExecutorLocks if the plan was retrieved
from the plancache.  In a child it seems unnecessary as long as we're
locking the leaf rels we actually touch.

Possibly, we might fix the problem of inadequate locking in worker
processes by having them do the equivalent of AcquireExecutorLocks, ie
just run through the whole RT list and lock everything.  I think we'd
soon end up doing that if we ever try to allow parallelization of
non-read-only queries; but that's a long way off AFAIK.  For read-only
queries it seems like it'll be fine if we just rejigger
ExecGetRangeTableRelation to take a lock when IsParallelWorker.

			regards, tom lane
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
> 0004: removes useless fields from certain planner nodes whose only purpose
> has been to assist the executor lock relations in proper order

I've pushed most of 0004 now; obviously, not the parts removing
PlannedStmt.rowMarks, since that's not possible without rearrangement
of the executor's RowMark handling.

I didn't like the idea of unifying ModifyTable.nominalRelation with the
partition root info.  Those fields serve different masters ---
nominalRelation, at least in its original intent, is only meant for use
of EXPLAIN and might have nothing to do with what happens at execution.
So even though unifying them would work today, we might regret it down
the line.  Instead I left that field alone and added a separate
rootRelation field to carry the partition root RT index, which ends up
being the same number of fields anyway since we don't need a flag for
is-the-nominal-relation-a-partition-root.

Still need to think a bit more about whether we want 0005 in anything
like its current form.

			regards, tom lane
I wrote:
> Still need to think a bit more about whether we want 0005 in
> anything like its current form.

So I poked at that for a bit, and soon realized that the *main* problem
there is that ExecFindRowMark() eats O(N^2) time due to repeated
searches of the es_rowMarks list.  While the patch as stated would
improve that for cases where most of the partitions can be pruned at
plan time, it does nothing much for cases where they can't.  However,
it's pretty trivial to fix that: let's just use an array not a list.
Patch 0001 attached does that.

A further improvement we could consider is to avoid opening the relcache
entries for pruned-away relations.  I could not find a way to do that
that was less invasive than removing ExecRowMark.relation and requiring
callers to call a new function to get the relation if they need it.
Patch 0002 attached is a delta on top of 0001 that does that.

Replicating your select-lt-for-share test case as best I can (you never
actually specified it carefully), I find that the TPS rate on my
workstation goes from about 250 tps with HEAD to 920 tps with patch 0001
or 1130 tps with patch 0002.  This compares to about 1600 tps for the
non-FOR-SHARE version of the query.

However, we should keep in mind that without partitioning overhead
(ie "select * from lt_999 where b = 999 for share"), the TPS rate is
over 25800 tps.  Most of the overhead in the partitioned case seems to
be from acquiring locks on rangetable entries that we won't ever use,
and none of these patch variants are touching that problem.  So ISTM
that the *real* win for this scenario is going to come from teaching
the system to prune unwanted relations from the query altogether, not
just from the PlanRowMark list.

Keeping that comparison in mind, I'm inclined to think that 0001 is the
best thing to do for now.  The incremental win from 0002 is not big
enough to justify the API break it creates, while your 0005 is not
really attacking the problem the right way.
			regards, tom lane

diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index 7480cf5..aadf749 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -91,21 +91,22 @@ execCurrentOf(CurrentOfExpr *cexpr,
 	 * the other code can't, while the non-FOR-UPDATE case allows use of WHERE
 	 * CURRENT OF with an insensitive cursor.
 	 */
-	if (queryDesc->estate->es_rowMarks)
+	if (queryDesc->estate->es_rowmarks)
 	{
 		ExecRowMark *erm;
-		ListCell   *lc;
+		Index		i;
 
 		/*
 		 * Here, the query must have exactly one FOR UPDATE/SHARE reference to
 		 * the target table, and we dig the ctid info out of that.
 		 */
 		erm = NULL;
-		foreach(lc, queryDesc->estate->es_rowMarks)
+		for (i = 0; i < queryDesc->estate->es_range_table_size; i++)
 		{
-			ExecRowMark *thiserm = (ExecRowMark *) lfirst(lc);
+			ExecRowMark *thiserm = queryDesc->estate->es_rowmarks[i];
 
-			if (!RowMarkRequiresRowShareLock(thiserm->markType))
+			if (thiserm == NULL ||
+				!RowMarkRequiresRowShareLock(thiserm->markType))
 				continue;		/* ignore non-FOR UPDATE/SHARE items */
 
 			if (thiserm->relid == table_oid)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b6abad5..0a766ef 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -909,61 +909,68 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	}
 
 	/*
-	 * Next, build the ExecRowMark list from the PlanRowMark(s), if any.
+	 * Next, build the ExecRowMark array from the PlanRowMark(s), if any.
 	 */
-	estate->es_rowMarks = NIL;
-	foreach(l, plannedstmt->rowMarks)
+	if (plannedstmt->rowMarks)
 	{
-		PlanRowMark *rc = (PlanRowMark *) lfirst(l);
-		Oid			relid;
-		Relation	relation;
-		ExecRowMark *erm;
+		estate->es_rowmarks = (ExecRowMark **)
+			palloc0(estate->es_range_table_size * sizeof(ExecRowMark *));
+		foreach(l, plannedstmt->rowMarks)
+		{
+			PlanRowMark *rc = (PlanRowMark *) lfirst(l);
+			Oid			relid;
+			Relation	relation;
+			ExecRowMark *erm;
 
-		/* ignore "parent" rowmarks; they are irrelevant at runtime */
-		if (rc->isParent)
-			continue;
+			/* ignore "parent" rowmarks; they are irrelevant at runtime */
+			if (rc->isParent)
+				continue;
 
-		/* get relation's OID (will produce InvalidOid if subquery) */
-		relid = exec_rt_fetch(rc->rti, estate)->relid;
+			/* get relation's OID (will produce InvalidOid if subquery) */
+			relid = exec_rt_fetch(rc->rti, estate)->relid;
 
-		/* open relation, if we need to access it for this mark type */
-		switch (rc->markType)
-		{
-			case ROW_MARK_EXCLUSIVE:
-			case ROW_MARK_NOKEYEXCLUSIVE:
-			case ROW_MARK_SHARE:
-			case ROW_MARK_KEYSHARE:
-			case ROW_MARK_REFERENCE:
-				relation = ExecGetRangeTableRelation(estate, rc->rti);
-				break;
-			case ROW_MARK_COPY:
-				/* no physical table access is required */
-				relation = NULL;
-				break;
-			default:
-				elog(ERROR, "unrecognized markType: %d", rc->markType);
-				relation = NULL;	/* keep compiler quiet */
-				break;
-		}
+			/* open relation, if we need to access it for this mark type */
+			switch (rc->markType)
+			{
+				case ROW_MARK_EXCLUSIVE:
+				case ROW_MARK_NOKEYEXCLUSIVE:
+				case ROW_MARK_SHARE:
+				case ROW_MARK_KEYSHARE:
+				case ROW_MARK_REFERENCE:
+					relation = ExecGetRangeTableRelation(estate, rc->rti);
+					break;
+				case ROW_MARK_COPY:
+					/* no physical table access is required */
+					relation = NULL;
+					break;
+				default:
+					elog(ERROR, "unrecognized markType: %d", rc->markType);
+					relation = NULL;	/* keep compiler quiet */
+					break;
+			}
 
-		/* Check that relation is a legal target for marking */
-		if (relation)
-			CheckValidRowMarkRel(relation, rc->markType);
-
-		erm = (ExecRowMark *) palloc(sizeof(ExecRowMark));
-		erm->relation = relation;
-		erm->relid = relid;
-		erm->rti = rc->rti;
-		erm->prti = rc->prti;
-		erm->rowmarkId = rc->rowmarkId;
-		erm->markType = rc->markType;
-		erm->strength = rc->strength;
-		erm->waitPolicy = rc->waitPolicy;
-		erm->ermActive = false;
-		ItemPointerSetInvalid(&(erm->curCtid));
-		erm->ermExtra = NULL;
-
-		estate->es_rowMarks = lappend(estate->es_rowMarks, erm);
+			/* Check that relation is a legal target for marking */
+			if (relation)
+				CheckValidRowMarkRel(relation, rc->markType);
+
+			erm = (ExecRowMark *) palloc(sizeof(ExecRowMark));
+			erm->relation = relation;
+			erm->relid = relid;
+			erm->rti = rc->rti;
+			erm->prti = rc->prti;
+			erm->rowmarkId = rc->rowmarkId;
+			erm->markType = rc->markType;
+			erm->strength = rc->strength;
+			erm->waitPolicy = rc->waitPolicy;
+			erm->ermActive = false;
+			ItemPointerSetInvalid(&(erm->curCtid));
+			erm->ermExtra = NULL;
+
+			Assert(erm->rti > 0 && erm->rti <= estate->es_range_table_size &&
+				   estate->es_rowmarks[erm->rti - 1] == NULL);
+
+			estate->es_rowmarks[erm->rti - 1] = erm;
+		}
 	}
 
 	/*
@@ -2394,13 +2401,12 @@ ExecUpdateLockMode(EState *estate, ResultRelInfo *relinfo)
 ExecRowMark *
 ExecFindRowMark(EState *estate, Index rti, bool missing_ok)
 {
-	ListCell   *lc;
-
-	foreach(lc, estate->es_rowMarks)
+	if (rti > 0 && rti <= estate->es_range_table_size &&
+		estate->es_rowmarks != NULL)
 	{
-		ExecRowMark *erm = (ExecRowMark *) lfirst(lc);
+		ExecRowMark *erm = estate->es_rowmarks[rti - 1];
 
-		if (erm->rti == rti)
+		if (erm)
 			return erm;
 	}
 	if (!missing_ok)
@@ -3131,6 +3137,7 @@ EvalPlanQualStart(EPQState *epqstate, EState *parentestate, Plan *planTree)
 	estate->es_range_table_array = parentestate->es_range_table_array;
 	estate->es_range_table_size = parentestate->es_range_table_size;
 	estate->es_relations = parentestate->es_relations;
+	estate->es_rowmarks = parentestate->es_rowmarks;
 	estate->es_plannedstmt = parentestate->es_plannedstmt;
 	estate->es_junkFilter = parentestate->es_junkFilter;
 	estate->es_output_cid = parentestate->es_output_cid;
@@ -3148,7 +3155,6 @@ EvalPlanQualStart(EPQState *epqstate, EState *parentestate, Plan *planTree)
 	}
 	/* es_result_relation_info must NOT be copied */
 	/* es_trig_target_relations must NOT be copied */
-	estate->es_rowMarks = parentestate->es_rowMarks;
 	estate->es_top_eflags = parentestate->es_top_eflags;
 	estate->es_instrument = parentestate->es_instrument;
 	/* es_auxmodifytables must NOT be copied */
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index d98e90e..71c6b5d 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -113,6 +113,7 @@ CreateExecutorState(void)
 	estate->es_range_table_array = NULL;
 	estate->es_range_table_size = 0;
 	estate->es_relations = NULL;
+	estate->es_rowmarks = NULL;
 
 	estate->es_plannedstmt = NULL;
 	estate->es_junkFilter = NULL;
@@ -142,8 +143,6 @@ CreateExecutorState(void)
 
 	estate->es_tupleTable = NIL;
 
-	estate->es_rowMarks = NIL;
-
 	estate->es_processed = 0;
 	estate->es_lastoid = InvalidOid;
@@ -709,6 +708,12 @@ ExecInitRangeTable(EState *estate, List *rangeTable)
 	 */
 	estate->es_relations = (Relation *)
 		palloc0(estate->es_range_table_size * sizeof(Relation));
+
+	/*
+	 * es_rowmarks is also parallel to the es_range_table_array, but it's
+	 * allocated only if needed.
+	 */
+	estate->es_rowmarks = NULL;
 }
 
 /*
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 657b593..8347089 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -34,6 +34,7 @@ struct PlanState;
 
 /* forward references in this file */
 struct ParallelHashJoinState;
+struct ExecRowMark;
 struct ExprState;
 struct ExprContext;
 struct RangeTblEntry;			/* avoid including parsenodes.h here */
@@ -491,6 +492,8 @@ typedef struct EState
 	Index		es_range_table_size;	/* size of the range table arrays */
 	Relation   *es_relations;	/* Array of per-range-table-entry Relation
 								 * pointers, or NULL if not yet opened */
+	struct ExecRowMark **es_rowmarks;	/* Array of per-range-table-entry
+										 * ExecRowMarks, or NULL if none */
 	PlannedStmt *es_plannedstmt;	/* link to top of plan tree */
 	const char *es_sourceText;	/* Source text from QueryDesc */
@@ -537,8 +540,6 @@ typedef struct EState
 
 	List	   *es_tupleTable;	/* List of TupleTableSlots */
 
-	List	   *es_rowMarks;	/* List of ExecRowMarks */
-
 	uint64		es_processed;	/* # of tuples processed */
 	Oid			es_lastoid;		/* last oid processed (by INSERT) */
@@ -607,7 +608,9 @@ typedef struct EState
  * node that sources the relation (e.g., for a foreign table the FDW can use
  * ermExtra to hold information).
  *
- * EState->es_rowMarks is a list of these structs.
+ * EState->es_rowmarks is an array of these structs, indexed by RT index,
+ * with NULLs for irrelevant RT indexes.  es_rowmarks itself is NULL if
+ * there are no rowmarks.
  */
 typedef struct ExecRowMark
 {
@@ -629,7 +632,7 @@ typedef struct ExecRowMark
  * additional runtime representation of FOR [KEY] UPDATE/SHARE clauses
 *
 * Each LockRows and ModifyTable node keeps a list of the rowmarks it needs to
- * deal with.  In addition to a pointer to the related entry in es_rowMarks,
+ * deal with.  In addition to a pointer to the related entry in es_rowmarks,
 * this struct carries the column number(s) of the resjunk columns associated
 * with the rowmark (see comments for PlanRowMark for more detail).  In the
 * case of ModifyTable, there has to be a separate ExecAuxRowMark list for
@@ -638,7 +641,7 @@
  */
 typedef struct ExecAuxRowMark
 {
-	ExecRowMark *rowmark;		/* related entry in es_rowMarks */
+	ExecRowMark *rowmark;		/* related entry in es_rowmarks */
 	AttrNumber	ctidAttNo;		/* resno of ctid junk attribute, if any */
 	AttrNumber	toidAttNo;		/* resno of tableoid junk attribute, if any */
 	AttrNumber	wholeAttNo;		/* resno of whole-row junk attribute, if any */
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0a766ef..1775e77 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -919,7 +919,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		{
 			PlanRowMark *rc = (PlanRowMark *) lfirst(l);
 			Oid			relid;
-			Relation	relation;
 			ExecRowMark *erm;
 
 			/* ignore "parent" rowmarks; they are irrelevant at runtime */
@@ -929,32 +928,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 			/* get relation's OID (will produce InvalidOid if subquery) */
 			relid = exec_rt_fetch(rc->rti, estate)->relid;
 
-			/* open relation, if we need to access it for this mark type */
-			switch (rc->markType)
-			{
-				case ROW_MARK_EXCLUSIVE:
-				case ROW_MARK_NOKEYEXCLUSIVE:
-				case ROW_MARK_SHARE:
-				case ROW_MARK_KEYSHARE:
-				case ROW_MARK_REFERENCE:
-					relation = ExecGetRangeTableRelation(estate, rc->rti);
-					break;
-				case ROW_MARK_COPY:
-					/* no physical table access is required */
-					relation = NULL;
-					break;
-				default:
-					elog(ERROR, "unrecognized markType: %d", rc->markType);
-					relation = NULL;	/* keep compiler quiet */
-					break;
-			}
-
-			/* Check that relation is a legal target for marking */
-			if (relation)
-				CheckValidRowMarkRel(relation, rc->markType);
-
 			erm = (ExecRowMark *) palloc(sizeof(ExecRowMark));
-			erm->relation = relation;
 			erm->relid = relid;
 			erm->rti = rc->rti;
 			erm->prti = rc->prti;
@@ -963,6 +937,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 			erm->strength = rc->strength;
 			erm->waitPolicy = rc->waitPolicy;
 			erm->ermActive = false;
+			erm->ermChecked = false;
 			ItemPointerSetInvalid(&(erm->curCtid));
 			erm->ermExtra = NULL;
 
@@ -2462,6 +2437,20 @@ ExecBuildAuxRowMark(ExecRowMark *erm, List *targetlist)
 	return aerm;
 }
 
+Relation
+ExecRowMarkGetRelation(EState *estate, ExecRowMark *erm)
+{
+	Relation	relation = ExecGetRangeTableRelation(estate, erm->rti);
+
+	/* Check that relation is a legal target for marking, if we haven't */
+	if (!erm->ermChecked)
+	{
+		CheckValidRowMarkRel(relation, erm->markType);
+		erm->ermChecked = true;
+	}
+	return relation;
+}
+
 
 /*
  * EvalPlanQual logic --- recheck modified tuple(s) to see if we want to
@@ -2935,10 +2924,9 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 
 		if (erm->markType == ROW_MARK_REFERENCE)
 		{
+			Relation	relation = ExecRowMarkGetRelation(epqstate->estate, erm);
 			HeapTuple	copyTuple;
 
-			Assert(erm->relation != NULL);
-
 			/* fetch the tuple's ctid */
 			datum = ExecGetJunkAttribute(epqstate->origslot,
 										 aerm->ctidAttNo,
@@ -2948,18 +2936,18 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 				continue;
 
 			/* fetch requests on foreign tables must be passed to their FDW */
-			if (erm->relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+			if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
 			{
 				FdwRoutine *fdwroutine;
 				bool		updated = false;
 
-				fdwroutine = GetFdwRoutineForRelation(erm->relation, false);
+				fdwroutine = GetFdwRoutineForRelation(relation, false);
 				/* this should have been checked already, but let's be safe */
 				if (fdwroutine->RefetchForeignRow == NULL)
 					ereport(ERROR,
 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 							 errmsg("cannot lock rows in foreign table \"%s\"",
-									RelationGetRelationName(erm->relation))));
+									RelationGetRelationName(relation))));
 				copyTuple = fdwroutine->RefetchForeignRow(epqstate->estate,
 														  erm,
 														  datum,
@@ -2979,7 +2967,7 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate)
 			Buffer		buffer;
 
 			tuple.t_self = *((ItemPointer) DatumGetPointer(datum));
-			if (!heap_fetch(erm->relation, SnapshotAny, &tuple, &buffer,
+			if (!heap_fetch(relation, SnapshotAny, &tuple, &buffer,
 							false, NULL))
 				elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck");
 
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 6db345a..4e161c0 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -74,6 +74,7 @@ lnext:
 	{
 		ExecAuxRowMark *aerm = (ExecAuxRowMark *) lfirst(lc);
 		ExecRowMark *erm = aerm->rowmark;
+		Relation	relation;
 		HeapTuple  *testTuple;
 		Datum		datum;
 		bool		isNull;
@@ -122,19 +123,22 @@ lnext:
 		if (isNull)
 			elog(ERROR, "ctid is NULL");
 
+		/* access the tuple's relation */
+		relation = ExecRowMarkGetRelation(estate, erm);
+
 		/* requests for foreign tables must be passed to their FDW */
-		if (erm->relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+		if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
 		{
 			FdwRoutine *fdwroutine;
 			bool		updated = false;
 
-			fdwroutine = GetFdwRoutineForRelation(erm->relation, false);
+			fdwroutine = GetFdwRoutineForRelation(relation, false);
 			/* this should have been checked already, but let's be safe */
 			if (fdwroutine->RefetchForeignRow == NULL)
 				ereport(ERROR,
 						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 						 errmsg("cannot lock rows in foreign table \"%s\"",
-								RelationGetRelationName(erm->relation))));
+								RelationGetRelationName(relation))));
 			copyTuple = fdwroutine->RefetchForeignRow(estate,
 													  erm,
 													  datum,
@@ -180,7 +184,7 @@ lnext:
 			break;
 		}
 
-		test = heap_lock_tuple(erm->relation, &tuple,
+		test = heap_lock_tuple(relation, &tuple,
 							   estate->es_output_cid,
 							   lockmode, erm->waitPolicy, true,
 							   &buffer, &hufd);
hufd.xmax); @@ -286,6 +290,7 @@ lnext: { ExecAuxRowMark *aerm = (ExecAuxRowMark *) lfirst(lc); ExecRowMark *erm = aerm->rowmark; + Relation relation; HeapTupleData tuple; Buffer buffer; @@ -309,13 +314,16 @@ lnext: continue; } + /* access the tuple's relation */ + relation = ExecRowMarkGetRelation(estate, erm); + /* foreign tables should have been fetched above */ - Assert(erm->relation->rd_rel->relkind != RELKIND_FOREIGN_TABLE); + Assert(relation->rd_rel->relkind != RELKIND_FOREIGN_TABLE); Assert(ItemPointerIsValid(&(erm->curCtid))); /* okay, fetch the tuple */ tuple.t_self = erm->curCtid; - if (!heap_fetch(erm->relation, SnapshotAny, &tuple, &buffer, + if (!heap_fetch(relation, SnapshotAny, &tuple, &buffer, false, NULL)) elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck"); diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h index c9ed198..4dbbb3b 100644 --- a/src/include/executor/executor.h +++ b/src/include/executor/executor.h @@ -190,6 +190,7 @@ extern void ExecWithCheckOptions(WCOKind kind, ResultRelInfo *resultRelInfo, extern LockTupleMode ExecUpdateLockMode(EState *estate, ResultRelInfo *relinfo); extern ExecRowMark *ExecFindRowMark(EState *estate, Index rti, bool missing_ok); extern ExecAuxRowMark *ExecBuildAuxRowMark(ExecRowMark *erm, List *targetlist); +extern Relation ExecRowMarkGetRelation(EState *estate, ExecRowMark *erm); extern TupleTableSlot *EvalPlanQual(EState *estate, EPQState *epqstate, Relation relation, Index rti, int lockmode, ItemPointer tid, TransactionId priorXmax); diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index 8347089..e2cd7ea 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -597,16 +597,20 @@ typedef struct EState * ExecRowMark - * runtime representation of FOR [KEY] UPDATE/SHARE clauses * - * When doing UPDATE, DELETE, or SELECT FOR [KEY] UPDATE/SHARE, we will have an - * ExecRowMark for each non-target relation in the query (except 
inheritance - * parent RTEs, which can be ignored at runtime). Virtual relations such as - * subqueries-in-FROM will have an ExecRowMark with relation == NULL. See - * PlanRowMark for details about most of the fields. In addition to fields - * directly derived from PlanRowMark, we store an activity flag (to denote - * inactive children of inheritance trees), curCtid, which is used by the - * WHERE CURRENT OF code, and ermExtra, which is available for use by the plan - * node that sources the relation (e.g., for a foreign table the FDW can use - * ermExtra to hold information). + * When doing UPDATE, DELETE, or SELECT FOR [KEY] UPDATE/SHARE, we will have + * an ExecRowMark for each non-target relation in the query (except + * inheritance parent RTEs, which can be ignored at runtime). See PlanRowMark + * for details about most of the fields. In addition to fields directly + * derived from PlanRowMark, we store an activity flag (to denote inactive + * children of inheritance trees); a "checked" flag (to denote whether we've + * verified the relation is of appropriate type to be marked); curCtid, which + * is used by the WHERE CURRENT OF code; and ermExtra, which is available for + * use by the plan node that sources the relation (e.g., for a foreign table + * the FDW can use ermExtra to hold information). + * + * For performance reasons, we don't open or verify the relkind of tables + * that are never accessed in a query; this is important in partitioned + * tables with many partitions. Use ExecRowMarkGetRelation to do that. * * EState->es_rowmarks is an array of these structs, indexed by RT index, * with NULLs for irrelevant RT indexes. 
es_rowmarks itself is NULL if @@ -614,8 +618,7 @@ typedef struct EState */ typedef struct ExecRowMark { - Relation relation; /* opened and suitably locked relation */ - Oid relid; /* its OID (or InvalidOid, if subquery) */ + Oid relid; /* relation's OID (InvalidOid if subquery) */ Index rti; /* its range table index */ Index prti; /* parent range table index, if child */ Index rowmarkId; /* unique identifier for resjunk columns */ @@ -623,6 +626,7 @@ typedef struct ExecRowMark LockClauseStrength strength; /* LockingClause's strength, or LCS_NONE */ LockWaitPolicy waitPolicy; /* NOWAIT and SKIP LOCKED */ bool ermActive; /* is this mark relevant for current tuple? */ + bool ermChecked; /* have we verified relation's relkind? */ ItemPointerData curCtid; /* ctid of currently locked tuple, if any */ void *ermExtra; /* available for use by relation source node */ } ExecRowMark;
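The ermChecked flag added above implements a check-once-on-first-access pattern: the relkind validation formerly done eagerly in InitPlan now happens the first time ExecRowMarkGetRelation is called. A minimal self-contained C sketch of that pattern (the types and names here are simplified stand-ins, not PostgreSQL's actual structures):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for Relation and ExecRowMark; illustrative only. */
typedef struct { int relkind_checks; } RelationModel;

typedef struct
{
    RelationModel *rel;         /* stands in for the RT-indexed relation */
    bool           ermChecked;  /* have we verified the relkind yet? */
} ExecRowMarkModel;

/*
 * Modeled after ExecRowMarkGetRelation: fetch the relation, validating
 * that it is a legal target for marking only on the first call.
 */
static RelationModel *
row_mark_get_relation(ExecRowMarkModel *erm)
{
    if (!erm->ermChecked)
    {
        erm->rel->relkind_checks++;  /* stands in for CheckValidRowMarkRel */
        erm->ermChecked = true;
    }
    return erm->rel;
}

/* Driver: returns 0 if the check ran exactly once across repeated calls. */
static int
demo_check_once(void)
{
    RelationModel rel = {0};
    ExecRowMarkModel erm = {&rel, false};

    row_mark_get_relation(&erm);
    row_mark_get_relation(&erm);
    row_mark_get_relation(&erm);
    return (rel.relkind_checks == 1) ? 0 : 1;
}
```

The payoff is that rowmarks whose relations are never accessed (e.g., pruned partitions) pay neither the relcache open nor the validation cost.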
On 8 October 2018 at 12:18, Tom Lane <tgl@sss.pgh.pa.us> wrote: > However, we should keep in mind that without partitioning overhead > (ie "select * from lt_999 where b = 999 for share"), the TPS rate > is over 25800 tps. Most of the overhead in the partitioned case seems > to be from acquiring locks on rangetable entries that we won't ever > use, and none of these patch variants are touching that problem. > So ISTM that the *real* win for this scenario is going to come from > teaching the system to prune unwanted relations from the query > altogether, not just from the PlanRowMark list. Idle thought: I wonder if we could add another field to the RangeTblEntry; "delaylock". Set that to true in the planner for all other_member rels that are partitions then not obtain locks on those during AcquireExecutorLocks(). Instead, grab the lock in ExecGetRangeTableRelation() the first time through. We'd still obtain the lock for the table named in the query at the normal time so cached plans could properly be invalidated. We'd need to ensure that anything that could be changed in the partitions to cause a plan to become invalid properly obtains a lock on the partitioned table, all the way to the top of the hierarchy. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
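The delaylock idea above amounts to: skip some RTEs in AcquireExecutorLocks(), then take the lock lazily, exactly once, on first access in ExecGetRangeTableRelation(). A hedged, self-contained C model of that behavior (the delaylock flag, the lock counter, and the function names are all illustrative, not actual PostgreSQL code):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of an RTE carrying the proposed "delaylock" flag. */
typedef struct
{
    bool delaylock;   /* proposed: defer locking past AcquireExecutorLocks */
    bool locked;
    int  lock_count;  /* how many times we "locked" this RTE */
} RteModel;

/* Models AcquireExecutorLocks(): lock everything not marked delaylock. */
static void
acquire_executor_locks(RteModel *rtes, int nrtes)
{
    for (int i = 0; i < nrtes; i++)
    {
        if (!rtes[i].delaylock)
        {
            rtes[i].locked = true;
            rtes[i].lock_count++;
        }
    }
}

/* Models ExecGetRangeTableRelation(): take the delayed lock on first use. */
static void
get_range_table_relation(RteModel *rte)
{
    if (!rte->locked)
    {
        rte->locked = true;
        rte->lock_count++;
    }
}

/*
 * Driver: the table named in the query is locked up front (so cached plans
 * can still be invalidated); a partition the executor touches is locked
 * exactly once however often it is accessed; a pruned partition is never
 * locked at all.  Returns 0 on the expected behavior.
 */
static int
demo_delaylock(void)
{
    RteModel rtes[3] = {
        {false, false, 0},  /* table named in the query */
        {true,  false, 0},  /* partition the executor touches */
        {true,  false, 0},  /* partition pruned at runtime */
    };

    acquire_executor_locks(rtes, 3);
    get_range_table_relation(&rtes[1]);
    get_range_table_relation(&rtes[1]);

    if (rtes[0].lock_count != 1) return 1;
    if (rtes[1].lock_count != 1) return 2;
    if (rtes[2].lock_count != 0) return 3;
    return 0;
}
```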
David Rowley <david.rowley@2ndquadrant.com> writes: > On 8 October 2018 at 12:18, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> So ISTM that the *real* win for this scenario is going to come from >> teaching the system to prune unwanted relations from the query >> altogether, not just from the PlanRowMark list. > Idle thought: I wonder if we could add another field to the > RangeTblEntry; "delaylock". Set that to true in the planner for all > other_member rels that are partitions then not obtain locks on those > during AcquireExecutorLocks(). Instead, grab the lock in > ExecGetRangeTableRelation() the first time through. Hmm, I'm afraid that's not terribly safe, unless there are ALTER TABLE restrictions on partitions that I'm not aware of. For instance, a leaf partition might have an index it didn't inherit from the parent, and the query plan might be intending to use that index. If you don't take lock on the leaf, you might not know that the index has been dropped so that a new plan is needed. The idea I had in mind was to allow hard pruning of any leaf that's been excluded *at plan time* based on partition constraints seen in its parent rel(s). That should be safe enough as long as we take locks on all the non-leaf rels --- then we'd know about any change in the constraint situation. Rather than coping with renumbering the RTEs, it might be easiest to invent an "RTE_DUMMY" RTE type that a hard-pruned RTE could be changed to. regards, tom lane
On 8 October 2018 at 13:13, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The idea I had in mind was to allow hard pruning of any leaf that's > been excluded *at plan time* based on partition constraints seen in > its parent rel(s). That should be safe enough as long as we take > locks on all the non-leaf rels --- then we'd know about any change > in the constraint situation. > > Rather than coping with renumbering the RTEs, it might be easiest > to invent an "RTE_DUMMY" RTE type that a hard-pruned RTE could be > changed to. The problem with that is that, if we get [1] done in PG12, then the RTE_DUMMYs would not exist, as we'd only have RTEs in the range table for partitions that survived plan-time pruning. It also leaves a problem to solve in the unneeded locks being taken on partitions for PREPAREd queries using run-time pruning. [1] https://commitfest.postgresql.org/20/1778/ -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
I wrote: > Keeping that comparison in mind, I'm inclined to think that 0001 > is the best thing to do for now. The incremental win from 0002 > is not big enough to justify the API break it creates, while your > 0005 is not really attacking the problem the right way. I've pushed 0001 now. I believe that closes out all the patches discussed in this thread, so I've marked the CF entry committed. Thanks for all the hard work! regards, tom lane
On 2018/10/07 3:59, Tom Lane wrote: > Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: >> On 2018/10/05 5:59, Tom Lane wrote: >>> So I'm inclined to just omit 0003. AFAICS this would only mean that >>> we couldn't drop the global PlanRowMarks list from PlannedStmt, which >>> does not bother me much. > >> To be honest, I too had begun to fail to see the point of this patch since >> yesterday. In fact, getting this one to pass make check-world took a bit >> of head-scratching due to its interaction with EvalPlanQuals EState >> building, so I was almost to drop it from the series. But I felt that it >> might still be a good idea to get rid of what was described as special >> case code. Reading your argument against it based on how it messes up >> some of the assumptions regarding ExecRowMark design, I'm willing to let >> go of this one as more or less a cosmetic improvement. > > OK. We do need to fix the comments in InitPlan that talk about acquiring > locks and how that makes us need to do things in a particular order, but > I'll go take care of that. Thanks for doing that. >> So, that leaves us with only the patch that revises the plan node fields >> in light of having relieved the executor of doing any locking of its own >> in *some* cases. That patch's premise is that we don't need the fields >> that the patch removes because they're only referenced for the locking >> purposes. But, if those plan nodes support parallel execution and hence >> will be passed to parallel workers for execution who will need to lock the >> tables contained in the plan nodes, then they better contain all the >> information needed for locking *every* affected tables, so we had better >> not removed it. > > No, I think this is unduly pessimistic. We need to make sure that a > parallel worker has lock on tables it's actually touching, but I don't > see why that should imply a requirement to hold lock on parent tables > it never touches. 
> > The reasons why we need locks on tables not physically accessed by the > query are (a) to ensure that we've blocked, or received sinval messages > for, any DDL related to views or partition parent tables, in case that > would invalidate the plan; (b) to allow firing triggers safely, in > the case of partition parent tables. Neither of these issues apply to > a parallel worker -- the plan is already frozen before it can ever > start, and it isn't going to be firing any triggers either. > > In particular, I think it's fine to get rid of > ExecLockNonLeafAppendTables. In the parent, that's clearly useless code > now: we have already locked *every* RTE_RELATION entry in the rangetable, > either when the parser/rewriter/planner added the RTE to the list to begin > with, or during AcquireExecutorLocks if the plan was retrieved from the > plancache. In a child it seems unnecessary as long as we're locking the > leaf rels we actually touch. > > Possibly, we might fix the problem of inadequate locking in worker > processes by having them do the equivalent of AcquireExecutorLocks, ie > just run through the whole RT list and lock everything. I think we'd soon > end up doing that if we ever try to allow parallelization of non-read-only > queries; but that's a long way off AFAIK. For read-only queries it seems > like it'll be fine if we just rejigger ExecGetRangeTableRelation to take a > lock when IsParallelWorker. Okay, thanks for the explanation. It's now clearer to me why parallel workers don't really need to lock non-leaf relations. Thanks, Amit
On 2018/10/08 3:55, Tom Lane wrote: > I didn't like the idea of unifying ModifyTable.nominalRelation with > the partition root info. Those fields serve different masters --- > nominalRelation, at least in its original intent, is only meant for > use of EXPLAIN and might have nothing to do with what happens at > execution. So even though unifying them would work today, we might > regret it down the line. Instead I left that field alone and added > a separate rootRelation field to carry the partition root RT index, > which ends up being the same number of fields anyway since we don't > need a flag for is-the-nominal-relation-a-partition-root. Thanks for pushing that. I'd also named it 'rootRelation' in my original patch before David had objected to calling it that, because a command may not specify the "actual" root of a partition tree; it could be a non-root partitioned table. He'd suggested 'partitionedTarget' for the new field [1], stressing the "target" part. Maybe 'rootRelation' isn't too confusing though. Thanks, Amit [1] https://www.postgresql.org/message-id/CAKJS1f_M0jkgL-d%3Dk-rf6TMzghATDmZ67nzja1tz4h3G%3D27e7Q%40mail.gmail.com
On 2018/10/09 0:38, Tom Lane wrote: > I wrote: >> Keeping that comparison in mind, I'm inclined to think that 0001 >> is the best thing to do for now. The incremental win from 0002 >> is not big enough to justify the API break it creates, while your >> 0005 is not really attacking the problem the right way. > > I've pushed 0001 now. I believe that closes out all the patches > discussed in this thread, so I've marked the CF entry committed. > Thanks for all the hard work! Thanks a lot for reviewing and committing. Regards, Amit
On 2018/10/08 9:29, David Rowley wrote: > On 8 October 2018 at 13:13, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The idea I had in mind was to allow hard pruning of any leaf that's >> been excluded *at plan time* based on partition constraints seen in >> its parent rel(s). That should be safe enough as long as we take >> locks on all the non-leaf rels --- then we'd know about any change >> in the constraint situation. >> >> Rather than coping with renumbering the RTEs, it might be easiest >> to invent an "RTE_DUMMY" RTE type that a hard-pruned RTE could be >> changed to. > > The problem with that is that, if we get [1] done in PG12, then the > RTE_DUMMYs would not exist, as we'd only have RTEs in the range table > for partitions that survived plan-time pruning. ... > [1] https://commitfest.postgresql.org/20/1778/ Yeah, the patch proposed at [1] postpones creating partition RTEs (hence locking them) to a point after pruning, which also means we create only the necessary RTEs. In fact, it's not just the RTEs, but child PlanRowMarks, whose creation is postponed to after pruning. So, I admitted upthread that my proposed patch here would only add code that will become useless if we're able to get [1] in. Thanks, Amit
On 2018/10/08 8:18, Tom Lane wrote: > I wrote: >> Still need to think a bit more about whether we want 0005 in >> anything like its current form. > > So I poked at that for a bit, and soon realized that the *main* problem > there is that ExecFindRowMark() eats O(N^2) time due to repeated searches > of the es_rowMarks list. Yeah, I proposed the patch only because of stumbling across O(N^2) behavior, but went to solve it with the planner hack, which I agree is an inferior solution overall. > While the patch as stated would improve that > for cases where most of the partitions can be pruned at plan time, it > does nothing much for cases where they can't. However, it's pretty > trivial to fix that: let's just use an array not a list. Patch 0001 > attached does that. > > A further improvement we could consider is to avoid opening the relcache > entries for pruned-away relations. I could not find a way to do that > that was less invasive than removing ExecRowMark.relation and requiring > callers to call a new function to get the relation if they need it. > Patch 0002 attached is a delta on top of 0001 that does that. > > Replicating your select-lt-for-share test case as best I can (you never > actually specified it carefully), I find that the TPS rate on my > workstation goes from about 250 tps with HEAD to 920 tps with patch 0001 > or 1130 tps with patch 0002. This compares to about 1600 tps for the > non-FOR-SHARE version of the query. > > However, we should keep in mind that without partitioning overhead > (ie "select * from lt_999 where b = 999 for share"), the TPS rate > is over 25800 tps. Most of the overhead in the partitioned case seems > to be from acquiring locks on rangetable entries that we won't ever > use, and none of these patch variants are touching that problem. > So ISTM that the *real* win for this scenario is going to come from > teaching the system to prune unwanted relations from the query > altogether, not just from the PlanRowMark list. 
Agreed, which is why I mentioned the other patch [1], which gets us closer to that goal. > Keeping that comparison in mind, I'm inclined to think that 0001 > is the best thing to do for now. The incremental win from 0002 > is not big enough to justify the API break it creates, while your > 0005 is not really attacking the problem the right way. I agree that 0002's improvement is only incremental and would lose its appeal if we're able to solve the bigger problem of removing range table and other overhead when planning with large number of partitions, but that might take a while. Thanks, Amit [1] https://commitfest.postgresql.org/20/1778/
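For readers following along, the "use an array not a list" fix quoted above can be sketched as a small self-contained C model (the struct and function names are simplified stand-ins for EState/ExecRowMark/ExecFindRowMark, not the actual PostgreSQL definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for ExecRowMark; only the RT index matters here. */
typedef struct { int rti; } RowMarkModel;

/*
 * Models the committed change: instead of a list that ExecFindRowMark
 * scans (O(N) per lookup, hence O(N^2) across N lookups during executor
 * startup), keep an array indexed by RT index, with NULL entries for
 * RT indexes that carry no rowmark.
 */
typedef struct
{
    RowMarkModel **es_rowmarks;         /* NULL if the query has no rowmarks */
    int            es_range_table_size;
} EStateModel;

/* O(1) replacement for the old list scan; RT indexes are 1-based. */
static RowMarkModel *
find_row_mark(EStateModel *estate, int rti)
{
    if (estate->es_rowmarks == NULL)
        return NULL;
    assert(rti >= 1 && rti <= estate->es_range_table_size);
    return estate->es_rowmarks[rti - 1];
}

/* Driver: returns 0 if lookups hit the right marks and misses return NULL. */
static int
demo_rowmark_array(void)
{
    RowMarkModel rm2 = {2}, rm5 = {5};
    RowMarkModel *slots[5] = {NULL, &rm2, NULL, NULL, &rm5};
    EStateModel estate = {slots, 5};

    if (find_row_mark(&estate, 2) != &rm2) return 1;
    if (find_row_mark(&estate, 5) != &rm5) return 2;
    if (find_row_mark(&estate, 3) != NULL) return 3;
    return 0;
}
```

The cost is a pointer per range-table entry, which is cheap next to the per-lookup savings when there are many partitions.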
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes: > On 2018/10/08 3:55, Tom Lane wrote: >> I didn't like the idea of unifying ModifyTable.nominalRelation with >> the partition root info. Those fields serve different masters --- >> nominalRelation, at least in its original intent, is only meant for >> use of EXPLAIN and might have nothing to do with what happens at >> execution. So even though unifying them would work today, we might >> regret it down the line. Instead I left that field alone and added >> a separate rootRelation field to carry the partition root RT index, >> which ends up being the same number of fields anyway since we don't >> need a flag for is-the-nominal-relation-a-partition-root. > Thanks for pushing that. I'd also named it 'rootRelation' in my original > patch before David had objected to calling it that, because a command may > not specify the "actual" root of a partition tree; it could be a non-root > partitioned table. He'd suggested 'partitionedTarget' for the new field > [1], stressing the "target" part. Maybe, 'rootRelation' isn't too > confusing though. Well, it's the root so far as the current query is concerned --- we do not take any semantic account of partitioning levels that might exist above the table named in the query, do we? regards, tom lane
On Tue, Oct 9, 2018 at 11:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
> > On 2018/10/08 3:55, Tom Lane wrote:
> >> I didn't like the idea of unifying ModifyTable.nominalRelation with
> >> the partition root info.  Those fields serve different masters ---
> >> nominalRelation, at least in its original intent, is only meant for
> >> use of EXPLAIN and might have nothing to do with what happens at
> >> execution.  So even though unifying them would work today, we might
> >> regret it down the line.  Instead I left that field alone and added
> >> a separate rootRelation field to carry the partition root RT index,
> >> which ends up being the same number of fields anyway since we don't
> >> need a flag for is-the-nominal-relation-a-partition-root.
>
> > Thanks for pushing that.  I'd also named it 'rootRelation' in my original
> > patch before David had objected to calling it that, because a command may
> > not specify the "actual" root of a partition tree; it could be a non-root
> > partitioned table.  He'd suggested 'partitionedTarget' for the new field
> > [1], stressing the "target" part.  Maybe, 'rootRelation' isn't too
> > confusing though.
>
> Well, it's the root so far as the current query is concerned --- we do not
> take any semantic account of partitioning levels that might exist above
> the table named in the query, do we?

We don't, and I personally agree with the reasoning behind calling it
rootRelation.

Thanks,
Amit
On Sat, Oct 6, 2018 at 2:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The reasons why we need locks on tables not physically accessed by the
> query are (a) to ensure that we've blocked, or received sinval messages
> for, any DDL related to views or partition parent tables, in case that
> would invalidate the plan; (b) to allow firing triggers safely, in
> the case of partition parent tables.  Neither of these issues apply to
> a parallel worker -- the plan is already frozen before it can ever
> start, and it isn't going to be firing any triggers either.

That last part could *easily* change in a future release.  We've
already started to allow CTAS with parallel query, and there have
already been multiple people wanting to allow more.  It would be a
shame if we threw up additional obstacles in the way of that...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Oct 6, 2018 at 2:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The reasons why we need locks on tables not physically accessed by the
>> query are (a) to ensure that we've blocked, or received sinval messages
>> for, any DDL related to views or partition parent tables, in case that
>> would invalidate the plan; (b) to allow firing triggers safely, in
>> the case of partition parent tables.  Neither of these issues apply to
>> a parallel worker -- the plan is already frozen before it can ever
>> start, and it isn't going to be firing any triggers either.

> That last part could *easily* change in a future release.  We've
> already started to allow CTAS with parallel query, and there have
> already been multiple people wanting to allow more.  It would be a
> shame if we threw up additional obstacles in the way of that...

I hardly think that this is the most serious issue in the way of
doing non-read-only things in parallel workers.

In any case, a parallel worker would surely have to open any
relations it is going to fire triggers for.  If it gets the correct
lock when it does that, all is well.  If not, the Assert in
relation_open will complain.

			regards, tom lane
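[The "Assert in relation_open" pattern Tom refers to can be sketched standalone. This is a simplified model, not PostgreSQL's lmgr or CheckRelationLockedByMe(); the names, the tiny lock table, and the numeric treat-higher-mode-as-stronger comparison are all assumptions made for illustration. The idea is: callers that pass NoLock still get checked, in assert-enabled builds, against the locks this backend already holds.]

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for a backend-local record of held relation locks. */
#define MAX_LOCKS 16

typedef struct
{
    unsigned relid;
    int      lockmode;          /* higher number = stronger, for this sketch */
} LocalLock;

static LocalLock held[MAX_LOCKS];
static int nheld = 0;

static void
lock_relation(unsigned relid, int lockmode)
{
    assert(nheld < MAX_LOCKS);
    held[nheld].relid = relid;
    held[nheld].lockmode = lockmode;
    nheld++;
}

/* Rough analogue of CheckRelationLockedByMe(): do we hold a lock on relid
 * at least as strong as lockmode? */
static bool
locked_by_me(unsigned relid, int lockmode)
{
    for (int i = 0; i < nheld; i++)
        if (held[i].relid == relid && held[i].lockmode >= lockmode)
            return true;
    return false;
}

/* Open a relation without taking a lock, but assert that an appropriate
 * lock was already taken earlier (e.g. by the parser or plancache). */
static void
open_relation_nolock(unsigned relid, int min_lockmode)
{
    assert(locked_by_me(relid, min_lockmode));
    /* ... the actual relcache open would happen here ... */
}
```

A parallel worker firing triggers would, in this scheme, be caught the same way as any other caller: if it opens a relation without having acquired the lock first, the assertion fails in debug builds.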
On Tue, Oct 9, 2018 at 2:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > That last part could *easily* change in a future release.  We've
> > already started to allow CTAS with parallel query, and there have
> > already been multiple people wanting to allow more.  It would be a
> > shame if we threw up additional obstacles in the way of that...
>
> I hardly think that this is the most serious issue in the way of
> doing non-read-only things in parallel workers.

My concern, as I said, is about adding new obstacles.

> In any case, a parallel worker would surely have to open any
> relations it is going to fire triggers for.  If it gets the correct
> lock when it does that, all is well.  If not, the Assert in
> relation_open will complain.

Well, in that case, no issues.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On Sun, Sep 30, 2018 at 7:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I think that the call sites should ultimately look like
>
>         Assert(CheckRelationLockedByMe(...));
>
> but for hunting down the places where the assertion currently fails,
> it's more convenient if it's just an elog(WARNING).

I just hit one of the asserts (in relation_open()) added in b04aeb0a053.
Here's a simple reproducer:

DROP TABLE IF EXISTS test;
CREATE TABLE test (id integer primary key);
PREPARE s AS DELETE FROM test WHERE id = 1;
EXECUTE s;
EXECUTE s;

This comes from ExecInitIndexScan() and ExecInitBitmapIndexScan(), which
open the index without lock if the parent table is a target relation:

    /*
     * Open the index relation.
     *
     * If the parent table is one of the target relations of the query, then
     * InitPlan already opened and write-locked the index, so we can avoid
     * taking another lock here.  Otherwise we need a normal reader's lock.
     */
    relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
    indexstate->iss_RelationDesc = index_open(node->indexid,
                                              relistarget ? NoLock : AccessShareLock);

And digging into InitPlan() up to ExecInitModifyTable():

    /*
     * If there are indices on the result relation, open them and save
     * descriptors in the result relation info, so that we can add new
     * index entries for the tuples we add/update.  We need not do this
     * for a DELETE, however, since deletion doesn't affect indexes.  Also,
     * inside an EvalPlanQual operation, the indexes might be open
     * already, since we share the resultrel state with the original
     * query.
     */
    if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
        operation != CMD_DELETE &&
        resultRelInfo->ri_IndexRelationDescs == NULL)
        ExecOpenIndices(resultRelInfo,
                        node->onConflictAction != ONCONFLICT_NONE);

So, this is problematic with a cached plan on a DELETE query, since the
lock on the target relation's index will never be acquired (the lock is
acquired on the first execution in get_relation_info()).
This doesn't seem unsafe though, since DROP INDEX [CONCURRENTLY] will
still acquire a lock on the index relation, or wait out the query's xid
in the CONCURRENTLY case, before dropping the index.

I'm not sure what the best way to fix this problem is.  I wanted to
modify ExecInitIndexScan() and ExecInitBitmapIndexScan() to acquire the
lock for a DELETE on the target relation, however I don't think that we
have that information at this point.  Maybe just unconditionally acquire
an AccessShareLock on the index in those functions?
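[One reason unconditionally taking AccessShareLock in those functions is plausible: re-acquiring a lock this backend already holds never goes back to the shared lock manager, it only bumps a reference count in the backend-local lock table. The sketch below models that behavior in standalone C; the names and structure are illustrative only, not lmgr's actual implementation.]

```c
#include <assert.h>

#define ACCESS_SHARE_LOCK 1

/* One entry per (relation, lockmode) this backend holds. */
typedef struct
{
    unsigned relid;
    int      lockmode;
    int      nlocal;            /* local reference count */
} LockEntry;

static LockEntry table[8];
static int nentries = 0;
static int shared_acquisitions = 0;   /* trips to the "shared" lock manager */

static void
acquire_lock(unsigned relid, int lockmode)
{
    /* Already held?  Then only local bookkeeping is needed. */
    for (int i = 0; i < nentries; i++)
    {
        if (table[i].relid == relid && table[i].lockmode == lockmode)
        {
            table[i].nlocal++;
            return;
        }
    }
    /* First acquisition: this is the only path that hits shared state. */
    assert(nentries < 8);
    table[nentries++] = (LockEntry){relid, lockmode, 1};
    shared_acquisitions++;
}
```

Under this model, re-executing a cached plan that unconditionally locks the index pays only the local-table lookup on every execution after the first, so the "avoid taking another lock" optimization buys relatively little.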
Julien Rouhaud <rjuju123@gmail.com> writes:
> I just hit one of the asserts (in relation_open()) added in
> b04aeb0a053.  Here's a simple reproducer:

Yeah, I think this is the same issue being discussed in

https://www.postgresql.org/message-id/flat/19465.1541636036%40sss.pgh.pa.us

I imagine the patch David recently posted in that thread will fix this,
but can you check, and/or review the patch?

			regards, tom lane
On Sun, Feb 10, 2019 at 12:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > I just hit one of the asserts (in relation_open()) added in
> > b04aeb0a053.  Here's a simple reproducer:
>
> Yeah, I think this is the same issue being discussed in
>
> https://www.postgresql.org/message-id/flat/19465.1541636036%40sss.pgh.pa.us

Ah indeed, sorry I totally missed that thread.

> I imagine the patch David recently posted in that thread will fix this,
> but can you check, and/or review the patch?

I tried quickly and it does fix my test case.  It's quite late here, so
I'll review the patch tomorrow.

Thanks.