Discussion: Inadequate executor locking of indexes


Inadequate executor locking of indexes

From: Tom Lane
Date:
I discovered that it's possible to trigger relation_open's new assertion
about having a lock on the relation by the simple expedient of running
the core regression tests with plan_cache_mode = force_generic_plan.
(This doesn't give me a terribly warm feeling about how thoroughly we've
exercised the relevant code, but I don't know what to do about that.)
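
To illustrate, here is a minimal sketch of the failure mode on an
assert-enabled build (t1 is just a stand-in table; essentially the same
reproducer shows up again later in this thread):

    set plan_cache_mode = force_generic_plan;
    create table t1 (a int);
    create index on t1 (a);
    prepare q1 as delete from t1 where a = 1;
    execute q1;  -- builds and caches a generic plan
    execute q1;  -- reuses the cached plan; the planner takes no locks
                 -- this time, and the executor opens the index with NoLock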

The reason for the crash is that nodeIndexscan.c and friends all open
their index-to-scan with code like

    /*
     * Open the index relation.
     *
     * If the parent table is one of the target relations of the query, then
     * InitPlan already opened and write-locked the index, so we can avoid
     * taking another lock here.  Otherwise we need a normal reader's lock.
     */
    relistarget = ExecRelationIsTargetRelation(estate, node->scan.scanrelid);
    indexstate->iss_RelationDesc = index_open(node->indexid,
                                              relistarget ? NoLock : AccessShareLock);

Now, at the time this code was written, InitPlan did indeed ensure that
indexes of a plan's result relation were already opened and locked,
because it calls InitResultRelInfo which used to call ExecOpenIndices.
Nowadays, the ExecOpenIndices part is postponed to ExecInitModifyTable,
which figures it can optimize matters:

        /*
         * If there are indices on the result relation, open them and save
         * descriptors in the result relation info, so that we can add new
         * index entries for the tuples we add/update.  We need not do this
         * for a DELETE, however, since deletion doesn't affect indexes. Also,
         * inside an EvalPlanQual operation, the indexes might be open
         * already, since we share the resultrel state with the original
         * query.
         */
        if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
            operation != CMD_DELETE &&
            resultRelInfo->ri_IndexRelationDescs == NULL)
            ExecOpenIndices(resultRelInfo,
                            node->onConflictAction != ONCONFLICT_NONE);

Therefore, in a plan consisting of a DELETE ModifyTable atop an indexscan
of the target table, we are opening the index without the executor
acquiring any lock on the index.  The only thing that saves us from
triggering that Assert is that in most cases the planner has already
taken a lock on any index it considered (and not released that lock).
But using a prepared plan breaks that.
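
For example, the plan shape in question looks like this (sketch; t1 is
a stand-in table with an index on its column a):

    # explain (costs off) delete from t1 where a = 1;
                  QUERY PLAN
    -----------------------------------------
     Delete on t1
       ->  Index Scan using t1_a_idx on t1
             Index Cond: (a = 1)
    (3 rows)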

This problem is ancient; it's unrelated to the recent changes to reduce
executor locking, because that only concerned table locks not index locks.
I'm not certain how much real-world impact it has, since holding a lock
on the index's parent table is probably enough to prevent dire things
from happening in most cases.  Still, it seems like a bug.

There are several things we might do to fix this:

1. Drop the "operation != CMD_DELETE" condition from the above-quoted bit
in ExecInitModifyTable.  We might be forced into that someday anyway if
we want to support non-heap-style tables, since most other people's
versions of indexes do want to know about deletions.

2. Drop the target-table optimization from nodeIndexscan.c and friends,
and just always open the scan index with AccessShareLock.  (BTW, it's
a bit inconsistent that these nodes don't release their index locks
at the end; ExecCloseIndices does.)

3. Teach nodeIndexscan.c and friends about the operation != CMD_DELETE
exception, so that they get the lock for themselves in that case.  This
seems pretty ugly/fragile, but it's about the only option that won't end
in adding index lock-acquisition overhead in some cases.  (But given the
planner's behavior, it's not clear to me how often that really matters.)

BTW, another thing that is bothering me about this is that, both with
the current code and options #1 and #3, there's an assumption that any
indexscan on a query's target table must be underneath the relevant
ModifyTable plan node.  Given some other plan tree structure it might
be possible to initialize the indexscan node before the ModifyTable,
breaking the assumption that the latter already got index locks.  I'm
not sure if that's actually possible at present, but it doesn't seem
like I'd want to bet on it always being so in the future.  This might
be an argument for going with option #2, which eliminates all such
assumptions.

Another idea is to make things work more like we just made them work for
tables, which is to ensure that all index locks are taken before reaching
the executor, or at least as part of some other processing than the plan
tree walk.  That would be a good deal more work than any of the options
listed above, though, and it would definitely not be back-patchable if we
decide we need a back-patched fix.

Thoughts?

            regards, tom lane


Re: Inadequate executor locking of indexes

From: David Rowley
Date:
On Thu, 8 Nov 2018 at 13:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I discovered that it's possible to trigger relation_open's new assertion
> about having a lock on the relation by the simple expedient of running
> the core regression tests with plan_cache_mode = force_generic_plan.


> There are several things we might do to fix this:
>
> 1. Drop the "operation != CMD_DELETE" condition from the above-quoted bit
> in ExecInitModifyTable.  We might be forced into that someday anyway if
> we want to support non-heap-style tables, since most other people's
> versions of indexes do want to know about deletions.
>
> 2. Drop the target-table optimization from nodeIndexscan.c and friends,
> and just always open the scan index with AccessShareLock.  (BTW, it's
> a bit inconsistent that these nodes don't release their index locks
> at the end; ExecCloseIndices does.)
>
> 3. Teach nodeIndexscan.c and friends about the operation != CMD_DELETE
> exception, so that they get the lock for themselves in that case.  This
> seems pretty ugly/fragile, but it's about the only option that won't end
> in adding index lock-acquisition overhead in some cases.  (But given the
> planner's behavior, it's not clear to me how often that really matters.)

Since I missed this and only discovered this was a problem when
someone else reported it today, and since I already did my own
analysis separately in [1], my vote is for #2.  For partitioned
table updates, there may be many result relations, which can cause
ExecRelationIsTargetRelation() to become very slow; in such a case any
additional redundant lock would be cheap by comparison.

Ideally, the locking code would realise we already hold a stronger
lock and skip the lock, but I don't see how that's realistically
possible without probing the hash table for all stronger lock types
first, which would likely damage the performance of locking.

[1] https://www.postgresql.org/message-id/CAKJS1f-DyKTYyMf9oxn1PQ%3DWyEOOjfVcV-dCc6eB9eat_MYPeA%40mail.gmail.com

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Inadequate executor locking of indexes

From: Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Thu, 8 Nov 2018 at 13:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There are several things we might do to fix this:
>>
>> 1. Drop the "operation != CMD_DELETE" condition from the above-quoted bit
>> in ExecInitModifyTable.  We might be forced into that someday anyway if
>> we want to support non-heap-style tables, since most other people's
>> versions of indexes do want to know about deletions.
>>
>> 2. Drop the target-table optimization from nodeIndexscan.c and friends,
>> and just always open the scan index with AccessShareLock.  (BTW, it's
>> a bit inconsistent that these nodes don't release their index locks
>> at the end; ExecCloseIndices does.)
>>
>> 3. Teach nodeIndexscan.c and friends about the operation != CMD_DELETE
>> exception, so that they get the lock for themselves in that case.  This
>> seems pretty ugly/fragile, but it's about the only option that won't end
>> in adding index lock-acquisition overhead in some cases.  (But given the
>> planner's behavior, it's not clear to me how often that really matters.)

> Since I missed this and only discovered this was a problem when
> someone else reported it today, and since I already did my own
> analysis separately in [1], my vote is for #2.

Thinking more about this, the problem I noted previously about two of
these solutions not working if the index scan node is not physically
underneath the ModifyTable node actually applies to all three :-(.
It's a slightly different issue for #2, namely that what we risk is
first taking AccessShareLock and then upgrading to RowExclusiveLock.
Since there are places (not many) that take ShareLock on indexes,
this would pose a deadlock risk.

Now, after more thought, I believe that it's probably impossible
to have a plan tree in which ExecRelationIsTargetRelation would
return true at an indexscan node that's not underneath the ModifyTable
node.  What *is* possible is that we have such an indexscan on a
different RTE for the same table.  I constructed this admittedly
artificial example in the regression database:

# explain with x1 as (select * from tenk1 t1 where unique1 = 42),
x2 as (update tenk1 t2 set two = 2 where unique2 = 11 returning *)
select * from x1,x2 where x1.ten = x2.ten;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Nested Loop  (cost=16.61..16.66 rows=1 width=488)
   Join Filter: (x1.ten = x2.ten)
   CTE x1
     ->  Index Scan using tenk1_unique1 on tenk1 t1  (cost=0.29..8.30 rows=1 width=244)
           Index Cond: (unique1 = 42)
   CTE x2
     ->  Update on tenk1 t2  (cost=0.29..8.30 rows=1 width=250)
           ->  Index Scan using tenk1_unique2 on tenk1 t2  (cost=0.29..8.30 rows=1 width=250)
                 Index Cond: (unique2 = 11)
   ->  CTE Scan on x1  (cost=0.00..0.02 rows=1 width=244)
   ->  CTE Scan on x2  (cost=0.00..0.02 rows=1 width=244)
(11 rows)

Because the CTEs will be initialized in order, this presents a case
where the lock-upgrade hazard exists today: the x1 indexscan will
open tenk1_unique1 with AccessShareLock and then x2's ModifyTable
node will upgrade that to RowExclusiveLock.  None of the proposed
fixes improve this.

I'm beginning to think that postponing target-index locking to
ExecInitModifyTable was a damfool idea and we should undo it.

> For partitioned
> table updates, there may be many result relations, which can cause
> ExecRelationIsTargetRelation() to become very slow; in such a case any
> additional redundant lock would be cheap by comparison.

Yeah, it'd be nice to get rid of the need for that.

            regards, tom lane


Re: Inadequate executor locking of indexes

From: David Rowley
Date:
On Sat, 24 Nov 2018 at 05:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Now, after more thought, I believe that it's probably impossible
> to have a plan tree in which ExecRelationIsTargetRelation would
> return true at an indexscan node that's not underneath the ModifyTable
> node.  What *is* possible is that we have such an indexscan on a
> different RTE for the same table.  I constructed this admittedly
> artificial example in the regression database:
>
> # explain with x1 as (select * from tenk1 t1 where unique1 = 42),
> x2 as (update tenk1 t2 set two = 2 where unique2 = 11 returning *)
> select * from x1,x2 where x1.ten = x2.ten;

Well, that problem exists with more than just indexes; relations can
be problematic too. An even simpler artificial example would be:

select * from t1 inner join t1 t2 on t1.a=t2.a for update of t2;

We could fix that in the executor by processing the rangetable in the
planner: first throw the whole thing into a hash table keyed by Oid,
find the strongest lock level per relation, and apply that level to
each (non-dummy) matching RangeTblEntry's rellockmode.  While we're
there we could add a new field for indexlockmode and apply the same
process to it.  However, there might not be much point, as this does
nothing for the same problem that exists in parse analysis, which may
be much harder or even impossible to fix.
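
The hazard is visible directly in pg_locks, which records each lock
mode separately rather than coalescing them into the strongest one
(sketch; t1 is a stand-in table):

    begin;
    select * from t1 inner join t1 t2 on t1.a = t2.a for update of t2;
    select relation::regclass, mode
      from pg_locks
     where locktype = 'relation'
       and relation = 't1'::regclass;
    -- two rows for t1: AccessShareLock (the t1 reference) and
    -- RowShareLock (the t2 reference locked FOR UPDATE)
    commit;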

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Inadequate executor locking of indexes

From: Amit Langote
Date:
Sorry for jumping in late on this.

On 2018/11/24 1:25, Tom Lane wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
> Thinking more about this, the problem I noted previously about two of
> these solutions not working if the index scan node is not physically
> underneath the ModifyTable node actually applies to all three :-(.
> It's a slightly different issue for #2, namely that what we risk is
> first taking AccessShareLock and then upgrading to RowExclusiveLock.
> Since there are places (not many) that take ShareLock on indexes,
> this would pose a deadlock risk.
> 
> Now, after more thought, I believe that it's probably impossible
> to have a plan tree in which ExecRelationIsTargetRelation would
> return true at an indexscan node that's not underneath the ModifyTable
> node.  What *is* possible is that we have such an indexscan on a
> different RTE for the same table.  I constructed this admittedly
> artificial example in the regression database:
> 
> # explain with x1 as (select * from tenk1 t1 where unique1 = 42),
> x2 as (update tenk1 t2 set two = 2 where unique2 = 11 returning *)
> select * from x1,x2 where x1.ten = x2.ten;
>                                           QUERY PLAN                                          
> ----------------------------------------------------------------------------------------------
>  Nested Loop  (cost=16.61..16.66 rows=1 width=488)
>    Join Filter: (x1.ten = x2.ten)
>    CTE x1
>      ->  Index Scan using tenk1_unique1 on tenk1 t1  (cost=0.29..8.30 rows=1 width=244)
>            Index Cond: (unique1 = 42)
>    CTE x2
>      ->  Update on tenk1 t2  (cost=0.29..8.30 rows=1 width=250)
>            ->  Index Scan using tenk1_unique2 on tenk1 t2  (cost=0.29..8.30 rows=1 width=250)
>                  Index Cond: (unique2 = 11)
>    ->  CTE Scan on x1  (cost=0.00..0.02 rows=1 width=244)
>    ->  CTE Scan on x2  (cost=0.00..0.02 rows=1 width=244)
> (11 rows)
> 
> Because the CTEs will be initialized in order, this presents a case
> where the lock-upgrade hazard exists today: the x1 indexscan will
> open tenk1_unique1 with AccessShareLock and then x2's ModifyTable
> node will upgrade that to RowExclusiveLock.  None of the proposed
> fixes improve this.

Provided we want to keep ExecRelationIsTargetRelation going forward, how
about modifying it such that we compare the scan relation's OID with that
of the result relations, not the RT index?  Like in the attached.

> I'm beginning to think that postponing target-index locking to
> ExecInitModifyTable was a damfool idea and we should undo it.

+1

Also as already proposed, InitPlan should lock result relation indexes
even for DELETEs.

>> For partitioned
>> table updates, there may be many result relations, which can cause
>> ExecRelationIsTargetRelation() to become very slow; in such a case any
>> additional redundant lock would be cheap by comparison.
> 
> Yeah, it'd be nice to get rid of the need for that.

As David mentioned elsewhere, can we add the ResultRelInfos to a hash
table if there are at least a certain number of result relations?
3f2393edefa did that for UPDATE tuple routing efficiency.

Thanks,
Amit

Attachments

Re: Inadequate executor locking of indexes

From: David Rowley
Date:
On Mon, 26 Nov 2018 at 17:37, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2018/11/24 1:25, Tom Lane wrote:
> > I'm beginning to think that postponing target-index locking to
> > ExecInitModifyTable was a damfool idea and we should undo it.
>
> +1
>
> Also as already proposed, InitPlan should lock result relation indexes
> even for DELETEs.

I'd rather see it fixed another way.  The reason being, if we get [1],
then that opens the door to run-time partition pruning for
UPDATE/DELETE, which is currently blocked due to lack of knowledge
about baserestrictinfos for the base partitioned relation because
inheritance_planner() does not plan for the partitioned table, only
its partitions.  Doing the index opening work during InitPlan() means
we do that for all partitions, even the ones that will later be
run-time pruned. If we can do it during init of a ModifyTable node,
then we can likely do it after the run-time pruning takes place.
Since Amit and I are both working towards making partitioning faster,
I imagined he would also not want to do something that could slow it
down significantly, if there was some alternative way to fix it, at
least.

I'm making efforts to delay per-partition work even further in the
executor; for example, delaying the locking of per-partition result
relations until after run-time pruning would be a significant win for
partitioned tables with many partitions when generic plans are in use.
Moving things back to InitPlan() flies in the face of that work.

[1] https://commitfest.postgresql.org/20/1778/

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Inadequate executor locking of indexes

From: Amit Langote
Date:
On 2018/11/26 14:25, David Rowley wrote:
> On Mon, 26 Nov 2018 at 17:37, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> On 2018/11/24 1:25, Tom Lane wrote:
>>> I'm beginning to think that postponing target-index locking to
>>> ExecInitModifyTable was a damfool idea and we should undo it.
>>
>> +1
>>
>> Also as already proposed, InitPlan should lock result relation indexes
>> even for DELETEs.
> 
> I'd rather see it fixed another way.  The reason being, if we get [1],
> then that opens the door to run-time partition pruning for
> UPDATE/DELETE, which is currently blocked due to lack of knowledge
> about baserestrictinfos for the base partitioned relation because
> inheritance_planner() does not plan for the partitioned table, only
> its partitions.  Doing the index opening work during InitPlan() means
> we do that for all partitions, even the ones that will later be
> run-time pruned. If we can do it during init of a ModifyTable node,
> then we can likely do it after the run-time pruning takes place.
> Since Amit and I are both working towards making partitioning faster,
> I imagined he would also not want to do something that could slow it
> down significantly, if there was some alternative way to fix it, at
> least.
> 
> I'm making efforts to delay per-partition work even further in the
> executor, for example locking of the per-partition result relations
> until after run-time pruning would be a significant win for
> partitioned tables with many partitions when generic plans are in use.
> Moving things back to InitPlan() flies in the face of that work.
> 
> [1] https://commitfest.postgresql.org/20/1778/

That's an interesting point.  Although, considering the concerns that Tom
raised about the same index relation being locked such that lock-strength
upgrade occurs (his example containing two CTEs), we'll have to find a way
to do the ModifyTable run-time pruning such that result relations and
their indexes (possibly under multiple ModifyTable nodes) are locked with
RowExclusiveLock before they're locked with AccessShareLock as scan
relations.  For example, we might be able to find a way to do the
ModifyTable run-time pruning in InitPlan before initializing plan trees.

That said, I don't quite understand how the lock-strength upgrade
occurring in the way being discussed here (AccessShareLock for scanning to
RowExclusiveLock for modifying) leads to a deadlock hazard?

Thanks,
Amit



Re: Inadequate executor locking of indexes

From: David Rowley
Date:
On Mon, 26 Nov 2018 at 18:57, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2018/11/26 14:25, David Rowley wrote:
> > I'm making efforts to delay per-partition work even further in the
> > executor; for example, delaying the locking of per-partition result
> > relations until after run-time pruning would be a significant win for
> > partitioned tables with many partitions when generic plans are in use.
> > Moving things back to InitPlan() flies in the face of that work.
> >
> > [1] https://commitfest.postgresql.org/20/1778/
>
> That's an interesting point.  Although, considering the concerns that Tom
> raised about the same index relation being locked such that lock-strength
> upgrade occurs (his example containing two CTEs), we'll have to find a way
> to do the ModifyTable run-time pruning such that result relations and
> their indexes (possibly under multiple ModifyTable nodes) are locked with
> RowExclusiveLock before they're locked with AccessShareLock as scan
> relations.  For example, we might be able to find a way to do the
> ModifyTable run-time pruning in InitPlan before initializing plan trees.

I thought my idea of processing the rangetable at the end of
planning to determine the strongest lock per relation would have
solved that.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Inadequate executor locking of indexes

From: Amit Langote
Date:
On 2018/11/27 6:19, David Rowley wrote:
> On Mon, 26 Nov 2018 at 18:57, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> On 2018/11/26 14:25, David Rowley wrote:
>>> I'm making efforts to delay per-partition work even further in the
>>> executor; for example, delaying the locking of per-partition result
>>> relations until after run-time pruning would be a significant win for
>>> partitioned tables with many partitions when generic plans are in use.
>>> Moving things back to InitPlan() flies in the face of that work.
>>>
>>> [1] https://commitfest.postgresql.org/20/1778/
>>
>> That's an interesting point.  Although, considering the concerns that Tom
>> raised about the same index relation being locked such that lock-strength
>> upgrade occurs (his example containing two CTEs), we'll have to find a way
>> to do the ModifyTable run-time pruning such that result relations and
>> their indexes (possibly under multiple ModifyTable nodes) are locked with
>> RowExclusiveLock before they're locked with AccessShareLock as scan
>> relations.  For example, we might be able to find a way to do the
>> ModifyTable run-time pruning in InitPlan before initializing plan trees.
> 
> I thought my idea of processing the rangetable at the end of
> planning to determine the strongest lock per relation would have
> solved that.

Yeah, would be nice to make that work.

Thanks,
Amit



Re: Inadequate executor locking of indexes

From: David Rowley
Date:
On Tue, 27 Nov 2018 at 19:00, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2018/11/27 6:19, David Rowley wrote:
> > On Mon, 26 Nov 2018 at 18:57, Amit Langote
> >> That's an interesting point.  Although, considering the concerns that Tom
> >> raised about the same index relation being locked such that lock-strength
> >> upgrade occurs (his example containing two CTEs), we'll have to find a way
> >> to do the ModifyTable run-time pruning such that result relations and
> >> their indexes (possibly under multiple ModifyTable nodes) are locked with
> >> RowExclusiveLock before they're locked with AccessShareLock as scan
> >> relations.  For example, we might be able to find a way to do the
> >> ModifyTable run-time pruning in InitPlan before initializing plan trees.
> >
> > I thought my idea of processing the rangetable at the end of
> > planning to determine the strongest lock per relation would have
> > solved that.
>
> Yeah, would be nice to make that work.

Here's a very rough and incomplete patch just to demo what I had in
mind. finalize_lockmodes() is likely more or less complete, minus any
testing that it works.  What's mostly missing is changing all the
places that grab index locks to use the new idxlockmode field. I've
really only changed index scan and index-only scan, and stared a
bit at nodeModifyTable.c wondering what I should do with that
operation != CMD_DELETE test before the ExecOpenIndices() call.

The patch also includes code to determine the strongest rellockmode
per relation.  Perhaps it's not really worth doing that, since parse
analysis could still cause some lock upgrade hazards. The code that's
there just fixes things for the executor, so it really would only have
an effect when executing cached plans.

If this looks like a good path to go in, then I can produce something
a bit more finished. I'm just a bit unsure when exactly I can do that
as I'm on leave and have other commitments to take care of.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Inadequate executor locking of indexes

From: Amit Kapila
Date:
On Fri, Nov 23, 2018 at 9:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Thinking more about this, the problem I noted previously about two of
> these solutions not working if the index scan node is not physically
> underneath the ModifyTable node actually applies to all three :-(.
> It's a slightly different issue for #2, namely that what we risk is
> first taking AccessShareLock and then upgrading to RowExclusiveLock.
> Since there are places (not many) that take ShareLock on indexes,
> this would pose a deadlock risk.
>

Can you be a bit more specific about the exact deadlock risk you are
seeing here?  Amit L asked about it, and I am also curious to know.
One way I could see it happen is:

Session-1
begin;
Lock table foo in Access Share Mode;

Session-2
begin;
Lock table foo in Share Mode;

Session-1
Lock table foo in Row Exclusive Mode;  --here it will wait for session-2

Session-2
Lock table foo in Access Exclusive Mode;  --here it will lead to deadlock


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Inadequate executor locking of indexes

From: David Rowley
Date:
On Wed, 28 Nov 2018 at 01:55, David Rowley <david.rowley@2ndquadrant.com> wrote:
> If this looks like a good path to go in, then I can produce something
> a bit more finished. I'm just a bit unsure when exactly I can do that
> as I'm on leave and have other commitments to take care of.

This patch is still on my list, so I had another look at what I did
back in November...

I've changed a couple of things:

1. Changed nodeBitmapIndexscan.c so it now properly uses the
RangeTblEntry's idxlockmode field.
2. Renamed a few variables in finalize_lockmodes().

I'm keen to get some feedback on whether we should go about fixing
things this way.  One thing that's still on my mind is that the parser
is still at risk of lock upgrade hazards. This patch only fixes the
executor.  I don't quite see how it would be possible to fix the same
in the parser.

I was also looking at each call site that calls ExecOpenIndices(). I
don't think it's great that ExecInitModifyTable() has its own logic to
skip calling that function for DELETE.  I wondered if it shouldn't
somehow depend on what the idxlockmode is set to.  I also saw that
apply_handle_delete() makes a call to ExecOpenIndices(). I don't think
that one is needed, but I didn't test anything to make sure. Maybe
that's for another thread anyway.

Updated patch is attached.

Adding to the March commitfest as a bug fix.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Inadequate executor locking of indexes

From: Julien Rouhaud
Date:
Hi,

On Tue, Feb 5, 2019 at 5:16 AM David Rowley
<david.rowley@2ndquadrant.com> wrote:
>
> I've changed a couple of things:
>
> 1. Changed nodeBitmapIndexscan.c so it now properly uses the
> RangeTblEntry's idxlockmode field.
> 2. Renamed a few variables in finalize_lockmodes().
>
> I'm keen to get some feedback on whether we should go about fixing
> things this way.  One thing that's still on my mind is that the parser
> is still at risk of lock upgrade hazards. This patch only fixes the
> executor.  I don't quite see how it would be possible to fix the same
> in the parser.

+        /*
+         * If there are multiple instances of the same rel with varying lock
+         * strengths then set the strongest lock level to each instance of
+         * that relation.
+         */
+        if (applystrongest)
[...]

The patch is quite straightforward, so I don't have general comments
on it.  However, I think that the idxlockmode initialization is
problematic: you're using the statement's commandType, so this doesn't
work with CTEs.  For instance, this artificial query

WITH s AS (UPDATE test set id = 1 WHERE id =1) select 1;

will take an AccessShareLock on test's index when it should take a
RowExclusiveLock.  I guess that you have to code similar lock upgrade
logic for the indexes, inspecting the planTree and subplans to find
the correct command type.
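
For comparison, on unpatched master the expected lock is visible in
pg_locks (sketch; assumes a table test with an index on id):

    begin;
    with s as (update test set id = 1 where id = 1) select 1;
    select relation::regclass, mode
      from pg_locks
     where locktype = 'relation'
       and mode = 'RowExclusiveLock';
    -- both test and its index show up, because ExecInitModifyTable
    -- opened the index via ExecOpenIndices() with RowExclusiveLock
    rollback;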

>
> I was also looking at each call site that calls ExecOpenIndices(). I
> don't think it's great that ExecInitModifyTable() has its own logic to
> skip calling that function for DELETE.  I wondered if it shouldn't
> somehow depend on what the idxlockmode is set to.

I don't think that it's great either.  However for DELETE we shouldn't
simply call ExecOpenIndices(), but open only the used indexes right?


Re: Inadequate executor locking of indexes

From: David Rowley
Date:
On Mon, 11 Feb 2019 at 01:22, Julien Rouhaud <rjuju123@gmail.com> wrote:
> The patch is quite straightforward, so I don't have general comments
> on it.  However, I think that the idxlockmode initialization is
> problematic: you're using the statement's commandType, so this doesn't
> work with CTEs.  For instance, this artificial query
>
> WITH s AS (UPDATE test set id = 1 WHERE id =1) select 1;
>
> will take an AccessShareLock on test's index when it should take a
> RowExclusiveLock.  I guess that you have to code similar lock upgrade
> logic for the indexes, inspecting the planTree and subplans to find
> the correct command type.

Good catch. I'm a bit stuck on the best way to fix this.  So far I can
only think of either:

1. Adding a new field to RangeTblEntry to indicate the operation type
that's being performed on the relation; or
2. Adding a Bitmapset field to PlannerGlobal that sets the rtable
indexes of RangeTblEntry items that belong to DELETEs and ignore these
when setting resultRelids in finalize_lockmodes().

For #2, the only place I can see to do this is
add_rtes_to_flat_rtable(), which would require either passing the
PlannerInfo into the function, or at least its parse's commandType.

I don't really like either, but don't have any other ideas at the moment.

> > I was also looking at each call site that calls ExecOpenIndices(). I
> > don't think it's great that ExecInitModifyTable() has its own logic to
> > skip calling that function for DELETE.  I wondered if it shouldn't
> > somehow depend on what the idxlockmode is set to.
>
> I don't think that it's great either.  However for DELETE we shouldn't
> simply call ExecOpenIndices(), but open only the used indexes right?

No, I don't think so. The "used" index(es) will be opened in the scan
node(s).   The reason I didn't like it much is that I wanted to keep
the logic for deciding what lock level to use in the planner.  The
executor seems to have more knowledge than I think maybe it should.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Inadequate executor locking of indexes

From: Julien Rouhaud
Date:
On Mon, Feb 11, 2019 at 5:32 AM David Rowley
<david.rowley@2ndquadrant.com> wrote:
>
> On Mon, 11 Feb 2019 at 01:22, Julien Rouhaud <rjuju123@gmail.com> wrote:
> > The patch is quite straightforward, so I don't have general comments
> > on it.  However, I think that the idxlockmode initialization is
> > problematic: you're using the statement's commandType, so this doesn't
> > work with CTEs.  For instance, this artificial query
> >
> > WITH s AS (UPDATE test set id = 1 WHERE id =1) select 1;
> >
> > will take an AccessShareLock on test's index when it should take a
> > RowExclusiveLock.  I guess that you have to code similar lock upgrade
> > logic for the indexes, inspecting the planTree and subplans to find
> > the correct command type.
>
> Good catch. I'm a bit stuck on the best way to fix this.  So far I can
> only think of either:
>
> 1. Adding a new field to RangeTblEntry to indicate the operation type
> that's being performed on the relation; or
> 2. Adding a Bitmapset field to PlannerGlobal that sets the rtable
> indexes of RangeTblEntry items that belong to DELETEs and ignore these
> when setting resultRelids in finalize_lockmodes().
>
> For #2, the only place I can see to do this is
> add_rtes_to_flat_rtable(), which would require either passing the
> PlannerInfo into the function, or at least its parse's commandType.
>
> I don't really like either, but don't have any other ideas at the moment.

But we would still need the same lock level upgrade logic on indexes
for cases like a CTE with a mix of INSERT, UPDATE and DELETE on the
same relation, I think.  #1 seems like a better solution to me.

> > > I was also looking at each call site that calls ExecOpenIndices(). I
> > > don't think it's great that ExecInitModifyTable() has its own logic to
> > > skip calling that function for DELETE.  I wondered if it shouldn't
> > > somehow depend on what the idxlockmode is set to.
> >
> > I don't think that it's great either.  However for DELETE we shouldn't
> > simply call ExecOpenIndices(), but open only the used indexes right?
>
> No, I don't think so. The "used" index(es) will be opened in the scan
> node(s).   The reason I didn't like it much is that I wanted to keep
> the logic for deciding what lock level to use in the planner.  The
> executor seems to have more knowledge than I think maybe it should.

Ah, I see.  Thanks.


Re: Inadequate executor locking of indexes

From: David Rowley
Date:
On Tue, 12 Feb 2019 at 09:58, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Mon, Feb 11, 2019 at 5:32 AM David Rowley
> <david.rowley@2ndquadrant.com> wrote:
> > 1. Adding a new field to RangeTblEntry to indicate the operation type
> > that's being performed on the relation; or
> > 2. Adding a Bitmapset field to PlannerGlobal that sets the rtable
> > indexes of RangeTblEntry items that belong to DELETEs and ignore these
> > when setting resultRelids in finalize_lockmodes().
> >
> > For #2, the only place I can see to do this is
> > add_rtes_to_flat_rtable(), which would require either passing the
> > PlannerInfo into the function, or at least its parse's commandType.
> >
> > I don't really like either, but don't have any other ideas at the moment.
>
> But we would still need the same lock level upgrade logic on indexes
> for cases like a CTE with a mix of INSERT, UPDATE and DELETE on the
> same relation, I think.  #1 seems like a better solution to me.

I think I'd rather find some way to do it that didn't denormalise the
parse nodes like that.  It seems very strange to have a CmdType in the
Query struct, and then another set of them in RangeTblEntry. Besides
bloating the size of the RangeTblEntry struct a bit, it also could
lead to inconsistency bugs where the two CmdTypes differ, for some
reason.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Inadequate executor locking of indexes

From: David Rowley
Date:
On Tue, 19 Feb 2019 at 12:13, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On Tue, 12 Feb 2019 at 09:58, Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Mon, Feb 11, 2019 at 5:32 AM David Rowley
> > <david.rowley@2ndquadrant.com> wrote:
> > > 1. Adding a new field to RangeTblEntry to indicate the operation type
> > > that's being performed on the relation; or
> > > 2. Adding a Bitmapset field to PlannerGlobal that sets the rtable
> > > indexes of RangeTblEntry items that belong to DELETEs and ignore these
> > > when setting resultRelids in finalize_lockmodes().
> > >
> > > For #2, the only place I can see to do this is
> > > add_rtes_to_flat_rtable(), which would require either passing the
> > > PlannerInfo into the function, or at least its parse's commandType.
> > >
> > > I don't really like either, but don't have any other ideas at the moment.
> >
> > But we would still need the same lock level upgrade logic on indexes
> > for cases like a CTE with a mix of INSERT, UPDATE and DELETE on the
> > same relation, I think.  #1 seems like a better solution to me.
>
> I think I'd rather find some way to do it that didn't denormalise the
> parse nodes like that.  It seems very strange to have a CmdType in the
> Query struct, and then another set of them in RangeTblEntry. Besides
> bloating the size of the RangeTblEntry struct a bit, it also could
> lead to inconsistency bugs where the two CmdTypes differ, for some
> reason.

I had another go at this patch and fixed the problem by just setting
the idxlockmode inside the planner just before the call to
expand_inherited_tables().  This allows the lockmode to be copied into
child RTEs.  Doing it later in planning is more of a problem since we
don't really store all the result relations in PlannerInfo; we only
store the one that was written in the query.  The others are stored in
the ModifyTable path and then later in the ModifyTable plan.  The only
other place I could see to do it (without adding an extra field in
PlannerInfo) was during createplan... which does not at all seem like
the right place, and is also too late for get_relation_info(); see
below.

The finalize_lockmodes() now does the same thing for idxlockmode as it
does for rellockmode, i.e, just set the strongest lock for each unique
rel oid.

However, during my work on this, I saw a few things that made me
wonder if all this is over-engineered and whether it's worth the
trouble.  The first thing I saw was that in get_relation_info() we
obtain a RowExclusiveLock if the relation is the result relation. This
means we obtain a RowExclusiveLock for DELETE targets too.  This is
inconsistent with what the executor tries to do and results in us
taking a weaker lock during execution than we do during planning.  It
also means we don't bump into the lock in the local lock table, which
results in slower locking. That problem would be exaggerated with
partitioned tables with a large number of partitions or inheritance
parents with lots of children; however, that'll likely be drowned out
by all the other slow stuff around there.
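
That planner-side index lock is easy to see (sketch; t1 is a stand-in
table with an index on a):

    begin;
    delete from t1 where a = 1;
    select relation::regclass, mode
      from pg_locks
     where locktype = 'relation'
       and mode = 'RowExclusiveLock';
    -- both t1 and its index appear, even though the executor never
    -- opens the index for a DELETE; the index lock came from the
    -- planner's get_relation_info()
    rollback;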

Another thing that's on my mind is that in the patch in
nodeModifyTable.c we still do:

        if (resultRelInfo->ri_RelationDesc->rd_rel->relhasindex &&
            operation != CMD_DELETE &&
            resultRelInfo->ri_IndexRelationDescs == NULL)
            ExecOpenIndices(resultRelInfo,
                            node->onConflictAction != ONCONFLICT_NONE);

i.e. don't open the indexes for DELETEs.  I had ideas that maybe this
could be changed to check the idxlockmode and open the indexes if it's
above AccessShareLock.  There didn't seem to be a very nice way to
fetch the RangeTblEntry from the ResultRelInfo though, so I didn't do
that, but leaving it as is does not really work well with the patch, as
I really want the planner to be the thing that decides what happens here.


My final thoughts were that I see a lot of work going on to implement
pluggable storage.  I ended up having an offlist conversation with
Thomas Munro about zheap and what its requirements were for locking
indexes during deletes. It seems modifying the index during delete is
not something that happens with the current version, but is planned
for future versions.  In any case, we can't really assume zheap is
going to be the only additional storage engine implementation that's
going to get hooked into this.

Current thoughts are, do we:

1. Allow the storage engine to communicate its locking needs via an
API function call and add code to check that in the
determine_index_lockmode function (new in the attached patch)
2. Just get rid of the "operation != CMD_DELETE &&" line in the above
code and just always lock indexes for DELETE targets in
RowExclusiveLock mode.

#2 would not address Tom's mention of there possibly being some way to
have the index scan node initialise before the modify table node
(currently we pass NoLock for indexes belonging to result rels in the
index scan nodes).  I can't quite imagine at the moment how that's
possible, but maybe my imagination is just not good enough.  We could
fix that by passing RowExclusiveLock instead of NoLock. It just means
we'd discover the lock already exists in the local lock table...
unless of course there is a case where the index scan gets initialised
before modify table is.

I'm adding Andres, Robert, Thomas, Amit and Haribabu due to the
involvement with pluggable storage and zheap.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Inadequate executor locking of indexes

From: Amit Langote
Date:
(this is not a reply to your full proposal, just something I thought to
point out)

On 2019/03/13 10:38, David Rowley wrote:
> i.e. don't open the indexes for DELETEs.  I had ideas that maybe this
> could be changed to check the idxlockmode and open the indexes if it's
> above AccessShareLock.  There didn't seem to be a very nice way to
> fetch the RangeTblEntry from the ResultRelInfo though,

Did you miss ri_RangeTableIndex?  It's the range table index of the result
relation for which a given ResultRelInfo is created.

Thanks,
Amit



Re: Inadequate executor locking of indexes

From: David Rowley
Date:
On Wed, 13 Mar 2019 at 14:55, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Did you miss ri_RangeTableIndex?  It's the range table index of the result
> relation for which a given ResultRelInfo is created.

I did indeed. I'll hold off modifying the patch in favour of seeing
what other people think about what should be done here.

Thanks for pointing out ri_RangeTableIndex.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Inadequate executor locking of indexes

From: Amit Langote
Date:
Hi David,

On 2019/03/13 10:38, David Rowley wrote:
> I had another go at this patch and fixed the problem by just setting
> the idxlockmode inside the planner just before the call to
> expand_inherited_tables().  This allows the lockmode to be copied into
> child RTEs.

I have one question about the relation between idxlockmode and
rellockmode.  From skimming the patch, it appears that they're almost
always set to the same value.  If so, why not use rellockmode for index
locking too?

> #2 would not address Tom's mention of there possibly being some way to
> have the index scan node initialise before the modify table node
> (currently we pass NoLock for indexes belonging to result rels in the
> index scan nodes).  I can't quite imagine at the moment how that's
> possible, but maybe my imagination is just not good enough.  We could
> fix that by passing RowExclusiveLock instead of NoLock. It just means
> we'd discover the lock already exists in the local lock table...
> unless of course there is a case where the index scan gets initialised
> before modify table is.

Tom gave an example upthread that looked like this:

explain (costs off) with x1 as materialized (select * from foo where a =
1), x2 as (update foo set a = a where a = 1 returning *) select * from x1,
x2 where x1.a = x2.a;
                     QUERY PLAN
────────────────────────────────────────────────────
 Hash Join
   Hash Cond: (x1.a = x2.a)
   CTE x1
     ->  Bitmap Heap Scan on foo
           Recheck Cond: (a = 1)
           ->  Bitmap Index Scan on foo_a_idx
                 Index Cond: (a = 1)
   CTE x2
     ->  Update on foo foo_1
           ->  Bitmap Heap Scan on foo foo_1
                 Recheck Cond: (a = 1)
                 ->  Bitmap Index Scan on foo_a_idx
                       Index Cond: (a = 1)
   ->  CTE Scan on x1
   ->  Hash
         ->  CTE Scan on x2
(16 rows)

When InitPlan() invokes ExecInitNode on the subplans of x1 and x2, it does
so in that order.  So, ExecInitBitmapIndexScan for x1 is called before
ExecInitModifyTable for x2.

But if finalize_lockmodes() in your patch sets lockmodes correctly,
ExecInitBitmapIndexScan() called for x1 ought to lock the index with
RowExclusiveLock, no?

Thanks,
Amit



Re: Inadequate executor locking of indexes

From: David Rowley
Date:
Thanks for having a look at this.

On Wed, 13 Mar 2019 at 22:45, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> I have one question about the relation between idxlockmode and
> rellockmode.  From skimming the patch, it appears that they're almost
> always set to the same value.  If so, why not use rellockmode for index
> locking too?

Maybe that's possible, but it would mean giving up on locking indexes
during DELETE with AccessShareLock. That would become
RowExclusiveLock. Also FOR UPDATE would lock indexes with RowShareLock
instead of AccessShareLock.

> > #2 would not address Tom's mention of there possibly being some way to
> > have the index scan node initialise before the modify table node
> > (currently we pass NoLock for indexes belonging to result rels in the
> > index scan nodes).  I can't quite imagine at the moment how that's
> > possible, but maybe my imagination is just not good enough.  We could
> > fix that by passing RowExclusiveLock instead of NoLock. It just means
> > we'd discover the lock already exists in the local lock table...
> > unless of course there is a case where the index scan gets initialised
> > before modify table is.
>
> Tom gave an example upthread that looked like this:

[...]

> But if finalize_lockmodes() in your patch sets lockmodes correctly,
> ExecInitBitmapIndexScan() called for x1 ought to lock the index with
> RowExclusiveLock, no?

Yeah, in the executor the bitmap index scan uses a RowExclusiveLock.
There's still the issue in the planner that get_relation_info() will
still take an AccessShareLock when planning the 1st CTE and then
upgrade it to RowExclusiveLock once it gets to the 2nd.  It's not
really very obvious how that can be fixed, but at least we don't start
using indexes without any locks.
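
That remaining upgrade can be observed with Tom's earlier example in
the regression database (sketch):

    begin;
    explain with x1 as (select * from tenk1 t1 where unique1 = 42),
    x2 as (update tenk1 t2 set two = 2 where unique2 = 11 returning *)
    select * from x1,x2 where x1.ten = x2.ten;
    select relation::regclass, mode
      from pg_locks
     where relation = 'tenk1_unique1'::regclass;
    -- two rows: AccessShareLock taken while planning x1, then
    -- RowExclusiveLock taken while planning x2's target indexes
    rollback;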

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Inadequate executor locking of indexes

From: Amit Langote
Date:
On 2019/03/14 7:12, David Rowley wrote:
> Thanks for having a look at this.
> 
> On Wed, 13 Mar 2019 at 22:45, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> I have one question about the relation between idxlockmode and
>> rellockmode.  From skimming the patch, it appears that they're almost
>> always set to the same value.  If so, why not use rellockmode for index
>> locking too?
> 
> Maybe that's possible, but it would mean giving up on locking indexes
> during DELETE with AccessShareLock. That would become
> RowExclusiveLock. Also FOR UPDATE would lock indexes with RowShareLock
> instead of AccessShareLock.

If the correct lock is taken in both cases by the current code, then maybe
there's no need to change anything?  What does idxlockmode improve about
the existing situation?  One thing I can imagine it improves is that we
don't need the potentially expensive ExecRelationIsTargetRelation() check
anymore, because idxlockmode gives that information for free.

>>> #2 would not address Tom's mention of there possibly being some way to
>>> have the index scan node initialise before the modify table node
>>> (currently we pass NoLock for indexes belonging to result rels in the
>>> index scan nodes).  I can't quite imagine at the moment how that's
>>> possible, but maybe my imagination is just not good enough.  We could
>>> fix that by passing RowExclusiveLock instead of NoLock. It just means
>>> we'd discover the lock already exists in the local lock table...
>>> unless of course there is a case where the index scan gets initialised
>>> before modify table is.
>>
>> Tom gave an example upthread that looked like this:
> 
> [...]
> 
>> But if finalize_lockmodes() in your patch sets lockmodes correctly,
>> ExecInitBitmapIndexScan() called for x1 ought to lock the index with
>> RowExclusiveLock, no?
> 
> Yeah, in the executor the bitmap index scan uses a RowExclusiveLock.
> There's still the issue in the planner that get_relation_info() will
> still take an AccessShareLock when planning the 1st CTE and then
> upgrade it to RowExclusiveLock once it gets to the 2nd.  It's not
> really very obvious how that can be fixed, but at least we don't start
> using indexes without any locks.

If we're to fix that upgrade hazard, maybe we'll need a separate phase to
determine the strongest lock to take on each table referenced in different
parts of the query (across CTEs) that runs before
pg_analyze_and_rewrite()?  We surely can't use table OIDs as keys though.

Thanks,
Amit



Re: Inadequate executor locking of indexes

From: David Rowley
Date:
On Thu, 14 Mar 2019 at 21:52, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> If the correct lock is taken in both cases by the current code, then maybe
> there's no need to change anything?  What does idxlockmode improve about
> the existing situation?  One thing I can imagine it improves is that we
> don't need the potentially expensive ExecRelationIsTargetRelation() check
> anymore, because idxlockmode gives that information for free.

I'm aiming to fix the problem described by Tom when he started this
thread. It's pretty easy to get an assert failure in master.

create table t1(a int);
create index on t1(a);
prepare q1 as delete from t1 where a = 1;
execute q1;
execute q1;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

This fails because ModifyTable does not open the indexes for DELETE,
and the index scan node thinks it can get away with NoLock on the
index since it thinks ModifyTable already locked it. We hit the
CheckRelationLockedByMe() Assert in relation_open().

The idea with the patch is to a) fix this; and b) try and reduce the
chances of bugs in this area in the future by having something more
central decide the lock level we need rather than having various bits
of code that currently disagree with each other about what needs to be
done; and c) attempt to fix the lock upgrade hazard as best as we can.

Maybe you're right about being able to use rellockmode for indexes,
but I assume that we lowered the lock level for indexes for some
reason, and this would reverse that.

> If we're to fix that upgrade hazard, maybe we'll need a separate phase to
> determine the strongest lock to take on each table referenced in different
> parts of the query (across CTEs) that runs before
> pg_analyze_and_rewrite()?  We surely can't use table OIDs as keys though.

I don't think that's workable as we've yet to perform
expand_inherited_tables() and some other subquery might reference some
partition directly.

I think this is borderline impossible to fix completely.  To do it just
for the planner we'd need to plan each subquery as far as determining
all relations that will be used: probably at least as far as
expand_inherited_tables(), but not as far as calling
get_relation_info() on any relation. We'd then do the same for all the
other subqueries and then do finalize_lockmodes(); only then could we
call get_relation_info(), since otherwise we might call it with too
weak a lock and later need to obtain a stronger one.  But even then
it's possible we get lock upgrades in the parser too, for example:
select * from t1 a, t1 b for update of b;

Maybe if we can't fix that hazard completely, then it's not worth
adding any code for it at all, but we still need something for a), and
fixing b) seems like a good idea too. If we don't care enough about c)
then we can use the patch without finalize_lockmodes().

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Inadequate executor locking of indexes

From: Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Thu, 14 Mar 2019 at 21:52, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> If the correct lock is taken in both cases by the current code, then maybe
>> there's no need to change anything?

> I'm aiming to fix the problem described by Tom when he started this
> thread. It's pretty easy to get an assert failure in master.

Yeah, removing the Assert failure is a minimum requirement.  Whether
we need to do more than that can be debated.

> I think this is borderline impossible to fix completely.

After further review I concur with that position.  The cases where
we have lock upgrade hazards are where some subquery uses a table
in a way that requires a stronger lock than is needed for some other
reference that was processed earlier.  It's kind of pointless to
guarantee that we avoid that in the planner or executor if the parser
already hit the problem; and it seems darn near impossible, and
certainly impractical, to avoid the problem while parsing.

(Actually, even if we fixed the parser, I'm not sure we'd be out of
the woods.  Consider a case where a subquery requires AccessShareLock
on some table, but then when we come to expand inheritance in the
planner, we discover that that same table is a child or partition of
some target table, so now we need a stronger lock on it.)

I think therefore that we should forget about the idea of avoiding
lock upgrade hazards, at least for situations where the hazard is
caused by conflicting table references that are all written by the
user.  That's not common, and since none of these lock types are
exclusive, the odds of an actual deadlock are low anyway.

> Maybe you're right about being able to use rellockmode for indexes,
> but I assume that we lowered the lock level for indexes for some
> reason, and this would reverse that.

I kind of think that we *should* use rellockmode for indexes too.
To the extent that we're doing differently from that now, I think
it's accidental not intentional.  It would perhaps have been difficult
to clean that up completely before we added rellockmode, but now that
we've got that we should use it.  I feel that adding a second field
for index lock mode would just be perpetuating some accidental
inconsistencies.

In short, I think we should take the parts of this patch that modify
the index_open calls, but make them use rte->rellockmode; and forget
all the other parts.

BTW, I'd also suggest expanding the header comment for 
ExecRelationIsTargetRelation to explain that it's no longer
used in the core code, but we keep it around because FDWs
might want it.

            regards, tom lane



Re: Inadequate executor locking of indexes

From: David Rowley
Date:
On Wed, 3 Apr 2019 at 06:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > On Thu, 14 Mar 2019 at 21:52, Amit Langote
> > Maybe you're right about being able to use rellockmode for indexes,
> > but I assume that we lowered the lock level for indexes for some
> > reason, and this would reverse that.
>
> I kind of think that we *should* use rellockmode for indexes too.
> To the extent that we're doing differently from that now, I think
> it's accidental not intentional.  It would perhaps have been difficult
> to clean that up completely before we added rellockmode, but now that
> we've got that we should use it.  I feel that adding a second field
> for index lock mode would just be perpetuating some accidental
> inconsistencies.

Okay. That certainly makes the patch a good bit simpler.

> In short, I think we should take the parts of this patch that modify
> the index_open calls, but make them use rte->rellockmode; and forget
> all the other parts.

Done.

> BTW, I'd also suggest expanding the header comment for
> ExecRelationIsTargetRelation to explain that it's no longer
> used in the core code, but we keep it around because FDWs
> might want it.

Also done.

I also looked over other index_open() calls in the planner and found a
bunch of places in selfuncs.c where we open an index to grab some
information, then close it again, releasing the lock.  By that stage
get_relation_info() should already have grabbed what it needs and
stored it in an IndexOptInfo, so we might have no need to access the
index again.  However, if any code were added that happened to assume
the index was already locked, we'd get the same Assert failure that
we're fixing here.  I've ended up changing these calls so that they
also use rellockmode, which may make the lock acquisition just a trip
to the local lock table for relations that have rellockmode >
AccessShareLock.  I also changed the index_close calls to use NoLock,
so we keep holding the lock.
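
In diff form, the close-side part of that is roughly the following (a
sketch of the pattern, not an exact hunk from the patch):

-            index_close(indexRel, AccessShareLock);
+            /* keep the lock; it'll be released at transaction end */
+            index_close(indexRel, NoLock);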

I scanned around other usages of index_open() and saw that
gin_clean_pending_list() uses an AccessShareLock. That seems strange
since it modifies the index.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Inadequate executor locking of indexes

From
Amit Langote
Date:
Hi David,

Thanks for updating the patch.

On 2019/04/04 9:14, David Rowley wrote:
> I also looked over other index_open() calls in the planner and found a
> bunch of places in selfuncs.c where we open an index to grab some
> information, then close it again, releasing the lock.  By that stage
> get_relation_info() should already have grabbed what it needs and
> stored it in an IndexOptInfo, so we might have no need to access the
> index again.  However, if any code were added that happened to assume
> the index was already locked, we'd get the same Assert failure that
> we're fixing here.  I've ended up changing these calls so that they
> also use rellockmode, which may make the lock acquisition just a trip
> to the local lock table for relations that have rellockmode >
> AccessShareLock.  I also changed the index_close calls to use NoLock,
> so we keep holding the lock.

Sorry, I didn't understand why it wouldn't be OK to pass NoLock to
index_open, for example, here:

@@ -5191,7 +5191,14 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
              * necessarily on the index.
              */
             heapRel = table_open(rte->relid, NoLock);
-            indexRel = index_open(index->indexoid, AccessShareLock);
+
+            /*
+             * We use the same lock level as the relation as it may have
+             * already been locked with that level.  Using the same lock level
+             * can save a trip to the shared lock manager.
+             */
+            Assert(rte->rellockmode != NoLock);
+            indexRel = index_open(index->indexoid, rte->rellockmode);

Especially seeing that the table itself is opened without lock.  If there
are any Assert failures, wouldn't that need to be fixed in the upstream
code (such as get_relation_info)?

Also, I noticed that there is infer_arbiter_indexes() too, which opens the
target table's indexes with RowExclusiveLock.  I thought for a second
that's an index-locking site in the planner that you may have missed, but
it turns out it might very well be the first time those indexes are locked in
a given insert query's processing, because query_planner doesn't need to
plan access to the result relation, so get_relation_info is not called.

> I scanned around other usages of index_open() and saw that
> gin_clean_pending_list() uses an AccessShareLock. That seems strange
> since it modifies the index.

Yeah, other maintenance tasks modifying an index, such as
brin_summarize_range(), take ShareUpdateExclusiveLock.

Thanks,
Amit




Re: Inadequate executor locking of indexes

From
David Rowley
Date:
On Thu, 4 Apr 2019 at 15:01, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Sorry, I didn't understand why it wouldn't be OK to pass NoLock to
> index_open, for example, here:
>
> @@ -5191,7 +5191,14 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
>                          * necessarily on the index.
>                          */
>                         heapRel = table_open(rte->relid, NoLock);
> -                       indexRel = index_open(index->indexoid, AccessShareLock);
> +
> +                       /*
> +                        * We use the same lock level as the relation as it may have
> +                        * already been locked with that level.  Using the same lock level
> +                        * can save a trip to the shared lock manager.
> +                        */
> +                       Assert(rte->rellockmode != NoLock);
> +                       indexRel = index_open(index->indexoid, rte->rellockmode);
>
> Especially seeing that the table itself is opened without lock.  If there
> are any Assert failures, wouldn't that need to be fixed in the upstream
> code (such as get_relation_info)?

I put that Assert there to ensure that everything that calls it has
properly set RangeTblEntry's rellockmode. If there's some valid reason
for that to be NoLock then it's the wrong thing to do, but I can't
think of a valid reason.

> Also, I noticed that there is infer_arbiter_indexes() too, which opens the
> target table's indexes with RowExclusiveLock.  I thought for a second
> that's an index-locking site in the planner that you may have missed, but
> it turns out it might very well be the first time those indexes are locked in
> a given insert query's processing, because query_planner doesn't need to
> plan access to the result relation, so get_relation_info is not called.

I skimmed over that and thought that the rellockmode would always be
RowExclusiveLock anyway, so didn't change it. However, it would make
sense to use rellockmode for consistency. We're already looking up the
RangeTblEntry to get the relid, so getting the rellockmode is about
free.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Inadequate executor locking of indexes

From
Amit Langote
Date:
On 2019/04/04 11:13, David Rowley wrote:
> On Thu, 4 Apr 2019 at 15:01, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> Sorry, I didn't understand why it wouldn't be OK to pass NoLock to
>> index_open, for example, here:
>>
>> @@ -5191,7 +5191,14 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
>>                          * necessarily on the index.
>>                          */
>>                         heapRel = table_open(rte->relid, NoLock);
>> -                       indexRel = index_open(index->indexoid, AccessShareLock);
>> +
>> +                       /*
>> +                        * We use the same lock level as the relation as it may have
>> +                        * already been locked with that level.  Using the same lock level
>> +                        * can save a trip to the shared lock manager.
>> +                        */
>> +                       Assert(rte->rellockmode != NoLock);
>> +                       indexRel = index_open(index->indexoid, rte->rellockmode);
>>
>> Especially seeing that the table itself is opened without lock.  If there
>> are any Assert failures, wouldn't that need to be fixed in the upstream
>> code (such as get_relation_info)?
> 
> I put that Assert there to ensure that everything that calls it has
> properly set RangeTblEntry's rellockmode. If there's some valid reason
> for that to be NoLock then it's the wrong thing to do, but I can't
> think of a valid reason.

Which means the index must have been locked with that mode somewhere
upstream, so there's no need to lock it again.  Why not save the local
hash table lookup too if we can?
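
That is, roughly this (a sketch of what I'm suggesting, shown against
your patched version):

             heapRel = table_open(rte->relid, NoLock);
-            indexRel = index_open(index->indexoid, rte->rellockmode);
+            /* upstream code must already hold a lock on the index */
+            indexRel = index_open(index->indexoid, NoLock);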

BTW, get_actual_variable_range is static to selfuncs.c, and the other
public functions that serve as entry points to its functionality
appear to require having built a RelOptInfo first, which
means (even for third party code) having gone through get_relation_info
and opened the indexes.  That may be too much wishful thinking though.

>> Also, I noticed that there is infer_arbiter_indexes() too, which opens the
>> target table's indexes with RowExclusiveLock.  I thought for a second
>> that's an index-locking site in the planner that you may have missed, but
>> it turns out it might very well be the first time those indexes are locked in
>> a given insert query's processing, because query_planner doesn't need to
>> plan access to the result relation, so get_relation_info is not called.
> 
> I skimmed over that and thought that the rellockmode would always be
> RowExclusiveLock anyway, so didn't change it. However, it would make
> sense to use rellockmode for consistency. We're already looking up the
> RangeTblEntry to get the relid, so getting the rellockmode is about
> free.

Yeah, maybe it'd be a good idea, if only for consistency, to fetch the
rellockmode of the resultRelation RTE and use it for index_open.

Thanks,
Amit




Re: Inadequate executor locking of indexes

From
David Rowley
Date:
On Thu, 4 Apr 2019 at 15:35, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> BTW, get_actual_variable_range is static to selfuncs.c, and the other
> public functions that serve as entry points to its functionality
> appear to require having built a RelOptInfo first, which
> means (even for third party code) having gone through get_relation_info
> and opened the indexes.  That may be too much wishful thinking though.

I think we should do this. We have the CheckRelationLockedByMe Asserts
to alert us if this ever gets broken. Even if something added a
get_relation_info_hook to alter the indexes somehow, say to remove
one, it should still be okay, since get_actual_variable_range() only
looks at indexes that are in that list, and the other two functions
deal with Paths, which couldn't exist for an index that was removed
from RelOptInfo->indexlist.

> >> Also, I noticed that there is infer_arbiter_indexes() too, which opens the
> >> target table's indexes with RowExclusiveLock.  I thought for a second
> >> that's an index-locking site in the planner that you may have missed, but
> >> it turns out it might very well be the first time those indexes are locked in
> >> a given insert query's processing, because query_planner doesn't need to
> >> plan access to the result relation, so get_relation_info is not called.
> >
> > I skimmed over that and thought that the rellockmode would always be
> > RowExclusiveLock anyway, so didn't change it. However, it would make
> > sense to use rellockmode for consistency. We're already looking up the
> > RangeTblEntry to get the relid, so getting the rellockmode is about
> > free.
>
> Yeah, maybe it'd be a good idea, if only for consistency, to fetch the
> rellockmode of the resultRelation RTE and use it for index_open.

I've changed infer_arbiter_indexes() too, but decided to use
rellockmode rather than NoLock, since we're not using
RelOptInfo->indexlist there.  If anything used get_relation_info_hook
to remove indexes and then closed the removed ones, releasing their
locks, we could end up with problems here.
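
The change there is roughly the following (a sketch, modulo the exact
variable names in infer_arbiter_indexes()):

-        idxRel = index_open(indexoid, RowExclusiveLock);
+        idxRel = index_open(indexoid, rte->rellockmode);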

v2 attached.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Inadequate executor locking of indexes

From
David Rowley
Date:
On Fri, 5 Apr 2019 at 02:22, David Rowley <david.rowley@2ndquadrant.com> wrote:
> v2 attached.

Wrong patch.  Here's what I meant to send.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Inadequate executor locking of indexes

From
Tom Lane
Date:
David Rowley <david.rowley@2ndquadrant.com> writes:
> Wrong patch.  Here's what I meant to send.

Pushed with some minor tweaking, mostly comments.

Some notes:

* We now have a general convention that queries always take the same lock
type on indexes as on their parent tables, but that convention is not
respected by index DDL.  I'm not sure if there is any additional cleanup
that should be done there.  The DDL code intends to block concurrent
execution of other DDL on the same index, in most cases, so it may be
fine.  At the very least, the different lock levels for those cases are
clearly intentional, while I don't think that was true for DML.

* I dropped the extra assertion you'd added to infer_arbiter_indexes,
as it didn't seem necessary or appropriate.  That function's just
interested in inspecting the index, so it doesn't need to assume anything
about how strong the lock is.  I think the comment that was there was
just trying to justify the shortcut of hard-wiring the lock level as
RowExclusiveLock.

* I went ahead and changed gin_clean_pending_list() to take
RowExclusiveLock not AccessShareLock on its target index.  I'm not quite
sure that AccessShareLock is an actual bug there; it may be that GIN's
internal conventions are such that that's safe.  But it sure seemed darn
peculiar, and at risk of causing future problems.
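
The change amounts to roughly this (a sketch, modulo the exact code
in gin_clean_pending_list()):

-    indexRel = index_open(indexoid, AccessShareLock);
+    indexRel = index_open(indexoid, RowExclusiveLock);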

            regards, tom lane



Re: Inadequate executor locking of indexes

From
David Rowley
Date:
On Fri, 5 Apr 2019 at 08:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Pushed with some minor tweaking, mostly comments.

Thanks for tweaking and pushing this.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Inadequate executor locking of indexes

From
Andres Freund
Date:
Hi,

On 2018-11-23 17:41:26 +1300, David Rowley wrote:
> Ideally, the locking code would realise we already hold a stronger
> lock and skip the lock, but I don't see how that's realistically
> possible without probing the hash table for all stronger lock types
> first, which would likely damage the performance of locking.

That seems kind of a self-made problem to me. I don't think there's
anything really forcing us to have the locallock hashtable contain the
lockmode.  It'd not be a trivial change, but we could have the locallock
entry have enough information to allow us to avoid hitting the shared
table if we hold a stronger lock already.  The biggest complexity
probably would be that we'd need code to downgrade the shared lock we
currently hold, when the more heavyweight lock is released.
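
To illustrate what I mean (an invented sketch, not the actual lock.c
data structures; the struct and function names here are made up):

typedef struct HypotheticalLocalLock
{
    LOCKTAG     tag;                    /* hash key: lock identity only,
                                         * with no lock mode in the key */
    int64       nLocks[MAX_LOCKMODES];  /* per-mode local hold counts */
    int         holdMask;               /* bitmask of modes we hold */
} HypotheticalLocalLock;

/*
 * A single local-table probe answers "do we already hold lockmode or
 * anything stronger?", assuming modes are numbered in increasing order
 * of strength (approximately true for the standard table lock modes).
 */
static bool
HoldsAtLeast(HypotheticalLocalLock *locallock, int lockmode)
{
    return (locallock->holdMask & ~((1 << lockmode) - 1)) != 0;
}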


This made me look at LockAcquireExtended() just now. When we acquire a
lock that's weaker than one already held, and there's another backend
waiting for a conflicting lock, that only works if NOWAIT isn't
used. That's because only ProcSleep() gets around to checking whether we
already hold a stronger lock, but LockAcquireExtended() bails out for
NOWAIT long before that.  None of that is documented in
LockAcquireExtended(). Isn't that a bit weird?
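
For reference, the control flow in question is roughly this
(paraphrased, not the actual source):

/*
 * LockAcquireExtended(locktag, lockmode, ..., dontWait):
 *
 *     if (requested mode conflicts with locks held or awaited by others)
 *     {
 *         if (dontWait)
 *             return LOCKACQUIRE_NOT_AVAIL;   <-- NOWAIT bails out here
 *
 *         ProcSleep(...);                     <-- only in here do we
 *                                                 notice that we already
 *                                                 hold a stronger lock
 *     }
 */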

Greetings,

Andres Freund